Sadri Hassani 


Mathematical Physics 

A Modern Introduction to 
Its Foundations 

With 152 Figures 



Springer 






Sadri Hassani 
Department of Physics 
Illinois State University 
Normal, IL 61790 
USA 

hassani@entropy.phy.ilstu.edu 


To my wife Sarah 

and to my children 
Dane Arash and Daisy Bita 

Library of Congress Cataloging-in-Publication Data 
Hassani, Sadri. 

Mathematical physics : a modern introduction to its foundations / 

Sadri Hassani. 
p. cm. 

Includes bibliographical references and index. 

ISBN 0-387-98579-4 (alk. paper) 

1. Mathematical physics. I. Title. 

QC20.H394 1998 

530.15—dc21 98-24738 

Printed on acid-free paper. 

© 1999 Springer-Verlag New York, Inc. 

All rights reserved. This work may not be translated or copied in whole or in part without the written 
permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 
10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in 
connection with any form of information storage and retrieval, electronic adaptation, computer software, 
or by similar or dissimilar methodology now known or hereafter developed is forbidden. 

The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the 
former are not especially identified, is not to be taken as a sign that such names, as understood by the 
Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone. 

Production managed by Karina Mikhli; manufacturing supervised by Thomas King. 

Photocomposed copy prepared from the author's TeX files. 

Printed and bound by Hamilton Printing Co., Rensselaer, NY. 

Printed in the United States of America. 

9 8 7 6 5 4 3 (Corrected third printing, 2002) 

ISBN 0-387-98579-4 SPIN 10854281 


Springer-Verlag New York Berlin Heidelberg 
A member of BertelsmannSpringer Science+Business Media GmbH 





Preface 



“Ich kann es nun einmal nicht lassen, in diesem Drama von 
Mathematik und Physik – die sich im Dunkeln befruchten, 
aber von Angesicht zu Angesicht so gerne einander verkennen 
und verleugnen – die Rolle des (wie ich genügsam erfuhr, oft 
unerwünschten) Boten zu spielen.” 


Hermann Weyl 


It is said that mathematics is the language of Nature. If so, then physics is its 
poetry. Nature started to whisper into our ears when Egyptians and Babylonians 
were compelled to invent and use mathematics in their day-to-day activities. The 
faint geometric and arithmetical pidgin of over four thousand years ago, suitable 
for rudimentary conversations with nature as applied to simple landscaping, has 
turned into a sophisticated language in which the heart of matter is articulated. 

The interplay between mathematics and physics needs no emphasis. What 
may need to be emphasized is that mathematics is not merely a tool with which the 
presentation of physics is facilitated, but the only medium in which physics can 
survive. Just as language is the means by which humans can express their thoughts 
and without which they lose their unique identity, mathematics is the only language 
through which physics can express itself and without which it loses its identity. 
And just as language is perfected due to its constant usage, mathematics develops 
in the most dramatic way because of its usage in physics. The quotation by Weyl 
above, an approximation to whose translation is “In this drama of mathematics 
and physics—which fertilize each other in the dark, but which prefer to deny and 
misconstrue each other face to face—I cannot, however, resist playing the role 
of a messenger, albeit, as I have abundantly learned, often an unwelcome one,” 
is a perfect description of the natural intimacy between what mathematicians and 
physicists do, and the unnatural estrangement between the two camps. Some of the 
most beautiful mathematics has been motivated by physics (differential equations 
by Newtonian mechanics, differential geometry by general relativity, and operator 
theory by quantum mechanics), and some of the most fundamental physics has been 
expressed in the most beautiful poetry of mathematics (mechanics in symplectic 
geometry, and fundamental forces in Lie group theory). 

I do not want to give the impression that mathematics and physics cannot 
develop independently. On the contrary, it is precisely the independence of each 
discipline that reinforces not only itself, but the other discipline as well — just as the 
study of the grammar of a language improves its usage and vice versa. However, 
the most effective means by which the two camps can accomplish great success 
is through an intense dialogue. Fortunately, with the advent of gauge and string 
theories of particle physics, such a dialogue has been reestablished between physics 
and mathematics after a relatively long lull. 


Level and Philosophy of Presentation 

This is a book for physics students interested in the mathematics they use. It 
is also a book for mathematics students who wish to see some of the abstract 
ideas with which they are familiar come alive in an applied setting. The level of 
presentation is that of an advanced undergraduate or beginning graduate course (or 
sequence of courses) traditionally called “Mathematical Methods of Physics” or 
some variation thereof. Unlike most existing mathematical physics books intended 
for the same audience, which are usually lexicographic collections of facts about 
the diagonalization of matrices, tensor analysis, Legendre polynomials, contour 
integration, etc., with little emphasis on formal and systematic development of 
topics, this book attempts to strike a balance between formalism and application, 
between the abstract and the concrete. 

I have tried to include as much of the essential formalism as is necessary to 
render the book optimally coherent and self-contained. This entails stating and 
proving a large number of theorems, propositions, lemmas, and corollaries. The 
benefit of such an approach is that the student will recognize clearly both the power 
and the limitation of a mathematical idea used in physics. There is a tendency on the 
part of the novice to universalize the mathematical methods and ideas encountered 
in physics courses because the limitations of these methods and ideas are not 
clearly pointed out. 

There is a great deal of freedom in the topics and the level of presentation that 
instructors can choose from this book. My experience has shown that Parts I, II, 
III, Chapter 12, selected sections of Chapter 13, and selected sections or examples 
of Chapter 19 (or a large subset of all this) will be a reasonable course content for 
advanced undergraduates. If one adds Chapters 14 and 20, as well as selected topics 
from Chapters 21 and 22, one can design a course suitable for first-year graduate 
students. By judicious choice of topics from Parts VII and VIII, the instructor 
can bring the content of the course to a more modern setting. Depending on the 
sophistication of the students, this can be done either in the first year or the second 
year of graduate school. 


Features 

To better understand theorems, propositions, and so forth, students need to see 
them in action. There are over 350 worked-out examples and over 850 problems 
(many with detailed hints) in this book, providing a vast arena in which students 
can watch the formalism unfold. The philosophy underlying this abundance can 
be summarized as “An example is worth a thousand words of explanation.” Thus, 
whenever a statement is intrinsically vague or hard to grasp, worked-out examples 
and/or problems with hints are provided to clarify it. The inclusion of such a 
large number of examples is the means by which the balance between formalism 
and application has been achieved. However, although applications are essential 
in understanding mathematical physics, they are only one side of the coin. The 
theorems, propositions, lemmas, and corollaries, being highly condensed versions 
of knowledge, are equally important. 

A conspicuous feature of the book, which is not emphasized in other comparable 
books, is the attempt to exhibit, as much as is useful and applicable, 
interrelationships among various topics covered. Thus, the underlying theme of a 
vector space (which, in my opinion, is the most primitive concept at this level of 
presentation) recurs throughout the book and alerts the reader to the connection 
between various seemingly unrelated topics. 

Another useful feature is the presentation of the historical setting in which 
men and women of mathematics and physics worked. I have gone against the 
trend of the “ahistoricism” of mathematicians and physicists by summarizing the 
life stories of the people behind the ideas. Many a time, the anecdotes and the 
historical circumstances in which a mathematical or physical idea takes form can 
go a long way toward helping us understand and appreciate the idea, especially if 
the interaction among — and the contributions of — all those having a share in the 
creation of the idea is pointed out, and the historical continuity of the development 
of the idea is emphasized. 

To facilitate reference to them, all mathematical statements (definitions, theorems, 
propositions, lemmas, corollaries, and examples) have been numbered consecutively 
within each section and are preceded by the section number. For example, 
4.2.9 Definition indicates the ninth mathematical statement (which happens 
to be a definition) in Section 4.2. The end of a proof is marked by an empty square 
□, and that of an example by a filled square ■, placed at the right margin of each. 

Finally, a comprehensive index, a large number of marginal notes, and many 
explanatory underbraced and overbraced comments in equations facilitate the use 
and comprehension of the book. In this respect, the book is also useful as a reference. 


Organization and Topical Coverage 

Aside from Chapter 0, which is a collection of purely mathematical concepts, 
the book is divided into eight parts. Part I, consisting of the first four chapters, is 
devoted to a thorough study of finite-dimensional vector spaces and linear operators 
defined on them. As the unifying theme of the book, vector spaces demand careful 
analysis, and Part I provides this in the more accessible setting of finite dimension in 
a language that is conveniently generalized to the more relevant infinite dimensions, 
the subject of the next part. 

Following a brief discussion of the technical difficulties associated with infinity, 
Part II is devoted to the two main infinite-dimensional vector spaces of 
mathematical physics: the classical orthogonal polynomials, and Fourier series 
and transform. 

Complex variables appear in Part III. Chapter 9 deals with basic properties of 
complex functions, complex series, and their convergence. Chapter 10 discusses 
the calculus of residues and its application to the evaluation of definite integrals. 
Chapter 11 deals with more advanced topics such as multivalued functions, analytic 
continuation, and the method of steepest descent. 

Part IV treats mainly ordinary differential equations. Chapter 12 shows how 
ordinary differential equations of second order arise in physical problems, and 
Chapter 13 consists of a formal discussion of these differential equations as well 
as methods of solving them numerically. Chapter 14 brings in the power of complex 
analysis to a treatment of the hypergeometric differential equation. The last 
chapter of this part deals with the solution of differential equations using integral 
transforms. 

Part V starts with a formal treatment of the theory of operators and their spectral 
decomposition in Chapter 16. Chapter 17 focuses on a specific type of operator, 
namely the integral operators and their corresponding integral equations. The formalism 
and applications of Sturm-Liouville theory appear in Chapters 18 and 19, 
respectively. 

The entire Part VI is devoted to a discussion of Green’s functions. Chapter 20 
introduces these functions for ordinary differential equations, while Chapters 21 
and 22 discuss the Green’s functions in an m-dimensional Euclidean space. 
Some of the derivations in these last two chapters are new and, as far as I know, 
unavailable anywhere else. 

Parts VII and VIII contain a thorough discussion of Lie groups and their applications. 
The concept of group is introduced in Chapter 23. The theory of group 
representation, with an eye on its application in quantum mechanics, is discussed 
in the next chapter. Chapters 25 and 26 concentrate on tensor algebra and tensor 
analysis on manifolds. In Part VIII, the concepts of group and manifold are 
brought together in the context of Lie groups. Chapter 27 discusses Lie groups 
and their algebras as well as their representations, with special emphasis on their 
application in physics. Chapter 28 is on differential geometry including a brief 
introduction to general relativity. Lie’s original motivation for constructing the 
groups that bear his name is discussed in Chapter 29 in the context of a systematic 
treatment of differential equations using their symmetry groups. The book ends in 
a chapter that blends many of the ideas developed throughout the previous parts 
in order to treat variational problems and their symmetries. It also provides a most 
fitting example of the claim made at the beginning of this preface and one of the 
most beautiful results of mathematical physics: Noether’s theorem on the relation 
between symmetries and conservation laws. 


Acknowledgments 

It gives me great pleasure to thank all those who contributed to the making of 
this book. George Rutherford was kind enough to volunteer for the difficult task 
of condensing hundreds of pages of biography into tens of extremely informative 
pages. Without his help this unique and valuable feature of the book would have 
been next to impossible to achieve. I thank him wholeheartedly. Rainer Grobe and 
Qichang Su helped me with my rusty computational skills. (R. G. also helped me 
with my rusty German!) Many colleagues outside my department gave valuable 
comments and stimulating words of encouragement on the earlier version of the 
book. I would like to record my appreciation to Neil Rasband for reading part of the 
manuscript and commenting on it. Special thanks go to Tom von Foerster, senior 
editor of physics and mathematics at Springer-Verlag, not only for his patience and 
support, but also for the extreme care he took in reading the entire manuscript and 
giving me invaluable advice as a result. Needless to say, the ultimate responsibility 
for the content of the book rests on me. Last but not least, I thank my wife, Sarah, 
my son, Dane, and my daughter, Daisy, for the time taken away from them while 
I was writing the book, and for their support during the long and arduous writing 
process. 

Many excellent textbooks, too numerous to cite individually here, have influenced 
the writing of this book. The following, however, are noteworthy for both 
their excellence and the amount of their influence: 

Birkhoff, G., and G.-C. Rota, Ordinary Differential Equations, 3rd ed., New York, 
Wiley, 1978. 

Bishop, R., and S. Goldberg, Tensor Analysis on Manifolds, New York, Dover, 
1980. 

Dennery, P., and A. Krzywicki, Mathematics for Physicists, New York, Harper & 
Row, 1967. 

Halmos, P., Finite-Dimensional Vector Spaces, 2nd ed., Princeton, Van Nostrand, 
1958. 

Hamermesh, M., Group Theory and its Application to Physical Problems, New York, 
Dover, 1989. 

Olver, P., Application of Lie Groups to Differential Equations, New York, 
Springer-Verlag, 1986. 

Unless otherwise indicated, all biographical sketches have been taken from the 
following three sources: 

Gillispie, C., Dictionary of Scientific Biography, Charles Scribner’s, New York, 
1970. 

Simmons, G., Calculus Gems, New York, McGraw-Hill, 1992. 

History of Mathematics archive at www-groups.dcs.st-and.ac.uk:80. 

I would greatly appreciate any comments and suggestions for improvements. 
Although extreme care was taken to correct all the misprints, the mere volume of 
the book makes it very likely that I have missed some (perhaps many) of them. I 
shall be most grateful to those readers kind enough to bring to my attention any 
remaining mistakes, typographical or otherwise. Please feel free to contact me. 

Sadri Hassani 

Campus Box 4560 

Department of Physics 

Illinois State University 

Normal, IL 61790-4560, USA 

e-mail: hassani@entropy.phy.ilstu.edu 

It is my pleasure to thank all those readers who pointed out typographical mistakes 
and suggested a few clarifying changes. With the exception of a couple that required 
substantial revision, I have incorporated all the corrections and suggestions in this 
second printing. 




Note to the Reader 


Mathematics and physics are like the game of chess (or, for that matter, like any 
game) — you will learn only by “playing” them. No amount of reading about the 
game will make you a master. In this book you will find a large number of examples 
and problems. Go through as many examples as possible, and try to reproduce them. 
Pay particular attention to sentences like “The reader may check ... ” or “It is 
straightforward to show ...” These are red flags warning you that for a good 
understanding of the material at hand, you need to provide the missing steps. The 
problems often fill in missing steps as well; and in this respect they are essential 
for a thorough understanding of the book. Do not get discouraged if you cannot get 
to the solution of a problem at your first attempt. If you start from the beginning 
and think about each problem hard enough, you will get to the solution, and you 
will see that the subsequent problems will not be as difficult. 

The extensive index makes it easy to find the specific topics you may be 
interested in. Often the marginal notes will help you 
locate the index entry you are after. 

I have included a large collection of biographical sketches of mathematical 
physicists of the past. These are truly inspiring stories, and I encourage you to read 
them. They let you see that even under excruciating circumstances, the human mind 
can work miracles. You will discover how these remarkable individuals overcame 
the political, social, and economic conditions of their time to let us get a faint 
glimpse of the truth. They are our true heroes. 



Contents 


Preface 
Note to the Reader 
List of Symbols 

0 Mathematical Preliminaries 
0.1 Sets 
0.2 Maps 
0.3 Metric Spaces 
0.4 Cardinality 
0.5 Mathematical Induction 
0.6 Problems 

I Finite-Dimensional Vector Spaces 

1 Vectors and Transformations 
1.1 Vector Spaces 
1.2 Inner Product 
1.3 Linear Transformations 
1.4 Algebras 
1.5 Problems 

2 Operator Algebra 
2.1 Algebra of $\mathcal{L}(V)$ 
















2.2 Derivatives of Functions of Operators 
2.3 Conjugation of Operators 
2.4 Hermitian and Unitary Operators 
2.5 Projection Operators 
2.6 Operators in Numerical Analysis 
2.7 Problems 

3 Matrices: Operator Representations 
3.1 Matrices 
3.2 Operations on Matrices 
3.3 Orthonormal Bases 
3.4 Change of Basis and Similarity Transformation 
3.5 The Determinant 
3.6 The Trace 
3.7 Problems 

4 Spectral Decomposition 
4.1 Direct Sums 
4.2 Invariant Subspaces 
4.3 Eigenvalues and Eigenvectors 
4.4 Spectral Decomposition 
4.5 Functions of Operators 
4.6 Polar Decomposition 
4.7 Real Vector Spaces 
4.8 Problems 

II Infinite-Dimensional Vector Spaces 

5 Hilbert Spaces 
5.1 The Question of Convergence 
5.2 The Space of Square-Integrable Functions 
5.3 Problems 

6 Generalized Functions 
6.1 Continuous Index 
6.2 Generalized Functions 
6.3 Problems 

7 Classical Orthogonal Polynomials 
7.1 General Properties 
7.2 Classification 
7.3 Recurrence Relations 
7.4 Examples of Classical Orthogonal Polynomials 







































7.5 Expansion in Terms of Orthogonal Polynomials 
7.6 Generating Functions 
7.7 Problems 

8 Fourier Analysis 
8.1 Fourier Series 
8.2 The Fourier Transform 
8.3 Problems 

III Complex Analysis 

9 Complex Calculus 
9.1 Complex Functions 
9.2 Analytic Functions 
9.3 Conformal Maps 
9.4 Integration of Complex Functions 
9.5 Derivatives as Integrals 
9.6 Taylor and Laurent Series 
9.7 Problems 

10 Calculus of Residues 
10.1 Residues 
10.2 Classification of Isolated Singularities 
10.3 Evaluation of Definite Integrals 
10.4 Problems 

11 Complex Analysis: Advanced Topics 
11.1 Meromorphic Functions 
11.2 Multivalued Functions 
11.3 Analytic Continuation 
11.4 The Gamma and Beta Functions 
11.5 Method of Steepest Descent 
11.6 Problems 

IV Differential Equations 

12 Separation of Variables in Spherical Coordinates 
12.1 PDEs of Mathematical Physics 
12.2 Separation of the Angular Part of the Laplacian 
12.3 Construction of Eigenvalues of $L^2$ 
12.4 Eigenvectors of $L^2$: Spherical Harmonics 
12.5 Problems 

































13 Second-Order Linear Differential Equations 
13.1 General Properties of ODEs 
13.2 Existence and Uniqueness for First-Order DEs 
13.3 General Properties of SOLDEs 
13.4 The Wronskian 
13.5 Adjoint Differential Operators 
13.6 Power-Series Solutions of SOLDEs 
13.7 SOLDEs with Constant Coefficients 
13.8 The WKB Method 
13.9 Numerical Solutions of DEs 
13.10 Problems 

14 Complex Analysis of SOLDEs 
14.1 Analytic Properties of Complex DEs 
14.2 Complex SOLDEs 
14.3 Fuchsian Differential Equations 
14.4 The Hypergeometric Function 
14.5 Confluent Hypergeometric Functions 
14.6 Problems 

15 Integral Transforms and Differential Equations 
15.1 Integral Representation of the Hypergeometric Function 
15.2 Integral Representation of the Confluent Hypergeometric Function 
15.3 Integral Representation of Bessel Functions 
15.4 Asymptotic Behavior of Bessel Functions 
15.5 Problems 

V Operators on Hilbert Spaces 

16 An Introduction to Operator Theory 
16.1 From Abstract to Integral and Differential Operators 
16.2 Bounded Operators in Hilbert Spaces 
16.3 Spectra of Linear Operators 
16.4 Compact Sets 
16.5 Compact Operators 
16.6 Spectrum of Compact Operators 
16.7 Spectral Theorem for Compact Operators 
16.8 Resolvents 
16.9 Problems 

17 Integral Equations 
17.1 Classification 



































17.2 Fredholm Integral Equations 
17.3 Problems 

18 Sturm-Liouville Systems: Formalism 
18.1 Unbounded Operators with Compact Resolvent 
18.2 Sturm-Liouville Systems and SOLDEs 
18.3 Other Properties of Sturm-Liouville Systems 
18.4 Problems 

19 Sturm-Liouville Systems: Examples 
19.1 Expansions in Terms of Eigenfunctions 
19.2 Separation in Cartesian Coordinates 
19.3 Separation in Cylindrical Coordinates 
19.4 Separation in Spherical Coordinates 
19.5 Problems 

VI Green’s Functions 

20 Green’s Functions in One Dimension 
20.1 Calculation of Some Green’s Functions 
20.2 Formal Considerations 
20.3 Green’s Functions for SOLDOs 
20.4 Eigenfunction Expansion of Green’s Functions 
20.5 Problems 

21 Multidimensional Green’s Functions: Formalism 
21.1 Properties of Partial Differential Equations 
21.2 Multidimensional GFs and Delta Functions 
21.3 Formal Development 
21.4 Integral Equations and GFs 
21.5 Perturbation Theory 
21.6 Problems 

22 Multidimensional Green’s Functions: Applications 
22.1 Elliptic Equations 
22.2 Parabolic Equations 
22.3 Hyperbolic Equations 
22.4 The Fourier Transform Technique 
22.5 The Eigenfunction Expansion Technique 
22.6 Problems 

































VII Groups and Manifolds 

23 Group Theory 
23.1 Groups 
23.2 Subgroups 
23.3 Group Action 
23.4 The Symmetric Group $S_n$ 
23.5 Problems 

24 Group Representation Theory 
24.1 Definitions and Examples 
24.2 Orthogonality Properties 
24.3 Analysis of Representations 
24.4 Group Algebra 
24.5 Relationship of Characters to Those of a Subgroup 
24.6 Irreducible Basis Functions 
24.7 Tensor Product of Representations 
24.8 Representations of the Symmetric Group 
24.9 Problems 

25 Algebra of Tensors 
25.1 Multilinear Mappings 
25.2 Symmetries of Tensors 
25.3 Exterior Algebra 
25.4 Inner Product Revisited 
25.5 The Hodge Star Operator 
25.6 Problems 

26 Analysis of Tensors 
26.1 Differentiable Manifolds 
26.2 Curves and Tangent Vectors 
26.3 Differential of a Map 
26.4 Tensor Fields on Manifolds 
26.5 Exterior Calculus 
26.6 Symplectic Geometry 
26.7 Problems 

VIII Lie Groups and Their Applications 

27 Lie Groups and Lie Algebras 
27.1 Lie Groups and Their Algebras 
27.2 An Outline of Lie Algebra Theory 
27.3 Representation of Compact Lie Groups 



































27.4 Representation of the General Linear Group 
27.5 Representation of Lie Algebras 
27.6 Problems 

28 Differential Geometry 
28.1 Vector Fields and Curvature 
28.2 Riemannian Manifolds 
28.3 Covariant Derivative and Geodesics 
28.4 Isometries and Killing Vector Fields 
28.5 Geodesic Deviation and Curvature 
28.6 General Theory of Relativity 
28.7 Problems 

29 Lie Groups and Differential Equations 
29.1 Symmetries of Algebraic Equations 
29.2 Symmetry Groups of Differential Equations 
29.3 The Central Theorems 
29.4 Application to Some Known PDEs 
29.5 Application to ODEs 
29.6 Problems 

30 Calculus of Variations, Symmetries, and Conservation Laws 
30.1 The Calculus of Variations 
30.2 Symmetry Groups of Variational Problems 
30.3 Conservation Laws and Noether’s Theorem 
30.4 Application to Classical Field Theory 
30.5 Problems 

Bibliography 

Index 
























List of Symbols 


$\in$, ($\notin$)                “belongs to”, (“does not belong to”) 
$\mathbb{Z}$                     Set of integers 
$\mathbb{R}$                     Set of real numbers 
$\mathbb{R}^+$                   Set of positive real numbers 
$\mathbb{C}$                     Set of complex numbers 
$\mathbb{N}$                     Set of nonnegative integers 
$\mathbb{Q}$                     Set of rational numbers 
$\sim A$                         Complement of the set $A$ 
$A \times B$                     Set of ordered pairs $(a, b)$ with $a \in A$ and $b \in B$ 
$A^n$                            $\{(a_1, a_2, \ldots, a_n) \mid a_i \in A\}$ 
$\cup$, ($\cap$)                 Union, (Intersection) 
$A \Leftrightarrow B$            $A$ is equivalent to $B$ 
$x \mapsto f(x)$                 $x$ is mapped to $f(x)$ via the map $f$ 
$\forall$                        for all (values of) 
$\exists$                        There exists (a value of) 
$[a]$                            Equivalence class to which $a$ belongs 
$g \circ f$                      Composition of maps $f$ and $g$ 
iff                              if and only if 
$C^k(a, b)$                      Set of functions on $(a, b)$ with continuous derivatives up to order $k$ 
$\mathbb{C}^n$ (or $\mathbb{R}^n$)   Set of complex (or real) $n$-tuples 
$\mathcal{P}^c[t]$               Set of polynomials in $t$ with complex coefficients 
$\mathcal{P}^r[t]$               Set of polynomials in $t$ with real coefficients 
$\mathcal{P}^c_n[t]$             Set of polynomials with complex coefficients of degree $n$ or less 
$\mathbb{C}^\infty$              Set of all complex sequences $\{\alpha_i\}$ such that $\sum_i |\alpha_i|^2 < \infty$ 
$\langle a | b \rangle$          Inner product of $|a\rangle$ and $|b\rangle$ 
$\|a\|$                          Norm (length) of the vector $|a\rangle$ 
$\mathcal{L}(V)$                 Set of endomorphisms (linear operators) on vector space $V$ 
$[S, T]$                         Commutator of operators $S$ and $T$ 
$T^\dagger$                      Adjoint (hermitian conjugate) of operator $T$ 
$A^t$, or $\tilde{A}$            Transpose of matrix $A$ 
$U \oplus V$                     Direct sum of vector spaces $U$ and $V$ 
$\delta(x - x_0)$                Dirac delta function nonvanishing only at $x = x_0$ 
$\mathrm{Res}[f(z_0)]$           Residue of $f$ at point $z_0$ 
DE, ODE, PDE                     Differential equation, Ordinary DE, Partial DE 
SOLDE                            Second order linear (ordinary) differential equation 
$GL(V)$                          Set of all invertible operators on vector space $V$ 
$GL(n, \mathbb{C})$              Set of all $n \times n$ complex matrices of nonzero determinant 
$SL(n, \mathbb{C})$              Set of all $n \times n$ complex matrices of unit determinant 
$T_1 \otimes T_2$                Tensor product of $T_1$ and $T_2$ 
$A \wedge B$                     Exterior (wedge) product of skew-symmetric tensors $A$ and $B$ 
$\Lambda^p(V)$                   Set of all skew-symmetric tensors of type $(p, 0)$ on $V$ 


Mathematical Preliminaries 


This introductory chapter gathers together some of the most basic tools and notions 
that are used throughout the book. It also introduces some common vocabulary 
and notations used in modern mathematical physics literature. Readers familiar 
with such concepts as sets, maps, equivalence relations, and metric spaces may 
wish to skip this chapter. 

0.1 Sets 

Modern mathematics starts with the basic (and undefinable) concept of set. We 
think of a set as a structureless family, or collection, of objects. We speak, for 
example, of the set of students in a college, of men in a city, of women working 
for a corporation, of vectors in space, of points in a plane, or of events in the 
continuum of space-time. Each member $a$ of a set $A$ is called an element of that 
set. This relation is denoted by $a \in A$ (read “$a$ is an element of $A$” or “$a$ belongs 
to $A$”), and its negation by $a \notin A$. Sometimes $a$ is called a point of the set $A$ to 
emphasize a geometric connotation. 

A set is usually designated by enumeration of its elements between braces. 
For example, $\{2, 4, 6, 8\}$ represents the set consisting of the first four even natural 
numbers; $\{0, \pm 1, \pm 2, \pm 3, \ldots\}$ is the set of all integers; $\{1, x, x^2, x^3, \ldots\}$ is the 
set of all nonnegative powers of $x$; and $\{1, i, -1, -i\}$ is the set of the four complex 
fourth roots of unity. In many cases, a set is defined by a (mathematical) statement 
that holds for all of its elements. Such a set is generally denoted by $\{x \mid P(x)\}$ and 
read “the set of all $x$’s such that $P(x)$ is true.” The foregoing examples of sets can 
be written alternatively as follows: 

$$\{n \mid n \text{ is even and } 1 < n < 9\}$$ 

$$\{\pm n \mid n \text{ is a natural number}\}$$ 

$$\{y \mid y = x^n \text{ and } n \text{ is a natural number}\}$$ 

$$\{z \mid z^4 = 1 \text{ and } z \text{ is a complex number}\}$$ 

In a frequently used shorthand notation, the last two sets can be abbreviated as 
$\{x^n \mid n \ge 0 \text{ and } n \text{ is an integer}\}$ and $\{z \in \mathbb{C} \mid z^4 = 1\}$. Similarly, the unit circle 
can be denoted by $\{z \mid |z| = 1\}$, the closed interval $[a, b]$ as $\{x \mid a \le x \le b\}$, the 
open interval $(a, b)$ as $\{x \mid a < x < b\}$, and the set of all nonnegative powers of 
$x$ as $\{x^n\}_{n=0}^{\infty}$. This last notation will be used frequently in this book. A set with a 
single element is called a singleton. 
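
As an informal illustration that is not part of the original text, set-builder notation corresponds closely to set comprehensions in a programming language such as Python. The sketch below rebuilds two of the finite examples above; the variable names are chosen here purely for illustration.

```python
# Set-builder notation {x | P(x)} written as Python set comprehensions.

# {n | n is even and 1 < n < 9}
evens = {n for n in range(2, 9) if n % 2 == 0}

# {z in C | z^4 = 1}: the four complex fourth roots of unity,
# generated here as powers of the imaginary unit 1j.
roots = {1j ** k for k in range(4)}

# A singleton is a set with exactly one element.
singleton = {0}

print(evens == {2, 4, 6, 8})
print(len(roots), all(abs(z ** 4 - 1) < 1e-12 for z in roots))
print(len(singleton))
```

Infinite sets such as the integers or $\{x^n\}_{n=0}^{\infty}$ cannot be enumerated this way; only the defining condition carries over.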

If $a \in A$ whenever $a \in B$, we say that $B$ is a subset of $A$ and write $B \subset A$ or
$A \supset B$. If $B \subset A$ and $A \subset B$, then $A = B$. If $B \subset A$ and $A \ne B$, then $B$ is called
a proper subset of $A$. The set defined by $\{a \mid a \ne a\}$ is called the empty set and
is denoted by $\emptyset$. Clearly, $\emptyset$ contains no elements and is a subset of any arbitrary
set. The collection of all subsets (including $\emptyset$) of a set $A$ is denoted by $2^A$. The
reason for this notation is that the number of subsets of a set containing $n$ elements
is $2^n$ (Problem 0.1). If $A$ and $B$ are sets, their union, denoted by $A \cup B$, is the set
containing all elements that belong to $A$ or $B$ or both. The intersection of the sets
$A$ and $B$, denoted by $A \cap B$, is the set containing all elements belonging to both
$A$ and $B$. If $\{B_\alpha\}_{\alpha \in I}$ is a collection of sets,$^1$ we denote their union by $\bigcup_\alpha B_\alpha$ and
their intersection by $\bigcap_\alpha B_\alpha$.
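
The counting statement and the set operations just described can be tried out on small finite sets. The following Python sketch is only an illustration under our own assumptions: the helper `power_set` and the sample sets `A` and `B` are hypothetical names chosen here, not notation from the text.

```python
from itertools import combinations

def power_set(elements):
    """Return all subsets of `elements` as a list of frozensets."""
    items = list(elements)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

A = {2, 4, 6, 8}
B = {6, 8, 10}

subsets_of_A = power_set(A)
num_subsets = len(subsets_of_A)       # should equal 2 ** len(A)

union = A | B                         # elements in A or B or both
intersection = A & B                  # elements in both A and B
B_is_subset = B <= A                  # subset test
empty_is_subset = frozenset() <= A    # the empty set is a subset of any set
```

Listing the subsets by size, as `combinations` does, is one of several ways to enumerate them; it is not a proof of Problem 0.1, only a check for one small case.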

In any application of set theory there is an underlying universal set whose
subsets are the objects of study. This universal set is usually clear from the context.
For example, in the study of the properties of integers, the set of integers, denoted
by $\mathbb{Z}$, is the universal set. The set of reals, $\mathbb{R}$, is the universal set in real analysis,
and the set of complex numbers, $\mathbb{C}$, is the universal set in complex analysis. The
complement of a set $A$ is denoted by $\sim A$ and defined as

$\sim A \equiv \{a \mid a \notin A\}$.

With a universal set $X$ in mind, one can write $X \sim A$ instead of $\sim A$. The complement
of $B$ in $A$ (or their difference) is

$A \sim B \equiv \{a \mid a \in A \text{ and } a \notin B\}$.

From two given sets $A$ and $B$, it is possible to form the Cartesian product of $A$
and $B$, denoted by $A \times B$, which is the set of ordered pairs $(a, b)$, where $a \in A$
and $b \in B$. This is expressed in set-theoretic notation as

$A \times B = \{(a, b) \mid a \in A \text{ and } b \in B\}$.


1 Here $I$ is an index set (or a counting set) with its typical element denoted by $\alpha$. In most cases, $I$ is the set of (nonnegative)
integers, but, in principle, it can be any set, for example, the set of real numbers.



We can generalize this to an arbitrary number of sets. If $A_1, A_2, \dots, A_n$ are sets,
then the Cartesian product of these sets is

$A_1 \times A_2 \times \cdots \times A_n = \{(a_1, a_2, \dots, a_n) \mid a_i \in A_i\}$,

which is a set of ordered $n$-tuples. If $A_1 = A_2 = \cdots = A_n = A$, then we write
$A^n$ instead of $A \times A \times \cdots \times A$, and

$A^n = \{(a_1, a_2, \dots, a_n) \mid a_i \in A\}$.

The most familiar example of a Cartesian product occurs when $A = \mathbb{R}$. Then
$\mathbb{R}^2$ is the set of pairs $(x_1, x_2)$ with $x_1, x_2 \in \mathbb{R}$. This is simply the set of points in the
Euclidean plane. Similarly, $\mathbb{R}^3$ is the set of triplets $(x_1, x_2, x_3)$, or the points in
space, and $\mathbb{R}^n = \{(x_1, x_2, \dots, x_n) \mid x_i \in \mathbb{R}\}$ is the set of real $n$-tuples.
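
For finite sets the Cartesian product can be listed explicitly. The short Python sketch below uses the standard `itertools.product`; the sets `A` and `B` are hypothetical examples of our own, chosen only to show that the pairs are ordered and that the number of $n$-tuples is the $n$th power of the number of elements.

```python
from itertools import product

A = {1, 2}
B = {"x", "y", "z"}

A_times_B = set(product(A, B))        # ordered pairs (a, b) with a in A, b in B
A_cubed = set(product(A, repeat=3))   # ordered triples with entries in A
```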

0.1.1 Equivalence Relations 

There are many instances in which the elements of a set are naturally grouped
together. For example, all vector potentials that differ by the gradient of a scalar
function can be grouped together because they all give the same magnetic field.
Similarly, all quantum state functions (of unit “length”) that differ by a multiplicative
complex number of unit length can be grouped together because they all
represent the same physical state. The abstraction of these ideas is summarized in
the following definition.

0.1.1. Definition. Let $A$ be a set. A relation on $A$ is a comparison test between
ordered pairs of elements of $A$. If the pair $(a, b) \in A \times A$ passes this test, we write
$a \triangleright b$ and read “$a$ is related to $b$.” An equivalence relation on $A$ is a relation that
has the following properties:

$a \triangleright a \quad \forall\, a \in A$, (reflexivity)

$a \triangleright b \ \Rightarrow\ b \triangleright a \quad a, b \in A$, (symmetry)

$a \triangleright b,\ b \triangleright c \ \Rightarrow\ a \triangleright c \quad a, b, c \in A$. (transitivity)

When $a \triangleright b$, we say that “$a$ is equivalent to $b$.” The set $[\![a]\!] = \{b \in A \mid b \triangleright a\}$ of all
elements that are equivalent to $a$ is called the equivalence class of $a$.

The reader may verify the following property of equivalence relations. 

0.1.2. Proposition. If $\triangleright$ is an equivalence relation on $A$ and $a, b \in A$, then either
$[\![a]\!] \cap [\![b]\!] = \emptyset$ or $[\![a]\!] = [\![b]\!]$.

Therefore, $a' \in [\![a]\!]$ implies that $[\![a']\!] = [\![a]\!]$. In other words, any element of
an equivalence class can be chosen to be a representative of that class. Because
of the symmetry of equivalence relations, sometimes we denote them by $\bowtie$.






0.1.3. Example. Let $A$ be the set of human beings. Let $a \triangleright b$ be interpreted as “$a$ is older
than $b$.” Then clearly, $\triangleright$ is a relation but not an equivalence relation. On the other hand, if
we interpret $a \triangleright b$ as “$a$ and $b$ have the same paternal grandfather,” then $\triangleright$ is an equivalence
relation, as the reader may check. The equivalence class of $a$ is the set of all grandchildren
of $a$’s paternal grandfather.

Let $V$ be the set of vector potentials. Write $\mathbf{A} \triangleright \mathbf{A}'$ if $\mathbf{A} - \mathbf{A}' = \nabla f$ for some function
$f$. The reader may verify that $\triangleright$ is an equivalence relation, and that $[\![\mathbf{A}]\!]$ is the set of all
vector potentials giving rise to the same magnetic field.

Let the underlying set be $\mathbb{Z} \times (\mathbb{Z} \sim \{0\})$. Say “$(a, b)$ is related to $(c, d)$” if $ad = bc$.
Then this relation is an equivalence relation. Furthermore, $[\![(a, b)]\!]$ can be identified as the
ratio $a/b$. ■

0.1.4. Definition. Let $A$ be a set and $\{B_\alpha\}$ a collection of subsets of $A$. We say that
$\{B_\alpha\}$ is a partition of $A$, or $\{B_\alpha\}$ partitions $A$, if the $B_\alpha$’s are disjoint, i.e., have
no element in common, and $\bigcup_\alpha B_\alpha = A$.

Now consider the collection $\{[\![a]\!] \mid a \in A\}$ of all equivalence classes of $A$.
By Proposition 0.1.2 distinct classes are disjoint, and evidently their union covers all of $A$.
Therefore, the collection of equivalence classes of $A$ is a partition of $A$. This collection is
denoted by $A/\bowtie$ and is called the quotient set of $A$ under the equivalence relation
$\bowtie$.

0.1.5. Example. Let the underlying set be $\mathbb{R}^3$. Define an equivalence relation on $\mathbb{R}^3$ by
saying that $P_1 \in \mathbb{R}^3$ and $P_2 \in \mathbb{R}^3$ are equivalent if they lie on the same line passing through
the origin. Then $\mathbb{R}^3/\bowtie$ is the set of all lines in space passing through the origin. If we
choose the unit vector with positive third coordinate along a given line as the representative
of that line, then $\mathbb{R}^3/\bowtie$ can be identified with the upper unit hemisphere.$^2$ $\mathbb{R}^3/\bowtie$ is called
the projective space associated with $\mathbb{R}^3$.

On the set $\mathbb{Z}$ of integers define a relation by writing $m \triangleright n$ for $m, n \in \mathbb{Z}$ if $m - n$ is
divisible by $k$, where $k$ is a fixed integer. Then $\triangleright$ is not only a relation, but an equivalence
relation. In this case, we have

$\mathbb{Z}/\triangleright = \{[\![0]\!], [\![1]\!], \dots, [\![k-1]\!]\}$,

as the reader is urged to verify.

For the equivalence relation defined on $\mathbb{Z} \times (\mathbb{Z} \sim \{0\})$ in Example 0.1.3, the quotient set
$(\mathbb{Z} \times (\mathbb{Z} \sim \{0\}))/\bowtie$ can be identified with $\mathbb{Q}$, the set of rational numbers. ■
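
The partition of the integers into classes modulo $k$ can be made concrete on a finite window of integers. The sketch below is a minimal illustration, not a general algorithm from the text: the function `quotient_set`, the choice `k = 3`, and the window from $-6$ to $6$ are assumptions made here, and the function simply trusts that the relation it is given is an equivalence relation.

```python
def quotient_set(elements, related):
    """Partition `elements` into equivalence classes of the relation `related`.

    `related(a, b)` is assumed to be reflexive, symmetric, and transitive;
    the function does not check this.
    """
    classes = []
    for a in elements:
        for cls in classes:
            # By transitivity it is enough to compare with one representative.
            if related(a, next(iter(cls))):
                cls.add(a)
                break
        else:
            classes.append({a})
    return [frozenset(cls) for cls in classes]

k = 3
sample = range(-6, 7)     # a finite window of the integers
classes = quotient_set(sample, lambda m, n: (m - n) % k == 0)
```

Comparing each new element with a single representative of every class is exactly the freedom that Proposition 0.1.2 provides: any element of a class can stand for the whole class.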


0.2 Maps 

To communicate between sets, one introduces the concept of a map. A map $f$
from a set $X$ to a set $Y$, denoted by $f : X \to Y$ or $X \xrightarrow{f} Y$, is a correspondence
between elements of $X$ and those of $Y$ in which all the elements of $X$ participate,


2 Furthermore, we need to identify any two points on the edge of the hemisphere which lie on the same diameter. 







Figure 1 The map $f$ maps all of the set $X$ onto a subset of $Y$. The shaded area in $Y$ is
$f(X)$, the range of $f$.


and each element of $X$ corresponds to only one element of $Y$ (see Figure 1). If
$y \in Y$ is the element that corresponds to $x \in X$ via the map $f$, we write

$y = f(x)$ or $x \mapsto f(x)$ or $x \overset{f}{\mapsto} y$

and call $f(x)$ the image of $x$ under $f$. Thus, by the definition of map, $x \in X$ can
have only one image. The set $X$ is called the domain, and $Y$ the codomain or
the target space. Two maps $f : X \to Y$ and $g : X \to Y$ are said to be equal if
$f(x) = g(x)$ for all $x \in X$.


0.2.1. Box. A map whose codomain is the set of real numbers $\mathbb{R}$ or the set
of complex numbers $\mathbb{C}$ is commonly called a function.


A special map that applies to all sets $A$ is $\mathrm{id}_A : A \to A$, called the identity
map of $A$, and defined by

$\mathrm{id}_A(a) = a \quad \forall\, a \in A$.

The graph $\Gamma_f$ of a map $f : A \to B$ is a subset of $A \times B$ defined by

$\Gamma_f = \{(a, f(a)) \mid a \in A\} \subset A \times B$.

This general definition reduces to the ordinary graphs encountered in algebra and
calculus where $A = B = \mathbb{R}$ and $A \times B$ is the $xy$-plane. If $A$ is a subset of $X$,
we call $f(A) = \{f(x) \mid x \in A\}$ the image of $A$. Similarly, if $B \subset f(X)$, we call
$f^{-1}(B) = \{x \in X \mid f(x) \in B\}$ the inverse image, or preimage, of $B$. In words,
$f^{-1}(B)$ consists of all elements in $X$ whose images are in $B \subset Y$. If $B$ consists
of a single element $b$, then $f^{-1}(b) = \{x \in X \mid f(x) = b\}$ consists of all elements
of $X$ that are mapped to $b$. Note that it is possible for many points of $X$ to have
the same image in $Y$. The subset $f(X)$ of the codomain of a map $f$ is called the
range of $f$ (see Figure 1).







Figure 2 The composition of two maps is another map. 

If $f : X \to Y$ and $g : Y \to W$, then the mapping $h : X \to W$ given
by $h(x) = g(f(x))$ is called the composition of $f$ and $g$, and is denoted by
$h = g \circ f$ (see Figure 2).$^3$ It is easy to verify that

$f \circ \mathrm{id}_X = f = \mathrm{id}_Y \circ f$.

If $f(x_1) = f(x_2)$ implies that $x_1 = x_2$, we call $f$ injective, or one-to-one
(denoted 1-1). For an injective map at most one element of $X$ corresponds to any
given element of $Y$. If $f(X) = Y$, the mapping is said to be surjective, or onto. A
map that is both injective and surjective is said to be bijective, or to be a one-to-one
correspondence. Two sets that are in one-to-one correspondence have, by
definition, the same number of elements. If $f : X \to Y$ is a bijection from $X$
onto $Y$, then for each $y \in Y$ there is one and only one element $x$ in $X$ for which
$f(x) = y$. Thus, there is a mapping $f^{-1} : Y \to X$ given by $f^{-1}(y) = x$, where
$x$ is the unique element such that $f(x) = y$. This mapping is called the inverse
of $f$. The inverse of $f$ is also identified as the map that satisfies $f \circ f^{-1} = \mathrm{id}_Y$
and $f^{-1} \circ f = \mathrm{id}_X$. For example, one can easily verify that $\ln^{-1} = \exp$ and
$\exp^{-1} = \ln$, because $\ln(e^x) = x$ and $e^{\ln x} = x$.

Given a map $f : X \to Y$, we can define a relation $\bowtie$ on $X$ by saying $x_1 \bowtie x_2$
if $f(x_1) = f(x_2)$. The reader may check that this is in fact an equivalence
relation. The equivalence classes are subsets of $X$ all of whose elements map to
the same point in $Y$. In fact, $[\![x]\!] = f^{-1}(f(x))$. Corresponding to $f$, there is a
map $\tilde{f} : X/\bowtie \to Y$ given by $\tilde{f}([\![x]\!]) = f(x)$. This map is injective because
if $\tilde{f}([\![x_1]\!]) = \tilde{f}([\![x_2]\!])$, then $f(x_1) = f(x_2)$, so $x_1$ and $x_2$ belong to the same
equivalence class; therefore, $[\![x_1]\!] = [\![x_2]\!]$. It follows that $\tilde{f} : X/\bowtie \to f(X)$ is
bijective.

If $f$ and $g$ are both bijections with inverses $f^{-1}$ and $g^{-1}$, respectively, then $g \circ f$
also has an inverse, and verifying that $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$ is straightforward.
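
For maps between finite sets, the notions of this section can be checked by direct enumeration. In the sketch below a map is represented by a Python dictionary whose keys form the domain; this representation, the helper names, and the sample maps `f`, `g`, and `s` are all assumptions made for illustration only.

```python
def is_injective(f):
    """A finite map (dict) is injective if no two keys share an image."""
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    """A finite map is surjective if its range equals the codomain."""
    return set(f.values()) == set(codomain)

def preimage(f, B):
    """All elements of the domain whose image lies in B."""
    return {x for x, y in f.items() if y in B}

def compose(g, f):
    """Return g o f, i.e. x -> g(f(x))."""
    return {x: g[f[x]] for x in f}

def inverse(f):
    """Inverse of a bijection given as a dict."""
    return {y: x for x, y in f.items()}

X, Y, W = {1, 2, 3}, {"a", "b", "c"}, {10, 20, 30}
f = {1: "a", 2: "b", 3: "c"}           # bijection X -> Y
g = {"a": 20, "b": 30, "c": 10}        # bijection Y -> W
s = {1: "a", 2: "a", 3: "b"}           # neither injective nor onto Y

lhs = inverse(compose(g, f))           # (g o f)^{-1}
rhs = compose(inverse(f), inverse(g))  # f^{-1} o g^{-1}
```

Comparing `lhs` with `rhs` checks the rule for the inverse of a composition in this one example; the order of the factors on the right is reversed, as in the text.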


3 Note the importance of the order in which the composition is written. The reverse order may not even exist. 






0.2.2. Example. As an example of the preimage of a set, consider the sine and cosine
functions. Then it should be clear that

$\sin^{-1} 0 = \{n\pi\}_{n=-\infty}^{\infty}, \qquad \cos^{-1} 0 = \{\tfrac{\pi}{2} + n\pi\}_{n=-\infty}^{\infty}$.

Similarly, $\sin^{-1}[0, \tfrac{1}{2}]$ consists of all the intervals on the $x$-axis marked by heavy line
segments in Figure 3, i.e., all the points whose sine lies between 0 and $\tfrac{1}{2}$. ■




As examples of maps, we consider functions $f : \mathbb{R} \to \mathbb{R}$ studied in calculus.
The two functions $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to (-1, +1)$ given, respectively, by
$f(x) = x^3$ and $g(x) = \tanh x$ are bijective. The latter function, by the way, shows
that there are as many points in the whole real line as there are in the interval
$(-1, +1)$. If we denote the set of nonnegative real numbers by $\mathbb{R}^+$ (zero must be
included because $0^2 = 0$), then the function
$f : \mathbb{R} \to \mathbb{R}^+$ given by $f(x) = x^2$ is surjective but not injective (both $x$ and
$-x$ map to $x^2$). The function $g : \mathbb{R}^+ \to \mathbb{R}$ given by the same rule, $g(x) = x^2$,
is injective but not surjective. On the other hand, $h : \mathbb{R}^+ \to \mathbb{R}^+$ again given by
$h(x) = x^2$ is bijective, but $m : \mathbb{R} \to \mathbb{R}$ given by the same rule is neither injective
nor surjective.

Let $\mathcal{M}^{n \times n}$ denote the set of $n \times n$ real matrices. Define a function $\det : \mathcal{M}^{n \times n} \to \mathbb{R}$
by $\det(\mathsf{A}) = \det \mathsf{A}$, where $\det \mathsf{A}$ is the determinant of $\mathsf{A}$ for $\mathsf{A} \in \mathcal{M}^{n \times n}$. This function
is clearly surjective (why?) but not injective. The set of all matrices whose
determinant is 1 is $\det^{-1}(1)$. Such matrices occur frequently in physical applications.

Another example of interest is $f : \mathbb{C} \to \mathbb{R}$ given by $f(z) = |z|$. This function
is also neither injective nor surjective. Here $f^{-1}(1)$ is the unit circle, the circle
of radius 1 in the complex plane.

The domain of a map can be a Cartesian product of a set, as in $f : X \times X \to Y$.
Two specific cases are worthy of mention. The first is when $Y = \mathbb{R}$. An example
of this case is the dot product on vectors. Thus, if $X$ is the set of vectors in space,
we can define $f(\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot \mathbf{b}$. The second case is when $Y = X$. Then $f$ is
called a binary operation on $X$, whereby an element in $X$ is associated with two
elements in $X$. For instance, let $X = \mathbb{Z}$, the set of all integers; then the function
$f : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$ defined by $f(m, n) = mn$ is the binary operation of multiplication
of integers. Similarly, $g : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ given by $g(x, y) = x + y$ is the binary
operation of addition of real numbers.


0.3 Metric Spaces 

Although sets are at the root of modern mathematics, they are only of formal and
abstract interest by themselves. To make sets useful, it is necessary to introduce
some structures on them. There are two general procedures for the implementation
of such structures. These are the abstractions of the two major branches of
mathematics: algebra and analysis.





Figure 3 The union of all the intervals on the $x$-axis marked by heavy line segments is
$\sin^{-1}[0, \tfrac{1}{2}]$.

We can turn a set into an algebraic structure by introducing a binary operation
on it. For example, a vector space consists, among other things, of the binary
operation of vector addition. A group is, among other things, a set together with the
binary operation of “multiplication.” There are many other examples of algebraic
systems, and they constitute the rich subject of algebra.

When analysis, the other branch of mathematics, is abstracted using the concept 
of sets, it leads to topology, in which the concept of continuity plays a central role. 
This is also a rich subject with far-reaching implications and applications. We shall 
not go into any details of these two areas of mathematics. Although some algebraic 
systems will be discussed and the ideas of limit and continuity will be used in the 
sequel, this will be done in an intuitive fashion, by introducing and employing the 
concepts when they are needed. On the other hand, some general concepts will 
be introduced when they require minimum prerequisites. One of these is a metric 
space: 

0.3.1. Definition. A metric space is a set $X$ together with a real-valued function
$d : X \times X \to \mathbb{R}$ such that

(a) $d(x, y) \ge 0 \ \ \forall\, x, y$, and $d(x, y) = 0$ iff $x = y$.

(b) $d(x, y) = d(y, x)$. (symmetry)

(c) $d(x, y) \le d(x, z) + d(z, y)$. (the triangle inequality)

It is worthwhile to point out that $X$ is a completely arbitrary set and needs
no other structure. In this respect Definition 0.3.1 is very broad and encompasses
many different situations, as the following examples will show. Before examining
the examples, note that the function $d$ defined above is the abstraction of the notion
of distance: (a) says that the distance between any two points is always nonnegative
and is zero only if the two points coincide; (b) says that the distance between two
points does not change if the two points are interchanged; (c) states the known fact
that the sum of the lengths of two sides of a triangle is always greater than or equal
to the length of the third side. Now consider these examples:

1. Let $X = \mathbb{Q}$, the set of rational numbers, and define $d(x, y) = |x - y|$.

2. Let $X = \mathbb{R}$, and again define $d(x, y) = |x - y|$.

3. Let $X$ consist of the points on the surface of a sphere. We can define two
distance functions on $X$. Let $d_1(P, Q)$ be the length of the chord joining $P$
and $Q$ on the sphere. We can also define another metric, $d_2(P, Q)$, as the
length of the (shorter) arc of the great circle passing through points $P$ and $Q$ on the
surface of the sphere. It is not hard to convince oneself that $d_1$ and $d_2$ satisfy
all the properties of a metric function.

4. Let $\mathcal{C}^0[a, b]$ denote the set of continuous real-valued functions on the closed
interval $[a, b]$. We can define $d(f, g) = \int_a^b |f(x) - g(x)|\, dx$ for $f, g \in \mathcal{C}^0[a, b]$.

5. Let $\mathcal{C}_B(a, b)$ denote the set of bounded continuous real-valued functions on
the closed interval $[a, b]$. We then define

$d(f, g) = \max_{x \in [a, b]} \{|f(x) - g(x)|\}$

for $f, g \in \mathcal{C}_B(a, b)$. This notation says: Take the absolute value of the
difference in $f$ and $g$ at all $x$ in the interval $[a, b]$ and then pick the maximum
of all these values.
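
The claim in the third example can be spot-checked numerically on the unit sphere. The following Python sketch is an illustration under our own assumptions (unit radius, a dozen pseudo-random sample points, a small tolerance for rounding); the function names are hypothetical, and a finite check of this kind is of course not a proof.

```python
import math
import random

def chord(p, q):
    """d1: straight-line (chord) distance between two points of the unit sphere."""
    return math.dist(p, q)

def arc(p, q):
    """d2: length of the shorter great-circle arc on the unit sphere."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))   # clamp against rounding

def random_unit_vector(rng):
    """A random point on the unit sphere (normalised Gaussian sample)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def satisfies_triangle_inequality(d, points, tol=1e-9):
    """Check d(x, y) <= d(x, z) + d(z, y) on all triples of sample points."""
    return all(d(x, y) <= d(x, z) + d(z, y) + tol
               for x in points for y in points for z in points)

rng = random.Random(0)
points = [random_unit_vector(rng) for _ in range(12)]
chord_ok = satisfies_triangle_inequality(chord, points)
arc_ok = satisfies_triangle_inequality(arc, points)
```

The two functions generally give different distances between the same pair of points, which illustrates that one set can carry more than one metric.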

The metric function creates a natural setting in which to test the “closeness”
of points in a metric space. One occasion on which the idea of closeness becomes
essential is in the study of a sequence. A sequence is a mapping $s : \mathbb{N} \to X$ from
the set of natural numbers $\mathbb{N}$ into the metric space $X$. Such a mapping associates
with a positive integer $n$ a point $s(n)$ of the metric space $X$. It is customary to write
$s_n$ (or $x_n$ to match the symbol $X$) instead of $s(n)$ and to enumerate the values of
the function by writing $\{x_n\}_{n=1}^{\infty}$.

Knowledge of the behavior of a sequence for large values of $n$ is of fundamental
importance. In particular, it is important to know whether a sequence approaches
a finite value as $n$ increases.


0.3.2. Box. Suppose that for some $x$ and for any positive real number $\epsilon$, there
exists a natural number $N$ such that $d(x_n, x) < \epsilon$ whenever $n > N$. Then we
say that the sequence $\{x_n\}_{n=1}^{\infty}$ converges to $x$ and write $\lim_{n \to \infty} d(x_n, x) = 0$
or $d(x_n, x) \to 0$ or simply $x_n \to x$.


Figure 4 The distance between the elements of a Cauchy sequence gets smaller and
smaller.

It may not be possible to test directly for the convergence of a given sequence
because this requires a knowledge of the limit point $x$. However, it is possible to
do the next best thing: to see whether the points of the sequence get closer and
closer as $n$ gets larger and larger. A Cauchy sequence is a sequence for which
$\lim_{m, n \to \infty} d(x_m, x_n) = 0$, as shown in Figure 4. We can test directly whether
or not a sequence is Cauchy. However, the fact that a sequence is Cauchy does
not guarantee that it converges. For example, let the metric space be the set of
rational numbers $\mathbb{Q}$ with the metric function $d(x, y) = |x - y|$, and consider the
sequence $\{x_n\}_{n=1}^{\infty}$ where $x_n = \sum_{k=1}^{n} (-1)^{k+1}/k$. It is clear that $x_n$ is a rational
number for any $n$. Also, to show that $|x_m - x_n| \to 0$ is an exercise in calculus.
Thus, the sequence is Cauchy. However, it is probably known to the reader that
$\lim_{n \to \infty} x_n = \ln 2$, which is not a rational number.
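
The behavior of this sequence can be observed with exact rational arithmetic. The sketch below uses Python's standard `fractions.Fraction`; the function name `partial_sum` and the indices 100 and 200 are our own illustrative choices. It shows two partial sums that are exactly rational and close to each other, while the floating-point comparison with $\ln 2$ is only approximate.

```python
from fractions import Fraction
import math

def partial_sum(n):
    """x_n = sum_{k=1}^{n} (-1)^(k+1) / k, computed exactly as a rational."""
    return sum(Fraction((-1) ** (k + 1), k) for k in range(1, n + 1))

x_100 = partial_sum(100)
x_200 = partial_sum(200)

gap = abs(x_200 - x_100)                  # distance d(x_m, x_n) inside Q
error = abs(float(x_200) - math.log(2))   # distance to the irrational limit
```

Every term of the sequence stays inside $\mathbb{Q}$, yet the value the terms approach does not, which is the sense in which $\mathbb{Q}$ has a “hole” at $\ln 2$.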

A metric space in which every Cauchy sequence converges is called a complete
metric space. Complete metric spaces play a crucial role in modern analysis. The
preceding example shows that $\mathbb{Q}$ is not a complete metric space. However, if the
limit points of all Cauchy sequences are added to $\mathbb{Q}$, the resulting space becomes
complete. This complete space is, of course, the real number system $\mathbb{R}$. It turns out
that any incomplete metric space can be “enlarged” to a complete metric space.

0.4 Cardinality 

The process of counting is a one-to-one comparison of one set with another. If two
sets are in one-to-one correspondence, they are said to have the same cardinality.
Two sets with the same cardinality essentially have the same “number” of elements.
The set $F_n = \{1, 2, \dots, n\}$ is finite and has cardinality $n$. Any set from which there
is a bijection onto $F_n$ is said to be finite with $n$ elements.



Although some steps had been taken before him in the direction of a definitive theory of
sets, the creator of the theory of sets is considered to be Georg Cantor (1845-1918), who
was born in Russia of Danish-Jewish parentage but moved to Germany with his parents.
His father urged him to study engineering, and Cantor entered the University of Berlin in
1863 with that intention. There he came under the influence of Weierstrass and turned to
pure mathematics. He became Privatdozent at Halle in 1869 and professor in 1879. When
he was twenty-nine he published his first revolutionary paper on the theory of infinite sets in
the Journal für Mathematik. Although some of its propositions were deemed faulty by the
older mathematicians, its overall originality and brilliance attracted attention. He continued
to publish papers on the theory of sets and on transfinite numbers until 1897.

One of Cantor’s main concerns was to differentiate among infinite sets by “size” and,
like Bolzano before him, he decided that one-to-one correspondence should be the basic
principle. In his correspondence with Dedekind in 1873, Cantor posed the question of
whether the set of real numbers can be put into one-to-one correspondence with the
integers, and some weeks later he answered in the negative. He gave two proofs. The first
is more complicated than the second, which is the one most often used today. In 1874
Cantor occupied himself with the equivalence of the points of a line and the points of
$\mathbb{R}^n$ and sought to prove that a one-to-one correspondence between these two sets was
impossible. Three years later he proved that there is such a correspondence. He wrote to
Dedekind, “I see it but I do not believe it.” He later showed that given any set, it is always
possible to create a new set, the set of subsets of the given set, whose cardinal number is
larger than that of the given set. If $\aleph_0$ is the given set, then the cardinal number of the set
of subsets is denoted by $2^{\aleph_0}$. Cantor proved that $2^{\aleph_0} = c$, where $c$ is the cardinal number
of the continuum; i.e., the set of real numbers.

Cantor’s work, which resolved age-old problems and reversed much previous thought,
could hardly be expected to receive immediate acceptance. His ideas on transfinite ordinal
and cardinal numbers aroused the hostility of the powerful Leopold Kronecker, who
attacked Cantor’s theory savagely over more than a decade, repeatedly preventing Cantor
from obtaining a more prominent appointment in Berlin. Though Kronecker died in 1891,
his attacks left mathematicians suspicious of Cantor’s work. Poincaré referred to set theory
as an interesting “pathological case.” He also predicted that “Later generations will regard
[Cantor’s] Mengenlehre as a disease from which one has recovered.” At one time Cantor
suffered a nervous breakdown, but resumed work in 1887.

Many prominent mathematicians, however, were impressed by the uses to which the new
theory had already been put in analysis, measure theory, and topology. Hilbert spread Cantor’s
ideas in Germany, and in 1926 said, “No one shall expel us from the paradise which Cantor
created for us.” He praised Cantor’s transfinite arithmetic as “the most astonishing product
of mathematical thought, one of the most beautiful realizations of human activity in the
domain of the purely intelligible.” Bertrand Russell described Cantor’s work as “probably the
greatest of which the age can boast.” The subsequent utility of Cantor’s work in formalizing
mathematics (a movement largely led by Hilbert) seems at odds with Cantor’s Platonic
view that the greater importance of his work was in its implications for metaphysics and
theology. That his work could be so seamlessly diverted from the goals intended by its
creator is strong testimony to its objectivity and craftsmanship.







Now consider the set of natural numbers $\mathbb{N} = \{1, 2, 3, \dots\}$. If there exists a
bijection between a set $A$ and $\mathbb{N}$, then $A$ is said to be countably infinite. Some
examples of countably infinite sets are the set of all integers, the set of even natural
numbers, the set of odd natural numbers, the set of all prime numbers, and the set
of energy levels of the bound states of a hydrogen atom.

It may seem surprising that a subset (such as the set of all even numbers)
can be put into one-to-one correspondence with the full set (the set of all natural
numbers); however, this is a property shared by all infinite sets. In fact, sometimes
infinite sets are defined as those sets that are in one-to-one correspondence with at
least one of their proper subsets. It is also astonishing to discover that there are as
many rational numbers as there are natural numbers. After all, there are infinitely
many rational numbers just in the interval $(0, 1)$, or between any two distinct real
numbers.
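
One explicit bijection between the natural numbers and the integers can be written down and tested on a finite range. The particular zig-zag rule below (even naturals to positive integers, odd naturals to zero and the negative integers) is our own choice for illustration; the text only asserts that some bijection exists. The function names are likewise hypothetical.

```python
def natural_to_integer(n):
    """Map n = 1, 2, 3, 4, 5, ... to 0, 1, -1, 2, -2, ..."""
    return n // 2 if n % 2 == 0 else -(n // 2)

def integer_to_natural(z):
    """Inverse map: positive z -> even naturals, the rest -> odd naturals."""
    return 2 * z if z > 0 else 1 - 2 * z

first_values = [natural_to_integer(n) for n in range(1, 8)]
```

Since each function undoes the other, every integer is reached exactly once, which is what it means for the integers to be countably infinite.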

Sets that are neither finite nor countably infinite are said to be uncountable. In
some sense they are “more infinite” than any countable set. Examples of uncountable
sets are the points in the interval $(-1, +1)$, the real numbers, the points in a
plane, and the points in space. It can be shown that these sets have the same cardinality:
There are as many points in three-dimensional space (the whole universe) as
there are in the interval $(-1, +1)$ or in any other finite interval.

Cardinality is a very intricate mathematical notion with many surprising results.
Consider the interval $[0, 1]$. Remove the open interval $(\tfrac{1}{3}, \tfrac{2}{3})$ from its middle. This
means that the points $\tfrac{1}{3}$ and $\tfrac{2}{3}$ will not be removed. From the remaining portion,
$[0, \tfrac{1}{3}] \cup [\tfrac{2}{3}, 1]$, remove the two middle thirds; the remaining portion will then be

$[0, \tfrac{1}{9}] \cup [\tfrac{2}{9}, \tfrac{1}{3}] \cup [\tfrac{2}{3}, \tfrac{7}{9}] \cup [\tfrac{8}{9}, 1]$

(see Figure 5). Do this indefinitely. What is the cardinality of the remaining set,
which is called the Cantor set? Intuitively we expect hardly anything to be left.
We might persuade ourselves into accepting the fact that the number of points
remaining is at most infinite but countable. The surprising fact is that the cardinality
is that of the continuum! Thus, after removal of infinitely many middle thirds, the
set that remains has as many points as the original set!
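
The first few dissections can be generated mechanically. The sketch below, with hypothetical function names of our own and exact fractions for the endpoints, lists the closed intervals that remain after $n$ steps and adds up their lengths. It illustrates only the construction and the shrinking total length; it says nothing by itself about the cardinality of the limiting set.

```python
from fractions import Fraction

def remove_middle_thirds(intervals):
    """Replace each closed interval [a, b] by its two outer closed thirds."""
    result = []
    for a, b in intervals:
        third = (b - a) / 3
        result.append((a, a + third))
        result.append((b - third, b))
    return result

def cantor_stage(n):
    """Closed intervals that remain after n dissections of [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = remove_middle_thirds(intervals)
    return intervals

stage_2 = cantor_stage(2)
total_length = sum(b - a for a, b in cantor_stage(5))   # equals (2/3)**5
```

After $n$ dissections there are $2^n$ intervals of total length $(2/3)^n$, so the total length tends to zero even though, as stated above, the set that survives is as numerous as the continuum.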


0.5 Mathematical Induction 

Many a time it is desirable to make a mathematical statement that is true for all
natural numbers. For example, we may want to establish a formula involving an
integer parameter that will hold for all positive integers. One encounters this situation
when, after experimenting with the first few positive integers, one recognizes
a pattern and discovers a formula, and wants to make sure that the formula holds
for all natural numbers. For this purpose, one uses mathematical induction. The
essence of mathematical induction is stated as follows:


Figure 5 The Cantor set after one, two, three, and four “dissections.”


0.5.1. Box. Suppose that there is associated with each natural number (positive
integer) $n$ a statement $S_n$. Then $S_n$ is true for every positive integer
provided the following two conditions hold:

1. $S_1$ is true.

2. If $S_m$ is true for some given positive integer $m$,
then $S_{m+1}$ is also true.


We illustrate the use of mathematical induction by proving the binomial theorem:

$(a + b)^m = \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^k = \sum_{k=0}^{m} \frac{m!}{k!\,(m-k)!}\, a^{m-k} b^k$

$\qquad = a^m + m a^{m-1} b + \frac{m(m-1)}{2!}\, a^{m-2} b^2 + \cdots + m a b^{m-1} + b^m$, (1)

where we have used the notation

$\binom{m}{k} \equiv \frac{m!}{k!\,(m-k)!}$. (2)

The mathematical statement $S_m$ is Equation (1). We note that $S_1$ is trivially true.
Now we assume that $S_m$ is true and show that $S_{m+1}$ is also true. This means starting
with Equation (1) and showing that

$(a + b)^{m+1} = \sum_{k=0}^{m+1} \binom{m+1}{k} a^{m+1-k} b^k$.

Then the induction principle ensures that the statement (equation) holds for all
positive integers.

Multiply both sides of Equation (1) by $a + b$ to obtain

$(a + b)^{m+1} = \sum_{k=0}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^{k+1}$.

Now separate the $k = 0$ term from the first sum and the $k = m$ term from the
second sum:

$(a + b)^{m+1} = a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{k=0}^{m-1} \binom{m}{k} a^{m-k} b^{k+1} + b^{m+1}$ (let $k = j - 1$ in the second sum)

$\qquad = a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{j=1}^{m} \binom{m}{j-1} a^{m-j+1} b^j + b^{m+1}$.

The second sum in the last line involves $j$. Since this is a dummy index, we can
substitute any symbol we please. The choice $k$ is especially useful because then
we can unite the two summations. This gives

$(a + b)^{m+1} = a^{m+1} + \sum_{k=1}^{m} \left\{ \binom{m}{k} + \binom{m}{k-1} \right\} a^{m-k+1} b^k + b^{m+1}$.

If we now use

$\binom{m+1}{k} = \binom{m}{k} + \binom{m}{k-1}$,

which the reader can easily verify, we finally obtain

$(a + b)^{m+1} = a^{m+1} + \sum_{k=1}^{m} \binom{m+1}{k} a^{m-k+1} b^k + b^{m+1} = \sum_{k=0}^{m+1} \binom{m+1}{k} a^{m-k+1} b^k$.

Mathematical induction is also used in defining quantities involving integers.
Such definitions are called inductive definitions. For example, inductive definition
is used in defining powers: $a^1 = a$ and $a^m = a^{m-1} a$.
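
The two ingredients of the proof, the addition rule for binomial coefficients and the expansion itself, can be checked for small integers, and the inductive definition of powers translates directly into a recursive function. The Python sketch below is only a finite check under our own choice of ranges; `power` and `binomial_rhs` are hypothetical helper names, and `math.comb` is the standard-library binomial coefficient.

```python
from math import comb

def power(a, m):
    """Inductive definition: a^1 = a and a^m = a^(m-1) * a for m > 1."""
    return a if m == 1 else power(a, m - 1) * a

def binomial_rhs(a, b, m):
    """Right-hand side of the binomial theorem, sum_k C(m, k) a^(m-k) b^k."""
    return sum(comb(m, k) * a ** (m - k) * b ** k for k in range(m + 1))

# The addition rule used in the inductive step, checked for small m and k.
pascal_holds = all(comb(m + 1, k) == comb(m, k) + comb(m, k - 1)
                   for m in range(1, 15) for k in range(1, m + 1))

# The theorem itself, checked on a few integer inputs.
theorem_holds = all((a + b) ** m == binomial_rhs(a, b, m)
                    for a in range(-3, 4) for b in range(-3, 4)
                    for m in range(1, 8))
```

A check of finitely many cases can expose a mistake in a conjectured formula, but only the induction argument establishes it for all positive integers.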



0.6 Problems

0.1. Show that the number of subsets of a set containing $n$ elements is $2^n$.

0.2. Let $A$, $B$, and $C$ be sets in a universal set $U$. Show that

(a) $A \subset B$ and $B \subset C$ implies $A \subset C$.

(b) $A \subset B$ iff $A \cap B = A$ iff $A \cup B = B$.

(c) $A \subset B$ and $B \subset C$ implies $(A \cup B) \subset C$.

(d) $A \cup B = (A \sim B) \cup (A \cap B) \cup (B \sim A)$.

Hint: To show the equality of two sets, show that each set is a subset of the other.

0.3. For each $n \in \mathbb{N}$, let

$I_n = \{x \mid |x - 1| < n \text{ and } |x + 1| > \tfrac{1}{n}\}$.

Find $\bigcup_n I_n$ and $\bigcap_n I_n$.

0.4. Show that $a' \in [\![a]\!]$ implies that $[\![a']\!] = [\![a]\!]$.

0.5. Can you define a binary operation of “multiplication” on the set of vectors in
space? What about vectors in the plane?

0.6. Show that $(f \circ g)^{-1} = g^{-1} \circ f^{-1}$ when $f$ and $g$ are both bijections.

0.7. Take any two open intervals $(a, b)$ and $(c, d)$, and show that there are as many
points in the first as there are in the second, regardless of the size of the intervals.
Hint: Find a (linear) algebraic relation between points of the two intervals.

0.8. Use mathematical induction to derive the Leibniz rule for differentiating a
product:

$\frac{d^n}{dx^n}(f \cdot g) = \sum_{k=0}^{n} \binom{n}{k} \frac{d^k f}{dx^k} \frac{d^{n-k} g}{dx^{n-k}}$.

0.9. Use mathematical induction to derive the following results:

$\sum_{k=0}^{n} r^k = \frac{r^{n+1} - 1}{r - 1}, \qquad \sum_{k=0}^{n} k = \frac{n(n+1)}{2}$.


Additional Reading 

1. Halmos, P. Naive Set Theory, Springer-Verlag, 1974. A classic text on intuitive
(as opposed to axiomatic) set theory covering all the topics discussed
in this chapter and much more.

2. Kelley, J. General Topology, Springer-Verlag, 1985. The introductory chapter
of this classic reference is a detailed introduction to set theory and mappings.

3. Simmons, G. Introduction to Topology and Modern Analysis, Krieger, 1983.
The first chapter of this book covers not only set theory and mappings, but
also the Cantor set and the fact that integers are as abundant as rational
numbers.



Finite-Dimensional Vector Spaces 




Vectors and Transformations 


Two- and three-dimensional vectors, undoubtedly familiar objects to the reader,
can easily be generalized to higher dimensions. Representing vectors by their
components, one can conceive of vectors having $N$ components. This is the most
immediate generalization of vectors in the plane and in space, and such vectors
are called $N$-dimensional Cartesian vectors. Cartesian vectors are limited in two
respects: Their components are real, and their dimensionality is finite. Some applications
in physics require the removal of one or both of these limitations. It is
therefore convenient to study vectors stripped of any dimensionality or reality of
components. Such properties become consequences of more fundamental definitions.
Although we will be concentrating on finite-dimensional vector spaces in
this part of the book, many of the concepts and examples introduced here apply to
infinite-dimensional spaces as well.

1.1 Vector Spaces 

Let us begin with the definition of an abstract (complex) vector space.$^1$

1.1.1. Definition. A vector space $\mathcal{V}$ over $\mathbb{C}$ is a set of objects denoted by $|a\rangle$, $|b\rangle$,
$|x\rangle$, and so on, called vectors, with the following properties:$^2$

1. To every pair of vectors $|a\rangle$ and $|b\rangle$ in $\mathcal{V}$ there corresponds a vector $|a\rangle + |b\rangle$,
also in $\mathcal{V}$, called the sum of $|a\rangle$ and $|b\rangle$, such that

(a) $|a\rangle + |b\rangle = |b\rangle + |a\rangle$,


1 Keep in mind that C is the set of complex numbers and R the set of reals. 

2 The bra, $\langle\,|$, and ket, $|\,\rangle$, notation for vectors, invented by Dirac, is very useful when dealing with complex vector spaces.
However, it is somewhat clumsy for certain topics such as norm and metrics and will therefore be abandoned in those discussions.





(b) $|a\rangle + (|b\rangle + |c\rangle) = (|a\rangle + |b\rangle) + |c\rangle$,

(c) There exists a unique vector $|0\rangle \in V$, called the zero vector, such that
$|a\rangle + |0\rangle = |a\rangle$ for every vector $|a\rangle$,

(d) To every vector $|a\rangle \in V$ there corresponds a unique vector $-|a\rangle$ (also
in $V$) such that $|a\rangle + (-|a\rangle) = |0\rangle$.


2. To every complex number$^3$ $\alpha$ (also called a scalar) and every vector $|a\rangle$
there corresponds a vector $\alpha|a\rangle$ in $V$ such that

(a) $\alpha(\beta|a\rangle) = (\alpha\beta)|a\rangle$,

(b) $1|a\rangle = |a\rangle$.

3. Multiplication involving vectors and scalars is distributive:

(a) $\alpha(|a\rangle + |b\rangle) = \alpha|a\rangle + \alpha|b\rangle$.

(b) $(\alpha + \beta)|a\rangle = \alpha|a\rangle + \beta|a\rangle$.

The vector space defined above is also called a complex vector space. It is
possible to replace $\mathbb{C}$ with $\mathbb{R}$, the set of real numbers, in which case the resulting
space will be called a real vector space. Real and complex numbers are prototypes
of a mathematical structure called a field. A field is a set of objects with two binary
operations called addition and multiplication. Multiplication distributes over
addition, and each operation has an identity. The identity for addition is
denoted by 0 and is called the additive identity. The identity for multiplication is
denoted by 1 and is called the multiplicative identity. Furthermore, every element has an
additive inverse, and every element except the additive identity has a multiplicative
inverse.


1.1.2. Example. Some vector spaces

1. $\mathbb{R}$ is a vector space over the field of real numbers.

2. $\mathbb{C}$ is a vector space over the field of real numbers.

3. $\mathbb{C}$ is a vector space over the complex numbers.

4. Let $V = \mathbb{R}$ and let the field of scalars be $\mathbb{C}$. This is not a vector space, because property
2 of Definition 1.1.1 is not satisfied: A complex number times a real number is not
a real number and therefore does not belong to $V$.

5. The set of “arrows” in the plane (or in space) forms a vector space over $\mathbb{R}$ under the
parallelogram law of addition of planar (or spatial) vectors.


3 Complex numbers, particularly when they are treated as variables, are usually denoted by $z$, and we shall adhere to this
convention in Part III. However, in the discussion of vector spaces, we have found it more convenient to use lower case Greek
letters to denote complex numbers as scalars.




6. Let $\mathcal{P}^c[t]$ be the set of all polynomials with complex coefficients in a variable $t$.
Then $\mathcal{P}^c[t]$ is a vector space under the ordinary addition of polynomials and the
multiplication of a polynomial by a complex number. In this case the zero vector is
the zero polynomial.

7. For a given positive integer $n$, let $\mathcal{P}^c_n[t]$ be the set of all polynomials with complex
coefficients of degree less than or equal to $n$. Again it is easy to verify that $\mathcal{P}^c_n[t]$
is a vector space under the usual addition of polynomials and their multiplication
by complex scalars. In particular, the sum of two polynomials of degree less than
or equal to $n$ is also a polynomial of degree less than or equal to $n$, and multiplying
a polynomial with complex coefficients by a complex number gives another
polynomial of the same type. Here the zero polynomial is the zero vector.

8. The set $\mathcal{P}^r_n[t]$ of polynomials of degree less than or equal to $n$ with real coefficients
is a vector space over the reals, but it is not a vector space over the complex numbers.

9. Let $\mathbb{C}^n$ consist of all complex $n$-tuples such as $|a\rangle = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ and $|b\rangle =
(\beta_1, \beta_2, \ldots, \beta_n)$. Let $\alpha$ be a complex number. Then we define

$$
\begin{aligned}
|a\rangle + |b\rangle &= (\alpha_1 + \beta_1, \alpha_2 + \beta_2, \ldots, \alpha_n + \beta_n),\\
\alpha|a\rangle &= (\alpha\alpha_1, \alpha\alpha_2, \ldots, \alpha\alpha_n),\\
|0\rangle &= (0, 0, \ldots, 0),\\
-|a\rangle &= (-\alpha_1, -\alpha_2, \ldots, -\alpha_n).
\end{aligned}
$$

It is easy to verify that $\mathbb{C}^n$ is a vector space over the complex numbers. It is called
the $n$-dimensional complex coordinate space.

10. The set of all real $n$-tuples $\mathbb{R}^n$ is a vector space over the real numbers under
operations similar to those of $\mathbb{C}^n$. It is called the $n$-dimensional real coordinate space,
or Cartesian $n$-space. It is not a vector space over the complex numbers.

11. The set of all complex matrices with $m$ rows and $n$ columns $\mathcal{M}^{m\times n}$ is a vector space
under the usual addition of matrices and multiplication by complex numbers. The
zero vector is the $m \times n$ matrix with all entries equal to zero.

12. Let $\mathbb{C}^\infty$ be the set of all complex sequences $|a\rangle = \{\alpha_i\}_{i=1}^{\infty}$ such that $\sum_{i=1}^{\infty} |\alpha_i|^2 < \infty$.
One can show that with addition and scalar multiplication defined componentwise,
$\mathbb{C}^\infty$ is a vector space over the complex numbers.

13. The set of all complex-valued functions of a single real variable that are continuous
in the real interval $(a, b)$ is a vector space over the complex numbers.

14. The set $\mathcal{C}^n(a, b)$ of all real-valued functions of a single real variable on $(a, b)$ that
possess continuous derivatives of all orders up to $n$ forms a vector space over the
reals.

15. The set $\mathcal{C}^\infty(a, b)$ of all real-valued functions on $(a, b)$ of a single real variable that
possess derivatives of all orders forms a vector space over the reals. ■


It is clear from the example above that the existence of a vector space depends 
as much on the nature of the vectors as on the nature of the scalars. 
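
As a hedged illustration of item 9 and of the remark above, the following minimal Python sketch represents vectors of $\mathbb{C}^n$ as tuples of complex numbers and spot-checks a few axioms of Definition 1.1.1 on sample vectors. The helper names (`add`, `scale`, `zero`, `neg`) and the sample values are assumptions made only for this example; passing these checks on a few vectors illustrates the axioms but does not prove them.

```python
# Illustrative sketch: vectors of C^n as tuples of Python complex numbers.
# The helper names below are hypothetical, chosen only for this example.

def add(a, b):
    """Componentwise sum |a> + |b>."""
    return tuple(x + y for x, y in zip(a, b))

def scale(alpha, a):
    """Scalar multiple alpha |a>."""
    return tuple(alpha * x for x in a)

def zero(n):
    """Zero vector |0> of C^n."""
    return tuple(0j for _ in range(n))

def neg(a):
    """Additive inverse -|a>."""
    return scale(-1, a)

a = (1 + 2j, 3j, -1 + 0j)
b = (2 - 1j, 1 + 1j, 4 + 0j)
alpha, beta = 2 - 1j, 1j

# Spot checks of some axioms of Definition 1.1.1 on the sample vectors.
commutative = add(a, b) == add(b, a)
has_inverse = add(a, neg(a)) == zero(3)
distributive = scale(alpha + beta, a) == add(scale(alpha, a), scale(beta, a))

# A real 3-tuple multiplied by a complex scalar leaves R^3, which is why
# R^n is not a vector space over the complex numbers (items 4 and 10).
leaves_reals = any(x.imag != 0 for x in scale(1j, (1.0, 2.0, 3.0)))
```

The sample entries are small integers, so the floating-point comparisons here are exact; for general data a tolerance would be needed.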

1.1.3. Definition. The vectors $|a_1\rangle, |a_2\rangle, \ldots, |a_n\rangle$ are said to be linearly
independent if for $\alpha_i \in \mathbb{C}$, the relation $\sum_{i=1}^{n} \alpha_i |a_i\rangle = 0$ implies $\alpha_i = 0$ for all $i$. The
sum $\sum_{i=1}^{n} \alpha_i |a_i\rangle$ is called a linear combination of $\{|a_i\rangle\}_{i=1}^{n}$.
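
For vectors in $\mathbb{C}^n$, linear independence can be tested numerically by computing the rank of the list of vectors. The sketch below is one possible way to do this, using plain Gaussian elimination with a tolerance; the function names `rank` and `linearly_independent` and the tolerance value are assumptions of this example, not part of the text.

```python
def rank(vectors, tol=1e-12):
    """Rank of a list of complex n-tuples, found by Gaussian elimination."""
    rows = [[complex(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(r, len(rows)),
                    key=lambda i: abs(rows[i][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) < tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from the rows below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """True if no nontrivial complex linear combination vanishes."""
    return rank(vectors) == len(vectors)

independent = linearly_independent([(1, 0), (0, 1)])
# (i, -1) equals i times (1, i), so this pair is dependent over C.
dependent = not linearly_independent([(1, 1j), (1j, -1)])
```

The second pair shows why the scalars matter: the two vectors are complex multiples of each other, even though neither is a real multiple of the other.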




1.1.4. Definition. A subspace $W$ of a vector space $V$ is a nonempty subset of $V$
with the property that if $|a\rangle, |b\rangle \in W$, then $\alpha|a\rangle + \beta|b\rangle$ also belongs to $W$ for all
$\alpha, \beta \in \mathbb{C}$.

A subspace is a vector space in its own right. The reader may verify that the 
intersection of two subspaces is also a subspace. 

1.1.5. Theorem. If $S$ is any nonempty set of vectors in a vector space $V$, then the
set $W_S$ of all linear combinations of vectors in $S$ is a subspace of $V$. We say that
$W_S$ is the span of $S$, or that $S$ spans $W_S$, or that $W_S$ is spanned by $S$. $W_S$ is
sometimes denoted by $\mathrm{Span}\{S\}$.


The proof of Theorem 1.1.5 is left as Problem 1.8. 

1.1.6. Definition. A basis of a vector space $V$ is a set $B$ of linearly independent
vectors that spans all of $V$. A vector space that has a finite basis is called
finite-dimensional; otherwise, it is infinite-dimensional.


We state the following without proof (see [Axle 96, page 31]): 

1.1.7. Theorem. All bases of a given finite-dimensional vector space have the
same number of linearly independent vectors. This number is called the dimension
of the vector space. A vector space of dimension $N$ is sometimes denoted by $V_N$.

If $|a\rangle$ is a vector in an $N$-dimensional vector space $V$ and $B = \{|a_i\rangle\}_{i=1}^{N}$ a basis
in that space, then by the definition of a basis, there exists a unique (see Problem
1.4) set of scalars $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$ such that $|a\rangle = \sum_{i=1}^{N} \alpha_i |a_i\rangle$. The set $\{\alpha_i\}_{i=1}^{N}$
is called the components of $|a\rangle$ with respect to the basis $B$.

1.1.8. Example. The following are subspaces of some of the vector spaces considered in
Example 1.1.2.

• The “space” of real numbers is a subspace of $\mathbb{C}$ over the reals.

• $\mathbb{R}$ is not a subspace of $\mathbb{C}$ over the complex numbers, because as explained in Example
1.1.2, $\mathbb{R}$ cannot be a vector space over the complex numbers.

• The set of all vectors along a given line going through the origin is a subspace of
arrows in the plane (or space) over $\mathbb{R}$.

• $\mathcal{P}^c_n[t]$ is a subspace of $\mathcal{P}^c[t]$.

• $\mathbb{C}^{n-1}$ is a subspace of $\mathbb{C}^n$ when $\mathbb{C}^{n-1}$ is identified with all complex $n$-tuples with
zero last entry. In general, $\mathbb{C}^m$ is a subspace of $\mathbb{C}^n$ for $m < n$ when $\mathbb{C}^m$ is identified
with all $n$-tuples whose last $n - m$ elements are zero.

• $\mathcal{M}^{r\times s}$ is a subspace of $\mathcal{M}^{m\times n}$ for $r < m$ or $s < n$. Here, we identify an $r \times s$ matrix
with an $m \times n$ matrix whose last $m - r$ rows and $n - s$ columns are all zero.

• $\mathcal{P}^c_m[t]$ is a subspace of $\mathcal{P}^c_n[t]$ for $m < n$.

• $\mathcal{P}^r_m[t]$ is a subspace of $\mathcal{P}^r_n[t]$ for $m < n$. Note that both $\mathcal{P}^r_m[t]$ and $\mathcal{P}^r_n[t]$ are vector
spaces over the reals only.

• $\mathbb{R}^m$ is a subspace of $\mathbb{R}^n$ for $m < n$. Therefore, $\mathbb{R}^2$, the plane, is a subspace of $\mathbb{R}^3$, the
Euclidean space. Also, $\mathbb{R}^1 \equiv \mathbb{R}$ is a subspace of both the plane $\mathbb{R}^2$ and the Euclidean
space $\mathbb{R}^3$. ■

1.1.9. Example. The following are bases for the vector spaces given in Example 1.1.2.

• The number 1 is a basis for $\mathbb{R}$, which is therefore one-dimensional.

• The numbers 1 and $i = \sqrt{-1}$ are basis vectors for the vector space $\mathbb{C}$ over $\mathbb{R}$. Thus,
this space is two-dimensional.

• The number 1 is a basis for $\mathbb{C}$ over $\mathbb{C}$, and the space is one-dimensional. Note that
although the vectors are the same as in the preceding item, changing the nature of
the scalars changes the dimensionality of the space.

• The set $\{\hat{e}_x, \hat{e}_y, \hat{e}_z\}$ of the unit vectors in the directions of the three axes forms a
basis in space. The space is three-dimensional.

• A basis of $\mathcal{P}^c[t]$ can be formed by the monomials $1, t, t^2, \ldots$. It is clear that this
space is infinite-dimensional.

• A basis of $\mathbb{C}^n$ is given by $\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_n$, where $\hat{e}_j$ is an $n$-tuple that has a 1 at the
$j$th position and zeros everywhere else. This basis is called the standard basis of
$\mathbb{C}^n$. Clearly, the space has $n$ dimensions.

• A basis of $\mathcal{M}^{m\times n}$ is given by $\hat{e}_{11}, \hat{e}_{12}, \ldots, \hat{e}_{ij}, \ldots, \hat{e}_{mn}$, where $\hat{e}_{ij}$ is the $m \times n$
matrix with zeros everywhere except at the intersection of the $i$th row and $j$th column,
where it has a one.

• A set consisting of the monomials $1, t, t^2, \ldots, t^n$ forms a basis of $\mathcal{P}^c_n[t]$. Thus, this
space is $(n + 1)$-dimensional.

• The standard basis of $\mathbb{C}^n$ is a basis of $\mathbb{R}^n$ as well. It is also called the standard basis
of $\mathbb{R}^n$. Thus, $\mathbb{R}^n$ is $n$-dimensional.

• If we assume that $a < 0 < b$, then the set of monomials $1, x, x^2, \ldots$ forms a basis
for $\mathcal{C}^\infty(a, b)$, because, by Taylor’s theorem, any function belonging to $\mathcal{C}^\infty(a, b)$
can be expanded in an infinite power series about $x = 0$. Thus, this space is
infinite-dimensional. ■


Given a space $V$ with a basis $B = \{|a_i\rangle\}_{i=1}^{n}$, the span of any $m$ vectors ($m < n$)
of $B$ is an $m$-dimensional subspace of $V$.


1.2 Inner Product 

A vector space, as given by Definition 1.1.1, is too general and structureless to be
of much physical interest. One useful structure introduced on a vector space is a
scalar product. Recall that the scalar (dot) product of vectors in the plane or in space
is a rule that associates with two vectors $\mathbf{a}$ and $\mathbf{b}$, a real number. This association,
denoted symbolically by $g : V \times V \to \mathbb{R}$, with $g(\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot \mathbf{b}$, is symmetric:
$g(\mathbf{a}, \mathbf{b}) = g(\mathbf{b}, \mathbf{a})$, is linear in the first (and by symmetry, in the second) factor:$^4$

$$g(\alpha\mathbf{a} + \beta\mathbf{b}, \mathbf{c}) = \alpha g(\mathbf{a}, \mathbf{c}) + \beta g(\mathbf{b}, \mathbf{c}) \quad\text{or}\quad (\alpha\mathbf{a} + \beta\mathbf{b}) \cdot \mathbf{c} = \alpha\,\mathbf{a} \cdot \mathbf{c} + \beta\,\mathbf{b} \cdot \mathbf{c},$$

gives the “length” of a vector: $|\mathbf{a}|^2 = g(\mathbf{a}, \mathbf{a}) = \mathbf{a} \cdot \mathbf{a} \ge 0$, and ensures that the
only vector with zero length$^5$ is the zero vector: $g(\mathbf{a}, \mathbf{a}) = 0$ if and only if $\mathbf{a} = 0$.

4 A function that is linear in both of its arguments is called a bilinear function.

We want to generalize these properties to abstract vector spaces for which the
scalars are complex numbers. A verbatim generalization of the foregoing properties,
however, leads to a contradiction. Using the linearity in both arguments and
a nonzero $|a\rangle$, we obtain

$$g(i|a\rangle, i|a\rangle) = i^2 g(|a\rangle, |a\rangle) = -g(|a\rangle, |a\rangle). \tag{1.1}$$

Either the right-hand side (RHS) or left-hand side (LHS) of this equation must be
negative! But this is inconsistent with the positivity of the “length” of a vector,
which requires $g(|a\rangle, |a\rangle)$ to be positive for all nonzero vectors, including $i|a\rangle$.
The source of the problem is the linearity in both arguments. If we can change this
property in such a way that one of the $i$’s in Equation (1.1) comes out
complex-conjugated, the problem may go away. This requires linearity in one argument
and complex-conjugate linearity in the other. Which argument is to be
complex-conjugate linear is a matter of convention. We choose the first argument to be so.$^6$
We thus have

$$g(\alpha|a\rangle + \beta|b\rangle, |c\rangle) = \alpha^* g(|a\rangle, |c\rangle) + \beta^* g(|b\rangle, |c\rangle),$$

where $\alpha^*$ denotes the complex conjugate. Consistency then requires us to change
the symmetry property as well. In fact, we must demand that $g(|a\rangle, |b\rangle) =
(g(|b\rangle, |a\rangle))^*$, from which the reality of $g(|a\rangle, |a\rangle)$, a necessary condition for its
positivity, follows immediately.
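
As a short check, offered here as a worked step rather than part of the original argument, conjugate linearity in the first argument together with linearity in the second removes the sign problem of Equation (1.1):

```latex
\begin{align*}
g(i|a\rangle, i|a\rangle)
  &= i^{*}\, g(|a\rangle, i|a\rangle) \\
  &= i^{*} i\, g(|a\rangle, |a\rangle)
   = (-i)(i)\, g(|a\rangle, |a\rangle)
   = g(|a\rangle, |a\rangle).
\end{align*}
```

Both $|a\rangle$ and $i|a\rangle$ therefore have the same “length,” and no contradiction with positivity arises.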

The question of the existence of an inner product on a vector space is a deep
problem in higher analysis. Generally, if an inner product exists, there may be
many ways to introduce one on a vector space. However, as we shall see in Section
1.2.4, a finite-dimensional vector space always has an inner product and this inner
product is unique.$^7$ So, for all practical purposes we can speak of the inner product
on a finite-dimensional vector space, and as with the two- and three-dimensional
cases, we can omit the letter $g$ and use a notation that involves only the vectors.
There are several such notations in use, but the one that will be employed in this
book is the Dirac bra(c)ket notation, whereby $g(|a\rangle, |b\rangle)$ is denoted by $\langle a|b\rangle$.
Using this notation, we have


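1.2.1. Definition. The inner product of two vectors, $|a\rangle$ and $|b\rangle$, in a vector space
$V$ is a complex number, $\langle a|b\rangle \in \mathbb{C}$, such that

1. $\langle a|b\rangle = \langle b|a\rangle^*$

2. $\langle a|\,(\beta|b\rangle + \gamma|c\rangle) = \beta\langle a|b\rangle + \gamma\langle a|c\rangle$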


5 In our present discussion, we are avoiding situations in which a nonzero vector can have zero “length.” Such occasions arise
in relativity, and we shall discuss them in Part VII.

6 In some books, particularly in the mathematical literature, the second argument is chosen to be linear. 

7 This uniqueness holds up to a certain equivalence of inner products that we shall not get into here. 


3. $\langle a|a\rangle \ge 0$, and $\langle a|a\rangle = 0$ if and only if $|a\rangle = |0\rangle$.

The last relation is called the positive definite property of the inner product.$^8$ A
positive definite inner product is also called a Riemannian inner product, otherwise
it is called pseudo-Riemannian.

Note that linearity in the first argument is absent, because, as explained earlier,
it would be inconsistent with the first property, which expresses the “symmetry”
of the inner product. The extra operation of complex conjugation renders true
linearity in the first argument impossible. Because of this complex conjugation,
the inner product on a complex vector space is not truly bilinear; it is commonly
called sesquilinear.

A shorthand notation will be useful when dealing with the inner product of a 
linear combination of vectors. 

1.2.2. Box. We write the LHS of the second equation in the definition above
as $\langle a|\beta b + \gamma c\rangle$.

This has the advantage of treating a linear combination as a single vector. The
second property then states that if the complex scalars happen to be in a ket, they
“split out” unaffected:

$$\langle a|\beta b + \gamma c\rangle = \beta\langle a|b\rangle + \gamma\langle a|c\rangle. \tag{1.2}$$

On the other hand, if the complex scalars happen to be in the first factor (the bra),
then they should be conjugated when they are “split out”:

$$\langle \beta b + \gamma c|a\rangle = \beta^*\langle b|a\rangle + \gamma^*\langle c|a\rangle. \tag{1.3}$$

A vector space V on which an inner product is defined is called an inner 
product space. As mentioned above, all finite-dimensional vector spaces can be 
turned into inner product spaces. 

1.2.3. Example. In this example we introduce some of the most common inner products.
The reader is urged to verify that in all cases, we indeed have an inner product.

• Let $|a\rangle, |b\rangle \in \mathbb{C}^n$, with $|a\rangle = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ and $|b\rangle = (\beta_1, \beta_2, \ldots, \beta_n)$, and
define an inner product on $\mathbb{C}^n$ as

$$\langle a|b\rangle \equiv \alpha_1^*\beta_1 + \alpha_2^*\beta_2 + \cdots + \alpha_n^*\beta_n = \sum_{i=1}^{n} \alpha_i^*\beta_i.$$

That this product satisfies all the required properties of an inner product is easily
checked. For example, if $|b\rangle = |a\rangle$, we obtain $\langle a|a\rangle = |\alpha_1|^2 + |\alpha_2|^2 + \cdots + |\alpha_n|^2$,
which is clearly nonnegative.

8 The positive definiteness must be relaxed in the space-time of relativity theory, in which nonzero vectors can have zero 
“length.” 




• Similarly, for $|a\rangle, |b\rangle \in \mathbb{R}^n$ the same definition (without the complex conjugation)
satisfies all the properties of an inner product.

• For $|a\rangle, |b\rangle \in \mathbb{C}^\infty$ the natural inner product is defined as $\langle a|b\rangle = \sum_{i=1}^{\infty} \alpha_i^*\beta_i$. The
question of the convergence of this sum is the subject of Problem 1.16.

• Let $x(t), y(t) \in \mathcal{P}^c[t]$, the space of all polynomials in $t$ with complex coefficients.
Define

$$\langle x|y\rangle \equiv \int_a^b w(t)\, x^*(t)\, y(t)\, dt, \tag{1.4}$$

where $a$ and $b$ are real numbers (or infinity) for which the integral exists, and $w(t)$
is a real-valued, continuous function that is always strictly positive in the interval
$(a, b)$. Then Equation (1.4) defines an inner product. Depending on the so-called
weight function $w(t)$, there can be many different inner products defined on the
infinite-dimensional space $\mathcal{P}^c[t]$.

• Let $f, g \in \mathcal{C}(a, b)$ and define their inner product by

$$\langle f|g\rangle \equiv \int_a^b w(x)\, f^*(x)\, g(x)\, dx.$$

It is easily shown that $\langle f|g\rangle$ satisfies all the requirements of the inner product if,
as in the previous case, the weight function $w(x)$ is always positive in the interval
$(a, b)$. This is called the standard inner product on $\mathcal{C}(a, b)$. ■
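
The natural inner product on $\mathbb{C}^n$ from the first item of this example can be written in a few lines of Python. The sketch below is illustrative only: the function name `inner` and the sample vectors are assumptions of this example, and the checks confirm the three properties of Definition 1.2.1 on specific vectors rather than proving them.

```python
def inner(a, b):
    """<a|b> = sum_i conj(a_i) * b_i, conjugate-linear in the first slot."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

a = (1 + 2j, 3j)
b = (2 - 1j, 1 + 1j)
c = (1j, 4 + 0j)
beta, gamma = 2 + 1j, -1j

# Property 1: <a|b> = <b|a>*
conjugate_symmetric = inner(a, b) == inner(b, a).conjugate()

# Property 2: <a|beta b + gamma c> = beta <a|b> + gamma <a|c>
combo = tuple(beta * y + gamma * z for y, z in zip(b, c))
linear_in_ket = inner(a, combo) == beta * inner(a, b) + gamma * inner(a, c)

# Property 3: <a|a> is real and positive for a nonzero vector.
positive = inner(a, a).imag == 0 and inner(a, a).real > 0

# Equation (1.3): scalars in the bra come out complex-conjugated.
combo_bra = tuple(beta * y + gamma * z for y, z in zip(b, c))
conjugate_in_bra = inner(combo_bra, a) == (
    beta.conjugate() * inner(b, a) + gamma.conjugate() * inner(c, a))
```

Because the sample entries are small integers, the comparisons are exact; with arbitrary floating-point data one would compare within a tolerance.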


1.2.1 Orthogonality 

The vectors of analytic geometry and calculus are often expressed in terms of unit
vectors along the axes, i.e., vectors that are of unit length and perpendicular to one
another. Such vectors are also important in abstract inner product spaces.

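1.2.4. Definition. Vectors $|a\rangle, |b\rangle \in V$ are orthogonal if $\langle a|b\rangle = 0$. A normal
vector, or normalized vector, $|e\rangle$ is one for which $\langle e|e\rangle = 1$. A basis $B = \{|e_i\rangle\}_{i=1}^{N}$
in an $N$-dimensional vector space $V$ is an orthonormal basis if

$$\langle e_i|e_j\rangle = \delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j, \end{cases} \tag{1.5}$$

where $\delta_{ij}$, defined by the last equality, is called the Kronecker delta.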

1.2.5. Example. Here are examples of orthonormal bases:

• The standard basis of $\mathbb{R}^n$ (or $\mathbb{C}^n$)

$$|e_1\rangle = (1, 0, \ldots, 0), \quad |e_2\rangle = (0, 1, \ldots, 0), \quad \ldots, \quad |e_n\rangle = (0, 0, \ldots, 1)$$

is orthonormal under the usual inner product of those spaces.

Figure 1.1 The essence of the Gram-Schmidt process is neatly illustrated by the process in two dimensions.
This figure depicts the stages of the construction of two orthonormal vectors.

• Let $|e_k\rangle = e^{ikx}/\sqrt{2\pi}$ be functions in $\mathcal{C}(0, 2\pi)$ with $w(x) = 1$. Then

$$\langle e_k|e_k\rangle = \frac{1}{2\pi}\int_0^{2\pi} e^{-ikx} e^{ikx}\, dx = 1,$$

and for $l \ne k$,

$$\langle e_l|e_k\rangle = \frac{1}{2\pi}\int_0^{2\pi} e^{-ilx} e^{ikx}\, dx = \frac{1}{2\pi}\int_0^{2\pi} e^{i(k-l)x}\, dx = 0.$$

Thus, $\langle e_l|e_k\rangle = \delta_{lk}$. ■
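
The orthonormality of the functions $e^{ikx}/\sqrt{2\pi}$ on $(0, 2\pi)$ can also be checked numerically. The following sketch approximates the inner product integral with a midpoint rule; the helper names `e` and `inner`, the choice of quadrature, and the number of sample points are assumptions made for illustration.

```python
import cmath
import math

def e(k):
    """Return the function x -> exp(i k x) / sqrt(2 pi)."""
    return lambda x: cmath.exp(1j * k * x) / math.sqrt(2 * math.pi)

def inner(f, g, a=0.0, b=2 * math.pi, n=2000):
    """Midpoint-rule estimate of the integral of conj(f) * g over (a, b),
    with weight function w(x) = 1."""
    h = (b - a) / n
    total = 0j
    for j in range(n):
        x = a + (j + 0.5) * h
        total += f(x).conjugate() * g(x)
    return h * total

norm_sq = inner(e(3), e(3))   # should be close to 1
overlap = inner(e(2), e(5))   # should be close to 0
```

The two results agree with $\delta_{lk}$ up to rounding error; the integers 2, 3, and 5 are arbitrary sample indices.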


1.2.2 The Gram-Schmidt Process 


It is always possible to convert any basis in $V$ into an orthonormal basis. A process
by which this may be accomplished is called Gram-Schmidt orthonormalization.
Consider a basis $B = \{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$. We intend to take linear
combinations of $|a_i\rangle$ in such a way that the resulting vectors are orthonormal. First, we
let $|e_1\rangle = |a_1\rangle/\sqrt{\langle a_1|a_1\rangle}$ and note that $\langle e_1|e_1\rangle = 1$. If we subtract from $|a_2\rangle$ its
projection along $|e_1\rangle$, we obtain a vector that is orthogonal to $|e_1\rangle$ (see Figure 1.1).
Calling the resulting vector $|e_2'\rangle$, we have $|e_2'\rangle = |a_2\rangle - \langle e_1|a_2\rangle\,|e_1\rangle$, which can
be written more symmetrically as $|e_2'\rangle = |a_2\rangle - |e_1\rangle\langle e_1|a_2\rangle$. Clearly, this vector
is orthogonal to $|e_1\rangle$. In order to normalize $|e_2'\rangle$, we divide it by $\sqrt{\langle e_2'|e_2'\rangle}$. Then
$|e_2\rangle = |e_2'\rangle/\sqrt{\langle e_2'|e_2'\rangle}$ will be a normal vector orthogonal to $|e_1\rangle$. Subtracting from
$|a_3\rangle$ its projections along the first and second unit vectors obtained so far will give
the vector

$$|e_3'\rangle = |a_3\rangle - |e_1\rangle\langle e_1|a_3\rangle - |e_2\rangle\langle e_2|a_3\rangle = |a_3\rangle - \sum_{i=1}^{2} |e_i\rangle\langle e_i|a_3\rangle,$$

which is orthogonal to both $|e_1\rangle$ and $|e_2\rangle$ (see Figure 1.2):

$$\langle e_1|e_3'\rangle = \langle e_1|a_3\rangle - \underbrace{\langle e_1|e_1\rangle}_{=1}\langle e_1|a_3\rangle - \underbrace{\langle e_1|e_2\rangle}_{=0}\langle e_2|a_3\rangle = 0.$$

Similarly, $\langle e_2|e_3'\rangle = 0$.

Figure 1.2 Once the orthonormal vectors in the plane of two vectors are obtained, the third orthonormal vector
is easily constructed.


Erhard Schmidt (1876-1959) obtained his doctorate under
the supervision of David Hilbert. His main interest was in
integral equations and Hilbert spaces. He is the “Schmidt” of
the Gram-Schmidt orthogonalization process, which takes
a basis of a space and constructs an orthonormal one from
it. (Laplace had presented a special case of this process long
before Gram or Schmidt.)

In 1908 Schmidt worked on infinitely many equations in
infinitely many unknowns, introducing various geometric
notations and terms that are still in use for describing spaces of
functions. Schmidt’s ideas were to lead to the geometry of
Hilbert spaces. This was motivated by the study of integral equations (see Chapter 17) and
an attempt at their abstraction.

Earlier, Hilbert regarded a function as given by its Fourier coefficients. These satisfy
the condition that the sum of their squares is finite. He introduced sequences of real numbers $\{x_n\}$ such
that $\sum_{n=1}^{\infty} x_n^2$ is finite. Riesz and Fischer showed that there is a one-to-one correspondence
between square-integrable functions and square-summable sequences of their Fourier
coefficients. In 1907 Schmidt and Fréchet showed that a consistent theory could be obtained
if the square-summable sequences were regarded as the coordinates of points in an
infinite-dimensional space that is a generalization of $n$-dimensional Euclidean space. Thus functions
can be regarded as points of a space, now called a Hilbert space.












In general, if we have calculated $m$ orthonormal vectors $|e_1\rangle, \ldots, |e_m\rangle$, with
$m < N$, then we can find the next one using the following relations:

$$
\begin{aligned}
|e_{m+1}'\rangle &= |a_{m+1}\rangle - \sum_{i=1}^{m} |e_i\rangle\langle e_i|a_{m+1}\rangle, \\
|e_{m+1}\rangle &= \frac{|e_{m+1}'\rangle}{\sqrt{\langle e_{m+1}'|e_{m+1}'\rangle}}.
\end{aligned}
\tag{1.6}
$$

Even though we have been discussing finite-dimensional vector spaces, the process
of Equation (1.6) can continue for infinite dimensions as well. The reader is asked
to pay attention to the fact that, at each stage of the Gram-Schmidt process, one
is taking linear combinations of the original vectors.
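
The recursion of Equation (1.6) translates directly into code. The sketch below is a minimal illustration for vectors in $\mathbb{C}^n$ with the natural inner product; the function names, the tolerance used to detect linear dependence, and the sample basis are assumptions of this example.

```python
import math

def inner(a, b):
    """Natural inner product on C^n: sum_i conj(a_i) * b_i."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def gram_schmidt(basis, tol=1e-12):
    """Orthonormalize linearly independent complex tuples via Eq. (1.6)."""
    ortho = []
    for vec in basis:
        a = [complex(x) for x in vec]
        # Coefficients <e_i|a> of the projections onto the vectors found so far.
        coeffs = [inner(e, a) for e in ortho]
        # |e'> = |a> - sum_i |e_i><e_i|a>
        e_prime = [x - sum(c * e[j] for c, e in zip(coeffs, ortho))
                   for j, x in enumerate(a)]
        norm = math.sqrt(inner(e_prime, e_prime).real)
        if norm < tol:
            raise ValueError("input vectors are not linearly independent")
        # |e> = |e'> / sqrt(<e'|e'>)
        ortho.append([x / norm for x in e_prime])
    return ortho

basis = [(1, 1j, 0), (1, 0, 1), (0, 1, 1)]
es = gram_schmidt(basis)
# Matrix of inner products <e_i|e_j>; it should approximate the Kronecker delta.
gram = [[inner(u, v) for v in es] for u in es]
```

This follows the text's version of the process, in which each projection is computed from the original vector $|a_{m+1}\rangle$. In floating-point arithmetic with nearly dependent vectors, a variant that subtracts the projections one at a time is often preferred, but that is a numerical refinement outside the scope of the text.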


1.2.3 The Schwarz Inequality 

Let us now consider an important inequality that is valid in both finite and infinite
dimensions and whose restriction to two and three dimensions is equivalent to the
fact that the cosine of the angle between two vectors never exceeds one in absolute value.

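1.2.6. Theorem. For any pair of vectors $|a\rangle, |b\rangle$ in an inner product space $V$, the
Schwarz inequality holds: $\langle a|a\rangle\langle b|b\rangle \ge |\langle a|b\rangle|^2$. Equality holds when $|a\rangle$ is
proportional to $|b\rangle$.

Proof. Let $|c\rangle = |b\rangle - (\langle a|b\rangle/\langle a|a\rangle)\,|a\rangle$, and note that $\langle a|c\rangle = 0$. Write $|b\rangle =
(\langle a|b\rangle/\langle a|a\rangle)\,|a\rangle + |c\rangle$ and take the inner product of $|b\rangle$ with itself:

$$\langle b|b\rangle = \left|\frac{\langle a|b\rangle}{\langle a|a\rangle}\right|^2 \langle a|a\rangle + \langle c|c\rangle.$$

Since the last term is never negative, we have

$$\langle b|b\rangle \ge \frac{|\langle a|b\rangle|^2}{\langle a|a\rangle} \quad\Rightarrow\quad \langle a|a\rangle\langle b|b\rangle \ge |\langle a|b\rangle|^2.$$

Equality holds iff $\langle c|c\rangle = 0$ or $|c\rangle = 0$. From the definition of $|c\rangle$, we conclude
that $|a\rangle$ and $|b\rangle$ must be proportional. □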

Notice the power of abstraction: We have derived the Schwarz inequality solely 
from the basic assumptions of inner product spaces independent of the specific 
nature of the inner product. Therefore, we do not have to prove the Schwarz 
inequality every time we encounter a new inner product space. 
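
The construction used in the proof can be traced numerically for the natural inner product on $\mathbb{C}^n$. In the sketch below, the sample vectors and the helper name `inner` are assumptions chosen for illustration; the code builds $|c\rangle$ as in the proof, checks that it is orthogonal to $|a\rangle$, and compares the two sides of the inequality, including the equality case for proportional vectors.

```python
def inner(a, b):
    """Natural inner product on C^n: sum_i conj(a_i) * b_i."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

a = (1 + 1j, 2 + 0j, -1j)
b = (3 + 0j, 1 - 2j, 2 + 2j)

# |c> = |b> - (<a|b>/<a|a>) |a>, the vector used in the proof.
ratio = inner(a, b) / inner(a, a)
c = tuple(y - ratio * x for x, y in zip(a, b))
a_dot_c = inner(a, c)                      # should vanish up to rounding

lhs = (inner(a, a) * inner(b, b)).real     # <a|a><b|b>
rhs = abs(inner(a, b)) ** 2                # |<a|b>|^2

# Equality case: a vector proportional to |a>.
b_prop = tuple(2j * x for x in a)
lhs_eq = (inner(a, a) * inner(b_prop, b_prop)).real
rhs_eq = abs(inner(a, b_prop)) ** 2
```

For these sample vectors the left side exceeds the right side, while for the proportional pair the two sides coincide up to rounding.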


Karl Herman Amandus Schwarz (1843-1921), the son of an architect, was born in what
is now Sobiecin, Poland. After gymnasium, Schwarz studied chemistry in Berlin for a time
before switching to mathematics, receiving his doctorate in 1864. He was greatly influenced
by the reigning mathematicians in Germany at the time, especially Kummer and Weierstrass.
The lecture notes that Schwarz took while attending Weierstrass’s lectures on the integral
calculus still exist. Schwarz received an initial appointment at Halle and later appointments
in Zurich and Göttingen before being named as Weierstrass’s successor at Berlin in 1892.
These later years, filled with students and lectures, were not Schwarz’s most productive,
but his early papers assure his place in mathematics history.

Schwarz’s favorite tool was geometry, which he soon
turned to the study of analysis. He conclusively proved some
of Riemann’s results that had been previously (and justifiably)
challenged. The primary result in question was the assertion
that every simply connected region in the plane could be
conformally mapped onto a circular area. From this effort came
several well-known results now associated with Schwarz’s
name, including the principle of reflection and Schwarz’s
lemma. He also worked on surfaces of minimal area, the
branch of geometry beloved by all who dabble with soap bubbles.

Schwarz’s most important work, for the occasion of Weierstrass’s seventieth birthday,
again dealt with minimal area, specifically whether a minimal surface yields a minimal area.
Along the way, Schwarz demonstrated second variation in a multiple integral, constructed
a function using successive approximation, and demonstrated the existence of a “least”
eigenvalue for certain differential equations. This work also contained the most famous
inequality in mathematics, which bears his name.

Schwarz’s success obviously stemmed from a matching of his aptitude and training to
the mathematical problems of the day. One of his traits, however, could be viewed as either
positive or negative: his habit of treating all problems, whether trivial or monumental, with
the same level of attention to detail. This might also at least partly explain the decline in
productivity in Schwarz’s later years.

Schwarz had interests outside mathematics, although his marriage was a mathematical
one, since he married Kummer’s daughter. Outside mathematics he was the captain of the
local voluntary fire brigade, and he assisted the stationmaster at the local railway station by
closing the doors of the trains!



1.2.4 Length of a Vector 

In dealing with objects such as directed line segments in the plane or in space, the 
intuitive idea of the length of a vector is used to define the dot product. However, 
sometimes it is more convenient to introduce the inner product first and then define 
the length, as we shall do now. 

1.2.7. Definition. The norm, or length, of a vector $|a\rangle$ in an inner product space
is denoted by $\|a\|$ and defined as $\|a\| \equiv \sqrt{\langle a|a\rangle}$. We use the notation $\|\alpha a + \beta b\|$
for the norm of the vector $\alpha|a\rangle + \beta|b\rangle$.

One can easily show that the norm has the following properties: 


1. The norm of the zero vector is zero: $\|0\| = 0$.

2. $\|a\| \ge 0$, and $\|a\| = 0$ if and only if $|a\rangle = |0\rangle$.

3. $\|\alpha a\| = |\alpha|\,\|a\|$ for any$^9$ complex $\alpha$.

4. $\|a + b\| \le \|a\| + \|b\|$. This property is called the triangle inequality.

A vector space on which a norm is defined is called a normed linear space.
One can introduce the idea of the “distance” between two vectors in a normed
linear space. The distance between $|a\rangle$ and $|b\rangle$, denoted by $d(a, b)$, is simply
the norm of their difference: $d(a, b) \equiv \|a - b\|$. It can be readily shown that this
has all the properties one expects of the distance (or metric) function introduced
in Chapter 0. However, one does not need a normed space to define distance. For
example, as explained in Chapter 0, one can define the distance between two points
on the surface of a sphere, but the addition of two points on a sphere (a necessary
operation for vector space structure) is not defined. Thus the points on a sphere
form a metric space, but not a vector space.

Inner product spaces are automatically normed spaces, but the converse is not,
in general, true: There are normed spaces, i.e., spaces satisfying properties 1-4
above, that cannot be promoted to inner product spaces. However, if the norm
satisfies the parallelogram law,

$$\|a + b\|^2 + \|a - b\|^2 = 2\|a\|^2 + 2\|b\|^2, \tag{1.7}$$

then one can define

$$\langle a|b\rangle \equiv \tfrac{1}{4}\left\{\|a + b\|^2 - \|a - b\|^2 - i\left(\|a + ib\|^2 - \|a - ib\|^2\right)\right\} \tag{1.8}$$

and show that it is indeed an inner product. In fact, we have (see [Frie 82, pp.
203-204] for a proof) the following theorem.

1.2.8. Theorem. A normed linear space is an inner product space if and only if
the norm satisfies the parallelogram law.
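
Equations (1.7) and (1.8) can be explored numerically on $\mathbb{C}^2$. The sketch below is illustrative and rests on assumed helper names (`inner`, `norm2`, `norm1`, `comb`, `polarization`, `parallelogram_defect`) and assumed sample vectors. It checks that the norm induced by the natural inner product satisfies the parallelogram law and that Equation (1.8) recovers the inner product, and it shows one pair of vectors for which the sum-of-absolute-values norm violates the law.

```python
import math

def inner(a, b):
    """Natural inner product on C^n, conjugate-linear in the first slot."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def norm2(a):
    """Norm induced by the natural inner product."""
    return math.sqrt(inner(a, a).real)

def norm1(a):
    """Sum of absolute values of the components."""
    return sum(abs(x) for x in a)

def comb(a, s, b):
    """Return the vector |a> + s|b>."""
    return tuple(x + s * y for x, y in zip(a, b))

def polarization(norm, a, b):
    """Right-hand side of Eq. (1.8) for a given norm function."""
    return 0.25 * (norm(comb(a, 1, b)) ** 2 - norm(comb(a, -1, b)) ** 2
                   - 1j * (norm(comb(a, 1j, b)) ** 2
                           - norm(comb(a, -1j, b)) ** 2))

def parallelogram_defect(norm, a, b):
    """LHS minus RHS of Eq. (1.7); zero when the law holds for a and b."""
    return (norm(comb(a, 1, b)) ** 2 + norm(comb(a, -1, b)) ** 2
            - 2 * norm(a) ** 2 - 2 * norm(b) ** 2)

a = (1 + 2j, 3j)
b = (2 - 1j, 1 + 1j)
recovered = polarization(norm2, a, b)        # should match inner(a, b)
defect2 = parallelogram_defect(norm2, a, b)  # should vanish
# For the 1-norm, the two standard basis vectors of C^2 violate Eq. (1.7).
defect1 = parallelogram_defect(norm1, (1 + 0j, 0j), (0j, 1 + 0j))
```

A single violating pair is enough to show, by Theorem 1.2.8, that the 1-norm does not come from any inner product.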

Now consider any $N$-dimensional vector space $V$. Choose a basis $\{|a_i\rangle\}_{i=1}^{N}$ in
$V$, and for any vector $|a\rangle$ whose components are $\{\alpha_i\}_{i=1}^{N}$ in this basis, define

$$\|a\|^2 \equiv \sum_{i=1}^{N} |\alpha_i|^2.$$

The reader may check that this defines a norm, and that the norm satisfies the
parallelogram law. From Theorem 1.2.8 we have the following:

1.2.9. Theorem. Every finite-dimensional vector space can be turned into an inner 
product space. 




1.2.10. Example. Let the space be $\mathbb{C}^n$. The natural inner product of $\mathbb{C}^n$ gives rise to a
norm, which, for the vector $|a\rangle = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is

$$\|a\| = \sqrt{\langle a|a\rangle} = \sqrt{\sum_{i=1}^{n} |\alpha_i|^2}.$$

This norm yields the following distance between $|a\rangle$ and $|b\rangle = (\beta_1, \beta_2, \ldots, \beta_n)$:

$$d(a, b) = \|a - b\| = \sqrt{\langle a - b|a - b\rangle} = \sqrt{\sum_{i=1}^{n} |\alpha_i - \beta_i|^2}.$$

One can define other norms, such as $\|a\|_1 \equiv \sum_{i=1}^{n} |\alpha_i|$, which has all the required properties
of a norm, and leads to the distance

$$d_1(a, b) = \|a - b\|_1 = \sum_{i=1}^{n} |\alpha_i - \beta_i|.$$

Another norm defined on $\mathbb{C}^n$ is given by

$$\|a\|_p \equiv \left(\sum_{i=1}^{n} |\alpha_i|^p\right)^{1/p},$$

where $p$ is a positive integer. It is proved in higher mathematical analysis that $\|\cdot\|_p$ has all
the properties of a norm. (The nontrivial part of the proof is to verify the triangle inequality.)
The associated distance is

$$d_p(a, b) = \|a - b\|_p = \left(\sum_{i=1}^{n} |\alpha_i - \beta_i|^p\right)^{1/p}.$$

The other two norms introduced above are special cases, for $p = 2$ and $p = 1$. ■
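
The family of norms in this example is easy to compute directly. The following sketch, with assumed function names `p_norm` and `p_distance` and assumed sample vectors, evaluates the norm and the associated distance for a few values of $p$ and spot-checks the triangle inequality for $p = 3$ on one pair of vectors.

```python
def p_norm(a, p):
    """(sum_i |a_i|^p)^(1/p) for a tuple of complex numbers and integer p >= 1."""
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

def p_distance(a, b, p):
    """Distance d_p(a, b) = ||a - b||_p."""
    return p_norm(tuple(x - y for x, y in zip(a, b)), p)

a = (3 + 4j, 0j)
b = (0j, 1 + 0j)

norm_1 = p_norm(a, 1)          # |3 + 4i| + 0 = 5
norm_2 = p_norm(a, 2)          # sqrt(25 + 0) = 5
dist_1 = p_distance(a, b, 1)   # 5 + 1 = 6
dist_2 = p_distance(a, b, 2)   # sqrt(25 + 1)

# Spot check of the triangle inequality for p = 3 on this pair of vectors.
a_plus_b = tuple(x + y for x, y in zip(a, b))
triangle_ok = p_norm(a_plus_b, 3) <= p_norm(a, 3) + p_norm(b, 3) + 1e-12
```

The different values of `dist_1` and `dist_2` show that the same pair of vectors is at different distances under different norms.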


1.3 Linear Transformations 

We have made progress in enriching vector spaces with structures such as norms 
and inner products. However, this enrichment, although important, will be of little 
value if it is imprisoned in a single vector space. We would like to give vector 
space properties freedom of movement, so they can go from one space to another. 
The vehicle that carries these properties is a linear transformation which is the 
subject of this section. However, first it is instructive to review the concept of a 
mapping (discussed in Chapter 0) by considering some examples relevant to the 
present discussion. 

1.3.1. Example. The following are a few familiar examples of mappings.

9 The first property follows from this by letting a = 0. 




1. Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^2$.

2. Let $g : \mathbb{R}^2 \to \mathbb{R}$ be given by $g(x, y) = x^2 + y^2 - 4$.

3. Let $F : \mathbb{R}^2 \to \mathbb{C}$ be given by $F(x, y) = U(x, y) + iV(x, y)$, where $U : \mathbb{R}^2 \to \mathbb{R}$
and $V : \mathbb{R}^2 \to \mathbb{R}$.

4. Let $T : \mathbb{R} \to \mathbb{R}^2$ be given by $T(t) = (t + 3, 2t - 5)$.

5. Motion of a point particle in space can be considered as a mapping $M : [a, b] \to \mathbb{R}^3$,
where $[a, b]$ is an interval of the real line. For each $t \in [a, b]$, we define $M(t) =
(x(t), y(t), z(t))$, where $x(t)$, $y(t)$, and $z(t)$ are real-valued functions of $t$. If we
identify $t$ with time, which is assumed to have a value in the interval $[a, b]$, then
$M(t)$ describes the path of the particle as a function of time, and $a$ and $b$ are the
beginning and the end of the motion, respectively. ■

Let us consider an arbitrary mapping $F : V \to W$ from a vector space V
to another vector space W. It is assumed that the two vector spaces are over the
same scalars, say $\mathbb{C}$. Consider $|a\rangle$ and $|b\rangle$ in V and $|x\rangle$ and $|y\rangle$ in W such that
$F(|a\rangle) = |x\rangle$ and $F(|b\rangle) = |y\rangle$. In general, F does not preserve the vector space
structure. That is, the image of a linear combination of vectors is not the same as
the linear combination of the images:

$$F(\alpha|a\rangle + \beta|b\rangle) \neq \alpha F(|a\rangle) + \beta F(|b\rangle) = \alpha|x\rangle + \beta|y\rangle.$$

This is the case for all the mappings of Example 1.3.1. (Even the fourth item, although
of first degree in t, fails to be linear, because $T(0) = (3, -5)$ is not the zero vector.)
There are many applications in which the preservation of the vector space structure 
(preservation of the linear combination) is desired. 
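The failure to preserve linear combinations can be checked numerically. The following is a minimal sketch, not part of the original text: `f` and `T` are items 1 and 4 of Example 1.3.1, while `S` is a hypothetical linear map introduced only for comparison, and the sample values are arbitrary.

```python
def f(x):
    """Item 1 of Example 1.3.1, f(x) = x**2, returned as a 1-tuple."""
    return (x * x,)


def T(t):
    """Item 4 of Example 1.3.1, T(t) = (t + 3, 2t - 5)."""
    return (t + 3, 2 * t - 5)


def S(t):
    """Hypothetical linear map R -> R^2 (not in the text): S(t) = (t, 2t)."""
    return (t, 2 * t)


def preserves_combination(F, a, b, alpha, beta):
    """True if F(alpha*a + beta*b) equals alpha*F(a) + beta*F(b)."""
    lhs = F(alpha * a + beta * b)
    rhs = tuple(alpha * x + beta * y for x, y in zip(F(a), F(b)))
    return lhs == rhs


# Arbitrary integer samples (a, b, alpha, beta); integers keep the test exact.
samples = [(1, 2, 3, -1), (0, 5, 2, 2), (-4, 7, 1, 3)]

for name, F in [("f", f), ("T", T), ("S", S)]:
    ok = all(preserves_combination(F, a, b, al, be) for a, b, al, be in samples)
    print(name, "preserves the sampled linear combinations:", ok)
```

A passing check on a few samples does not prove linearity, but a single failing sample is enough to rule it out.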

1.3.2. Definition. A linear transformation from the complex vector space V to
the complex vector space W is a mapping $T : V \to W$ such that

$$T(\alpha|a\rangle + \beta|b\rangle) = \alpha T(|a\rangle) + \beta T(|b\rangle) \quad \forall\, |a\rangle, |b\rangle \in V \text{ and } \alpha, \beta \in \mathbb{C}.$$

A linear transformation $T : V \to V$ is called an endomorphism of V or a linear
operator on V. The action of a linear transformation on a vector is written without
the parentheses: $T(|a\rangle) \equiv T|a\rangle$.

The same definition applies to real vector spaces. Note that the definition 
demands that both vector spaces have the same set of scalars: The same scalars 
multiply vectors in V on the LHS and those in W on the RHS. An immediate 
consequence of the definition is the following: 


1.3.3. Box. Two linear transformations $T : V \to W$ and $U : V \to W$ are
equal if and only if $T|a_i\rangle = U|a_i\rangle$ for all $|a_i\rangle$ in some basis of V. Thus, a
linear transformation is uniquely determined by its action on some basis of
its domain space.







The equality in this box is simply the set-theoretic equality of maps discussed
in Chapter 0. An important example of a linear transformation occurs when the
second vector space, W, happens to be the set of scalars, $\mathbb{C}$ or $\mathbb{R}$, in which case
the linear transformation is called a linear functional.

The set of linear transformations from V to W is denoted by $\mathcal{L}(V, W)$, and this
set happens to be a vector space. The zero transformation, 0, is defined to take every
vector in V to the zero vector of W. The sum of two linear transformations T and U
is the linear transformation T + U, whose action on a vector $|a\rangle \in V$ is defined to be
$(T + U)|a\rangle \equiv T|a\rangle + U|a\rangle$. Similarly, define $\alpha T$ by $(\alpha T)|a\rangle \equiv \alpha(T|a\rangle) = \alpha T|a\rangle$.
The set of endomorphisms of V is denoted by $\mathcal{L}(V)$ rather than $\mathcal{L}(V, V)$.
The set of linear functionals $\mathcal{L}(V, \mathbb{C})$, or $\mathcal{L}(V, \mathbb{R})$ if V is a real vector space,
is denoted by $V^*$ and is called the dual space of V.

1.3.4. Example. The following are some examples of linear operators in various vector
spaces. The proofs of linearity are simple in all cases and are left as exercises for the reader.

1. Let $\{|a_1\rangle, |a_2\rangle, \ldots, |a_m\rangle\}$ be an arbitrary finite set of vectors in V, and $\{f_1, f_2, \ldots, f_m\}$
an arbitrary set of linear functionals on V. Let

$$A \equiv \sum_{k=1}^{m} |a_k\rangle f_k \in \mathcal{L}(V)$$

be defined by $A|x\rangle = \sum_{k=1}^{m} |a_k\rangle f_k(|x\rangle) = \sum_{k=1}^{m} f_k(|x\rangle)\,|a_k\rangle$. Then A is a linear
operator on V.

2. Let $\pi$ be a permutation (shuffling) of the integers $\{1, 2, \ldots, n\}$. If $|x\rangle = (\eta_1, \eta_2,
\ldots, \eta_n)$ is a vector in $\mathbb{C}^n$, we can write

$$A_\pi|x\rangle = (\eta_{\pi(1)}, \eta_{\pi(2)}, \ldots, \eta_{\pi(n)}).$$

Then $A_\pi$ is a linear operator.

3. For any $|x\rangle \in \mathcal{P}^c[t]$, with $x(t) = \sum_{k=0}^{n} \alpha_k t^k$, write $|y\rangle = D|x\rangle$, where $|y\rangle$ is defined
as $y(t) = \sum_{k=1}^{n} k\alpha_k t^{k-1}$. Then D is a linear operator, the derivative operator.

4. For every $|x\rangle \in \mathcal{P}^c[t]$, with $x(t) = \sum_{k=0}^{n} \alpha_k t^k$, write $|y\rangle = S|x\rangle$, where $|y\rangle \in
\mathcal{P}^c[t]$ is defined as $y(t) = \sum_{k=0}^{n} [\alpha_k/(k+1)]\, t^{k+1}$. Then S is a linear operator, the
integration operator.

5. Define the operator $\mathrm{int} : \mathcal{C}^0(a, b) \to \mathbb{R}$ by

$$\mathrm{int}(f) = \int_a^b f(t)\, dt.$$

Then int is a linear functional on the vector space $\mathcal{C}^0(a, b)$ of continuous functions.

6. Let $\mathcal{C}^n(a, b)$ be the set of real-valued functions defined in the interval [a, b] whose
first n derivatives exist and are continuous. For any $|f\rangle \in \mathcal{C}^n(a, b)$ define $|u\rangle =
G|f\rangle$, with $u(t) = g(t)f(t)$ and g(t) a fixed function in $\mathcal{C}^n(a, b)$. Then G is linear.
In particular, the operation of multiplying by t, whose operator is denoted by T, is
linear. ■





An immediate consequence of Definition 1.3.2 is that the image of the zero 
vector in V is the zero vector in W. This is not true for a general mapping, but it is
necessarily true for a linear mapping. As the zero vector of V is mapped onto the 
zero vector of W, other vectors of V may also be dragged along. In fact, we have 
the following theorem. 

1.3.5. Theorem. The set of vectors in V that are mapped onto the zero vector of
W under the linear transformation $T : V \to W$ forms a subspace of V called the
kernel, or null space, of T and denoted by ker T.

Proof. The proof is left as an exercise. □

The dimension of ker T is also called the nullity of T.

The proof of the following is also left as an exercise. 

1.3.6. Theorem. The range T(V) of a linear transformation $T : V \to W$ is a
subspace of W. The dimension of T(V) is called the rank of T.

1.3.7. Theorem. A linear transformation is 1-1 (injective) iff its kernel is zero.

Proof. The "only if" part is trivial. For the "if" part, suppose $T|a_1\rangle = T|a_2\rangle$; then
linearity of T implies that $T(|a_1\rangle - |a_2\rangle) = 0$. Since ker T = 0, we must have
$|a_1\rangle = |a_2\rangle$. □

Suppose we start with a basis of ker T and add enough linearly independent
vectors to it to get a basis for V. Without loss of generality, let us assume that the first
n vectors in this basis form a basis of ker T. So let $B = \{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$ be a
basis for V and $B' = \{|a_1\rangle, |a_2\rangle, \ldots, |a_n\rangle\}$ be a basis for ker T. Here N = dim V
and n = dim ker T. It is straightforward to show that $\{T|a_{n+1}\rangle, \ldots, T|a_N\rangle\}$ is a
basis for T(V). We therefore have the following result.

1.3.8. Theorem. Let $T : V \to W$ be a linear transformation. Then^10

$$\dim V = \dim \ker T + \dim T(V).$$

This theorem is called the dimension theorem. One of its consequences is that
an injective endomorphism is automatically surjective, and vice versa:

1.3.9. Proposition. An endomorphism of a finite-dimensional vector space is
bijective if it is either injective or surjective.

The dimension theorem is obviously valid only for finite-dimensional vector
spaces. In particular, neither surjectivity nor injectivity implies bijectivity for
infinite-dimensional vector spaces.


10 Recall that the dimension of a vector space depends on the scalars used in that space. Although we are dealing with two
different vector spaces here, since they are both over the same set of scalars (complex or real), no confusion in the concept of
dimension arises.




1.3.10. Example. Let us try to find the kernel of $T : \mathbb{R}^4 \to \mathbb{R}^3$ given by

$$T(x_1, x_2, x_3, x_4) = (2x_1 + x_2 + x_3 - x_4,\; x_1 + x_2 + 2x_3 + 2x_4,\; x_1 - x_3 - 3x_4).$$

We must look for $(x_1, x_2, x_3, x_4)$ such that $T(x_1, x_2, x_3, x_4) = (0, 0, 0)$, or

$$2x_1 + x_2 + x_3 - x_4 = 0,$$
$$x_1 + x_2 + 2x_3 + 2x_4 = 0,$$
$$x_1 - x_3 - 3x_4 = 0.$$

The "solution" to these equations is $x_1 = x_3 + 3x_4$ and $x_2 = -3x_3 - 5x_4$. Thus, to be in
ker T, a vector in $\mathbb{R}^4$ must be of the form

$$(x_3 + 3x_4,\, -3x_3 - 5x_4,\, x_3,\, x_4) = x_3(1, -3, 1, 0) + x_4(3, -5, 0, 1),$$

where $x_3$ and $x_4$ are arbitrary real numbers. It follows that ker T consists of vectors that can
be written as linear combinations of the two linearly independent vectors (1, −3, 1, 0) and
(3, −5, 0, 1). Therefore, dim ker T = 2. Theorem 1.3.8 then says that dim T(V) = 2; that
is, the range of T is two-dimensional. This becomes clear when one notes that

$$T(x_1, x_2, x_3, x_4) = (2x_1 + x_2 + x_3 - x_4)(1, 0, 1) + (x_1 + x_2 + 2x_3 + 2x_4)(0, 1, -1),$$

and therefore $T(x_1, x_2, x_3, x_4)$, an arbitrary vector in the range of T, is a linear combination
of only two linearly independent vectors, (1, 0, 1) and (0, 1, −1). ■
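The numbers in Example 1.3.10 can be cross-checked with exact arithmetic. The sketch below is an illustration only: the helper names `apply_map` and `rank` are assumptions, and the rank is computed by plain Gaussian elimination rather than by any method prescribed in the text.

```python
from fractions import Fraction

# Rows of the matrix of T : R^4 -> R^3 from Example 1.3.10.
T_MATRIX = [
    [2, 1, 1, -1],
    [1, 1, 2, 2],
    [1, 0, -1, -3],
]


def apply_map(matrix, x):
    """Apply the linear map represented by `matrix` to the vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]


def rank(matrix):
    """Rank via Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


kernel_basis = [(1, -3, 1, 0), (3, -5, 0, 1)]
for v in kernel_basis:
    print(v, "->", apply_map(T_MATRIX, v))

dim_V = len(T_MATRIX[0])
dim_range = rank(T_MATRIX)
print("dim V =", dim_V, " rank =", dim_range, " nullity =", dim_V - dim_range)
```

Both kernel vectors are sent to the zero vector, and the rank and nullity add up to dim V, as Theorem 1.3.8 requires.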

In many cases, two vector spaces may "look" different, while in reality they
are very much the same. For example, the set of complex numbers C is a two- 
dimensional vector space over the reals, as is R 2 . Although we call the vectors of 
these two spaces by different names, they have very similar properties. This notion 
of “similarity” is made precise in the following definition. 

1.3.11. Definition. A vector space V is said to be isomorphic to another vector
space W if there exists a bijective linear mapping $T : V \to W$. Then T is called an
isomorphism.^11 A bijective linear map of V onto itself is called an automorphism
of V. The set of automorphisms of V is denoted by GL(V).

For all practical purposes, two isomorphic vector spaces are different manifestations
of the "same" vector space. In the example discussed above, the correspondence
$T : \mathbb{C} \to \mathbb{R}^2$, with $T(x + iy) = (x, y)$, establishes an isomorphism between
the two vector spaces. It should be emphasized that only as vector spaces are $\mathbb{C}$
and $\mathbb{R}^2$ isomorphic. If we go beyond the vector space structures, the two sets are
quite different. For example, $\mathbb{C}$ has a natural multiplication for its elements, but
$\mathbb{R}^2$ does not. The following theorem gives a working criterion for isomorphism.


11 The word "isomorphism," as we shall see, is used in conjunction with many algebraic structures. To distinguish them,
qualifiers need to be used. In the present context, we speak of linear isomorphism. We shall use qualifiers when necessary. 
However, the context usually makes the meaning of isomorphism clear. 





1.3.12. Theorem. A linear surjective map $T : V \to W$ is an isomorphism if and
only if its nullity is zero.

Proof. The "only if" part is obvious. To prove the "if" part, assume that the nullity
is zero. Then by Theorem 1.3.7, T is 1-1. Since it is already surjective, T must be
bijective. □

1.3.13. Theorem. An isomorphism $T : V \to W$ carries linearly independent sets
of vectors onto linearly independent sets of vectors.

Proof. Assume that $\{|a_i\rangle\}_{i=1}^{m}$ is a set of linearly independent vectors in V. To show
that $\{T|a_i\rangle\}_{i=1}^{m}$ is linearly independent in W, assume that there exist $\alpha_1, \alpha_2, \ldots, \alpha_m$
such that $\sum_{i=1}^{m} \alpha_i T|a_i\rangle = |0\rangle$. Then the linearity of T and Theorem 1.3.12 give
$T\left(\sum_{i=1}^{m} \alpha_i |a_i\rangle\right) = |0\rangle$, or $\sum_{i=1}^{m} \alpha_i |a_i\rangle = |0\rangle$, and the linear independence of
the $|a_i\rangle$ implies that $\alpha_i = 0$ for all i. Therefore, $\{T|a_i\rangle\}_{i=1}^{m}$ must be linearly
independent. □

The following theorem shows that finite-dimensional vector spaces are severely 
limited in number: 

1.3.14. Theorem. Two finite-dimensional vector spaces are isomorphic if and only 
if they have the same dimension. 

Proof. Let $B_V = \{|a_i\rangle\}_{i=1}^{N}$ be a basis for V and $B_W = \{|b_i\rangle\}_{i=1}^{N}$ a basis for W.
Define the linear transformation $T|a_i\rangle = |b_i\rangle$, $i = 1, 2, \ldots, N$. The rest of the
proof involves showing that T is an isomorphism. We leave this as an exercise for
the reader. □

A consequence of Theorem 1.3.14 is that all N-dimensional vector spaces
over $\mathbb{R}$ are isomorphic to $\mathbb{R}^N$ and all complex N-dimensional vector spaces are
isomorphic to $\mathbb{C}^N$. So, for all practical purposes, we have only two N-dimensional
vector spaces, $\mathbb{R}^N$ and $\mathbb{C}^N$.


1.3.1 More on Linear Functionals 

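An example of isomorphism is that between a vector space and its dual space,
which we discuss now. Consider an N-dimensional vector space with a basis
$B = \{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$. For any given set of N scalars, $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$,
define the linear functional $f_\alpha$ by $f_\alpha|a_i\rangle = \alpha_i$. When $f_\alpha$ acts on any arbitrary vector
$|b\rangle = \sum_{i=1}^{N} \beta_i |a_i\rangle$ in V, the result is

$$f_\alpha|b\rangle = f_\alpha\left(\sum_{i=1}^{N} \beta_i |a_i\rangle\right) = \sum_{i=1}^{N} \beta_i f_\alpha|a_i\rangle = \sum_{i=1}^{N} \beta_i \alpha_i. \tag{1.9}$$

This expression suggests that $|b\rangle$ can be represented as a column vector with entries
$\beta_1, \beta_2, \ldots, \beta_N$ and $f_\alpha$ as a row vector with entries $\alpha_1, \alpha_2, \ldots, \alpha_N$. Then $f_\alpha|b\rangle$ is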





merely the matrix product 12 of the row vector (on the left) and the column vector 
(on the right). 

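$f_\alpha$ is uniquely determined by the set $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$. In other words, corresponding
to every set of N scalars there exists a unique linear functional. This
leads us to a particular set of functionals, $f_1, f_2, \ldots, f_N$ corresponding, respectively,
to the sets of scalars $\{1, 0, 0, \ldots, 0\}$, $\{0, 1, 0, \ldots, 0\}$, ..., $\{0, 0, 0, \ldots, 1\}$.
This means that

$$f_1|a_1\rangle = 1 \quad\text{and}\quad f_1|a_j\rangle = 0 \quad\text{for } j \neq 1,$$
$$f_2|a_2\rangle = 1 \quad\text{and}\quad f_2|a_j\rangle = 0 \quad\text{for } j \neq 2,$$
$$\vdots$$
$$f_N|a_N\rangle = 1 \quad\text{and}\quad f_N|a_j\rangle = 0 \quad\text{for } j \neq N,$$

or that

$$f_i|a_j\rangle = \delta_{ij}, \tag{1.10}$$

where $\delta_{ij}$ is the Kronecker delta.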

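The functionals of Equation (1.10) form a basis of the dual space $V^*$. To
show this, consider an arbitrary $g \in V^*$, which is uniquely determined by its
action on the vectors in a basis $B = \{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$. Let $g|a_i\rangle = \gamma_i \in \mathbb{C}$.
Then we claim that $g = \sum_{i=1}^{N} \gamma_i f_i$. In fact, consider an arbitrary vector $|a\rangle$ in
V with components $(\alpha_1, \alpha_2, \ldots, \alpha_N)$ with respect to B. Then, on the one hand,
$g|a\rangle = g\left(\sum_{i=1}^{N} \alpha_i |a_i\rangle\right) = \sum_{i=1}^{N} \alpha_i\, g|a_i\rangle = \sum_{i=1}^{N} \alpha_i \gamma_i$. On the other hand,

$$\left(\sum_{i=1}^{N} \gamma_i f_i\right)|a\rangle = \left(\sum_{i=1}^{N} \gamma_i f_i\right)\left(\sum_{j=1}^{N} \alpha_j |a_j\rangle\right) = \sum_{i=1}^{N} \gamma_i \sum_{j=1}^{N} \alpha_j f_i|a_j\rangle = \sum_{i=1}^{N} \gamma_i \sum_{j=1}^{N} \alpha_j \delta_{ij} = \sum_{i=1}^{N} \gamma_i \alpha_i.$$

Since the actions of g and $\sum_{i=1}^{N} \gamma_i f_i$ yield equal results for arbitrary $|a\rangle$, we conclude
that $g = \sum_{i=1}^{N} \gamma_i f_i$, i.e., $\{f_i\}_{i=1}^{N}$ span $V^*$. Thus, we have the following result.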

1.3.15. Theorem. If V is an N-dimensional vector space with a basis $B =
\{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$, then there is a corresponding unique basis $B^* = \{f_1, f_2, \ldots, f_N\}$
in $V^*$ with the property that $f_i|a_j\rangle = \delta_{ij}$.
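For a concrete basis of $\mathbb{R}^3$ the dual basis can be computed explicitly. The sketch below rests on an assumption not spelled out in the text: if the basis vectors are placed as the columns of a matrix, the functionals of the dual basis are the rows of its inverse, because that is exactly the condition $f_i|a_j\rangle = \delta_{ij}$. The basis chosen and the helper names `inverse` and `act` are illustrative only.

```python
from fractions import Fraction


def inverse(matrix):
    """Gauss-Jordan inverse over exact fractions (assumes the matrix is invertible)."""
    n = len(matrix)
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]


def act(f, a):
    """Action of a functional (row vector f) on a vector (column a)."""
    return sum(fi * ai for fi, ai in zip(f, a))


# A hypothetical basis of R^3, chosen only for illustration.
basis = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]
n = len(basis)

# Matrix whose j-th column is the j-th basis vector.
columns = [[v[i] for v in basis] for i in range(n)]

# Row i of the inverse represents the functional f_i of the dual basis.
dual = inverse(columns)

for i in range(n):
    print([str(act(dual[i], basis[j])) for j in range(n)])
```

The printed table of values $f_i|a_j\rangle$ is the identity matrix, which is Equation (1.10) for this particular basis.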

By this theorem the dual space of an N-dimensional vector space is also
N-dimensional, and thus isomorphic to it. The basis $B^*$ is called the dual basis of
B. A corollary to Theorem 1.3.15 is that to every vector in V there corresponds a



12 Matrices will be taken up in Chapter 3. Here, we assume only a nodding familiarity with elementary matrix operations. 





unique linear functional in $V^*$. This can be seen by noting that every vector $|a\rangle$ is
uniquely determined by its components $(\alpha_1, \alpha_2, \ldots, \alpha_N)$ in a basis B. The unique
linear functional $f_a$ corresponding to $|a\rangle$, also called the dual of $|a\rangle$, is simply
$\sum_{j=1}^{N} \alpha_j f_j$, with $f_j \in B^*$.

1.3.16. Definition. An annihilator of $|a\rangle \in V$ is a linear functional $f \in V^*$ such
that $f|a\rangle = 0$. Let W be a subspace of V. The set of linear functionals in $V^*$ that
annihilate all vectors in W is denoted by $W^0$.

The reader may check that $W^0$ is a subspace of $V^*$. Moreover, if we extend
a basis $\{|a_i\rangle\}_{i=1}^{k}$ of W to a basis $B = \{|a_i\rangle\}_{i=1}^{N}$ of V, then we can show that the
functionals $\{f_j\}_{j=k+1}^{N}$, chosen from the basis $B^* = \{f_j\}_{j=1}^{N}$ dual to B, span $W^0$. It
then follows that

$$\dim V = \dim W + \dim W^0. \tag{1.11}$$

We shall have occasions to use annihilators later on when we discuss symplectic 
geometry. 

We have “dualed” a vector, a basis, and a complete vector space. The only 
object remaining is a linear transformation. 

1.3.17. Definition. Let $T : V \to U$ be a linear transformation. Define $T^* : U^* \to
V^*$ by^13

$$[T^*(g)]\,|a\rangle = g(T|a\rangle) \quad \forall\, |a\rangle \in V,\; g \in U^*.$$

$T^*$ is called the dual, or pull back, of T.

One can readily verify that $T^* \in \mathcal{L}(U^*, V^*)$, i.e., that $T^*$ is a linear transformation
from $U^*$ to $V^*$. Some of the mapping properties of $T^*$ are tied to those of T. To see this
we first consider the kernel of $T^*$. Clearly, g is in the kernel of $T^*$ if and only if g
annihilates all vectors of the form $T|a\rangle$, i.e., all vectors in T(V). It follows that g is
in $T(V)^0$. In particular, if T is surjective, T(V) = U, and g annihilates all vectors in
U, i.e., it is the zero linear functional. We conclude that $\ker T^* = 0$, and therefore,
$T^*$ is injective. Similarly, one can show that if T is injective, then $T^*$ is surjective.
We summarize the discussion above:

1.3.18. Proposition. Let T be a linear transformation and $T^*$ its pull back. Then
$\ker T^* = T(V)^0$. If T is surjective (injective), then $T^*$ is injective (surjective). In
particular, $T^*$ is an isomorphism if T is.
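The defining relation of the pull back and the statement $\ker T^* = T(V)^0$ can be illustrated with the map of Example 1.3.10. In this sketch functionals are represented as row vectors, an assumption in the spirit of the remarks after Equation (1.9); the helper names and the sample vector are hypothetical.

```python
# Matrix of T : R^4 -> R^3 from Example 1.3.10 (rows are the component formulas).
M = [
    [2, 1, 1, -1],
    [1, 1, 2, 2],
    [1, 0, -1, -3],
]


def apply_map(matrix, a):
    """T|a>, with |a> given by its components."""
    return [sum(m * x for m, x in zip(row, a)) for row in matrix]


def act(g, u):
    """Action of the functional g (a row vector) on the vector u."""
    return sum(gi * ui for gi, ui in zip(g, u))


def pull_back(matrix, g):
    """Row vector representing T*(g), i.e. the functional a -> g(T a)."""
    rows, cols = len(matrix), len(matrix[0])
    return [sum(g[i] * matrix[i][j] for i in range(rows)) for j in range(cols)]


a = (1, 2, 3, 4)          # arbitrary sample vector in R^4
g = (1, 0, 0)             # a functional on R^3 that does not annihilate T(V)
h = (1, -1, -1)           # annihilates (1, 0, 1) and (0, 1, -1), which span T(V)

print(act(pull_back(M, g), a), "==", act(g, apply_map(M, a)))
print("T*(h) =", pull_back(M, h))
```

The functional `h` vanishes on the two vectors that span the range of T, and its pull back is the zero functional on $\mathbb{R}^4$, consistent with Proposition 1.3.18.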

It is useful to make a connection between the inner product and linear functionals.
To do this, consider a basis $\{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$ and let $\alpha_i = \langle a|a_i\rangle$. As
noted earlier, the set of scalars $\{\alpha_i\}_{i=1}^{N}$ defines a unique linear functional $f_a$ such
that $f_a|a_i\rangle = \alpha_i$. Since $\langle a|a_i\rangle$ is also equal to $\alpha_i$, it is natural to identify $f_a$ with


13 Do not confuse this “*” with complex conjugation. 




the symbol $\langle a|$, and write $T : f_a \mapsto \langle a|$ where T is the identification map.

It is also convenient to introduce the notation^14

$$(|a\rangle)^\dagger \equiv \langle a|, \tag{1.12}$$

where the symbol † means "dual, or dagger of." Now we ask: How does this dagger
operation act on a linear combination of vectors? Let $|c\rangle = \alpha|a\rangle + \beta|b\rangle$ and take
the inner product of $|c\rangle$ with an arbitrary vector $|x\rangle$ using linearity in the second
factor: $\langle x|c\rangle = \alpha\langle x|a\rangle + \beta\langle x|b\rangle$. Now complex conjugate both sides and use
the (sesqui)symmetry of the inner product:

$$(\text{LHS})^* = \langle x|c\rangle^* = \langle c|x\rangle,$$
$$(\text{RHS})^* = \alpha^*\langle x|a\rangle^* + \beta^*\langle x|b\rangle^* = \alpha^*\langle a|x\rangle + \beta^*\langle b|x\rangle = (\alpha^*\langle a| + \beta^*\langle b|)\,|x\rangle.$$

Since this is true for all $|x\rangle$, we must have $(|c\rangle)^\dagger \equiv \langle c| = \alpha^*\langle a| + \beta^*\langle b|$. Therefore,
in a duality "operation" the complex scalars must be conjugated. So, we have

$$(\alpha|a\rangle + \beta|b\rangle)^\dagger = \alpha^*\langle a| + \beta^*\langle b|. \tag{1.13}$$

Thus, unlike the association $|a\rangle \mapsto f_a$, which is linear, the association $f_a \mapsto \langle a|$ is
not linear, but sesquilinear, i.e., the identification map T mentioned above is also
sesquilinear:

$$T(\alpha f_a + \beta f_b) = \alpha^*\langle a| + \beta^*\langle b| = \alpha^* T(f_a) + \beta^* T(f_b).$$

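It is convenient to represent $|a\rangle \in \mathbb{C}^n$ as a column vector

$$|a\rangle = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}.$$

Then the definition of the complex inner product suggests that the dual of $|a\rangle$ must
be represented as a row vector with complex conjugate entries:

$$\langle a| = (\alpha_1^* \;\; \alpha_2^* \;\; \ldots \;\; \alpha_n^*), \tag{1.14}$$

and the inner product can be written as the (matrix) product

$$\langle a|b\rangle = (\alpha_1^* \;\; \alpha_2^* \;\; \ldots \;\; \alpha_n^*) \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{pmatrix} = \sum_{i=1}^{n} \alpha_i^* \beta_i.$$

Compare (1.14) with the comments after (1.9). The complex conjugation in (1.14)
is the result of the sesquilinearity of the association $|a\rangle \leftrightarrow \langle a|$.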
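The row-vector picture of (1.14) and the conjugation rule (1.13) are easy to try out on small vectors. The following is a minimal sketch using Python's built-in complex numbers; the function names `bra`, `inner`, and `combine`, and the sample vectors and scalars, are assumptions made for illustration.

```python
def bra(ket):
    """Dual of a ket in C^n: the row vector of complex-conjugated components."""
    return [z.conjugate() for z in ket]


def inner(a, b):
    """<a|b> as the matrix product of the row vector <a| with the column |b>."""
    return sum(x * y for x, y in zip(bra(a), b))


def combine(alpha, a, beta, b):
    """Componentwise linear combination alpha*a + beta*b."""
    return [alpha * x + beta * y for x, y in zip(a, b)]


# Arbitrary sample data with integer real and imaginary parts (exact in floats).
a = [1 + 2j, 3j]
b = [2 - 1j, 1 + 1j]
alpha, beta = 2 + 1j, -1j

lhs = bra(combine(alpha, a, beta, b))                                 # (alpha|a> + beta|b>)^dagger
rhs = combine(alpha.conjugate(), bra(a), beta.conjugate(), bra(b))    # alpha* <a| + beta* <b|

print("dagger conjugates the scalars:", lhs == rhs)
print("<a|a> =", inner(a, a))
print("<a|b> and <b|a>*:", inner(a, b), inner(b, a).conjugate())
```

The last line also displays the (sesqui)symmetry $\langle a|b\rangle = \langle b|a\rangle^*$ used in the derivation of (1.13).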

14 The significance of this notation will become clear in Section 2.3. 






1.3.19. Example. Let U and V be vector spaces with bases $B_U = \{|u_i\rangle\}_{i=1}^{n}$ and $B_V =
\{|v_j\rangle\}_{j=1}^{m}$, respectively. Consider an mn-dimensional vector space W whose basis $B_W$
is in one-to-one correspondence with the pairs $(|u_i\rangle, |v_j\rangle)$, and let $|u_i v_j\rangle$ be the vector
corresponding to $(|u_i\rangle, |v_j\rangle)$. For $|u\rangle \in U$ with components $\{\alpha_i\}_{i=1}^{n}$ in $B_U$ and $|v\rangle \in V$
with components $\{\eta_j\}_{j=1}^{m}$ in $B_V$, define the vector $|u, v\rangle \in W$ whose components in $B_W$
are $\{\alpha_i \eta_j\}$. One can easily show that if $|u\rangle$, $|u'\rangle$, and $|u''\rangle$ are vectors in U and $|u''\rangle =
\alpha|u\rangle + \beta|u'\rangle$, then

$$|u'', v\rangle = \alpha|u, v\rangle + \beta|u', v\rangle.$$

The space W thus defined is called the tensor product of U and V and denoted by $U \otimes V$.

One can also define the tensor product of three or more vector spaces. Of special interest
are tensor products of a vector space and its dual. The tensor product space $V^r_s$ of type
(r, s) is defined as follows:

$$V^r_s = \underbrace{V \otimes V \otimes \cdots \otimes V}_{r \text{ times}} \otimes \underbrace{V^* \otimes V^* \otimes \cdots \otimes V^*}_{s \text{ times}}.$$

We shall come back to this space in Chapter 25. ■
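The component rule of Example 1.3.19 can be sketched in a few lines. The ordering of the mn components (all pairs with the first index fixed, then the next) is an assumption made here for convenience, as are the function name `tensor` and the sample vectors.

```python
def tensor(u, v):
    """Components of |u, v> in B_W: all products u_i * v_j, ordered by (i, j)."""
    return [ui * vj for ui in u for vj in v]


# Arbitrary sample vectors: u, u1 in a 2-dimensional U, v in a 3-dimensional V.
u = [1, 2]
u1 = [0, -1]
v = [3, 4, 5]
alpha, beta = 2, -3

# |u''> = alpha|u> + beta|u'>
u2 = [alpha * x + beta * y for x, y in zip(u, u1)]

lhs = tensor(u2, v)
rhs = [alpha * s + beta * t for s, t in zip(tensor(u, v), tensor(u1, v))]

print("dimension of U (x) V:", len(tensor(u, v)))
print("linear in the first factor:", lhs == rhs)
```

The same check with the roles of the two factors exchanged shows linearity in the second factor.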


1.4 Algebras 

In many physical applications, a vector space has a natural "product," the prime
example being the vector space of matrices. It is therefore useful to consider vector
spaces for which such a product exists.

1.4.1. Definition. An algebra $\mathcal{A}$ over $\mathbb{C}$ (or $\mathbb{R}$) is a vector space over $\mathbb{C}$ (or $\mathbb{R}$),
together with a binary operation $\mu : \mathcal{A} \times \mathcal{A} \to \mathcal{A}$, called multiplication, that
satisfies^15

$$a(\beta b + \gamma c) = \beta ab + \gamma ac \quad \forall\, a, b, c \in \mathcal{A},\; \forall\, \beta, \gamma \in \mathbb{C} \text{ (or } \mathbb{R}),$$

with a similar relation for multiplication on the right. The dimension of the vector
space is called the dimension of the algebra. The algebra is called associative if
the product satisfies a(bc) = (ab)c and commutative if it satisfies ab = ba. An
algebra with identity is an algebra that has an element 1 satisfying a1 = 1a = a.
An element b of an algebra with identity is said to be a left inverse of a if ba = 1.
Right inverse is defined similarly.

1.4.2. Example. Define the following product on $\mathbb{R}^2$:

$$(x_1, x_2)(y_1, y_2) = (x_1 y_1 - x_2 y_2,\; x_1 y_2 + x_2 y_1).$$

The reader is urged to verify that this product turns $\mathbb{R}^2$ into a commutative algebra.


15 We shall abandon the Dirac bra-and-ket notation in this section due to its clumsiness; instead we use boldface roman letters
to denote vectors. It is customary to write ab for $\mu(a, b)$.




Similarly, the vector (cross) product on $\mathbb{R}^3$ turns it into a nonassociative,
noncommutative algebra.

The paradigm of all algebras is the matrix algebra whose binary operation is ordinary
multiplication of n × n matrices. This algebra is associative but not commutative.

All the examples above are finite-dimensional algebras. An example of an
infinite-dimensional algebra is $\mathcal{C}^\infty(a, b)$, the vector space of infinitely differentiable real-valued
functions on a real interval (a, b). The multiplication is defined pointwise: If $f \in \mathcal{C}^\infty(a, b)$
and $g \in \mathcal{C}^\infty(a, b)$, then

$$fg(x) \equiv f(x)g(x) \quad \forall\, x \in (a, b).$$

This algebra is commutative and associative. ■

The last item in the example above has a feature that turns out to be of great
significance in all algebras, the product rule for differentiation.

1.4.3. Definition. A vector space endomorphism $D : \mathcal{A} \to \mathcal{A}$ is called a derivation
on $\mathcal{A}$ if it has the additional property

$$D(ab) = [D(a)]b + a[D(b)].$$

1.4.4. Example. Let $\mathcal{A}$ be the set of n × n matrices. Define the binary operation, denoted
by •, as

$$A \bullet B \equiv AB - BA,$$

where the RHS is ordinary matrix multiplication. The reader may check that $\mathcal{A}$ together
with this operation becomes an algebra. Now let A be a fixed matrix, and define the linear
transformation

$$D_A(B) = A \bullet B.$$

Then we note that

$$D_A(B \bullet C) = A \bullet (B \bullet C) = A(B \bullet C) - (B \bullet C)A$$
$$= A(BC - CB) - (BC - CB)A = ABC - ACB - BCA + CBA.$$

On the other hand,

$$(D_A B) \bullet C + B \bullet (D_A C)$$
$$= (A \bullet B) \bullet C + B \bullet (A \bullet C)$$
$$= (AB - BA) \bullet C + B \bullet (AC - CA)$$
$$= (AB - BA)C - C(AB - BA) + B(AC - CA) - (AC - CA)B$$
$$= ABC + CBA - BCA - ACB.$$

So, $D_A$ is a derivation on $\mathcal{A}$. ■
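The identity derived in Example 1.4.4 can be spot-checked on explicit matrices. This is only a sketch on arbitrary 2 × 2 integer matrices; the helper names `matmul`, `add`, `sub`, `bullet`, and `D` are assumptions, and a numerical check of course does not replace the algebraic proof.

```python
def matmul(A, B):
    """Ordinary matrix product of square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def bullet(A, B):
    """The product of Example 1.4.4: A . B = AB - BA."""
    return sub(matmul(A, B), matmul(B, A))


def D(A, B):
    """D_A(B) = A . B for a fixed matrix A."""
    return bullet(A, B)


# Arbitrary integer matrices, so that all comparisons are exact.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, -1]]

lhs = D(A, bullet(B, C))                               # D_A(B . C)
rhs = add(bullet(D(A, B), C), bullet(B, D(A, C)))      # (D_A B) . C + B . (D_A C)

print("D_A(B . C)               =", lhs)
print("(D_A B) . C + B . (D_A C) =", rhs)
```

Both sides agree for any choice of A, B, and C, which is the derivation property of $D_A$ with respect to the • product.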

The linear transformations connecting vector spaces can be modified slightly to 
accommodate the binary operation of multiplication of the corresponding algebras: 





1.4.5. Definition. Let $\mathcal{A}$ and $\mathcal{B}$ be algebras. A linear transformation $T : \mathcal{A} \to \mathcal{B}$
is called an algebra homomorphism if T(ab) = T(a)T(b). A bijective algebra
homomorphism is called an algebra isomorphism.

1.4.6. Example. Let $\mathcal{A}$ be $\mathbb{R}^3$, and $\mathcal{B}$ the set of 3 × 3 matrices of the form

$$A = \begin{pmatrix} 0 & a_1 & -a_2 \\ -a_1 & 0 & a_3 \\ a_2 & -a_3 & 0 \end{pmatrix}.$$

Then the map $T : \mathcal{A} \to \mathcal{B}$ defined by

$$T(a_1, a_2, a_3) = \begin{pmatrix} 0 & a_1 & -a_2 \\ -a_1 & 0 & a_3 \\ a_2 & -a_3 & 0 \end{pmatrix}$$

can be shown to be a linear isomorphism. Let the cross product be the binary operation
on $\mathcal{A}$, turning it into an algebra. For $\mathcal{B}$, define the binary operation of Example 1.4.4. The
reader may check that, with these operations, T is extended to an algebra isomorphism. ■
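The homomorphism property in Example 1.4.6, namely that T sends the cross product of two vectors to the product AB − BA of their images, can be tested on sample vectors. The sketch below uses arbitrary integer vectors and hypothetical helper names; it is a spot check, not the proof the example leaves to the reader.

```python
def cross(a, b):
    """Cross product on R^3."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def T(a):
    """The map of Example 1.4.6 from R^3 to antisymmetric 3 x 3 matrices."""
    a1, a2, a3 = a
    return [[0, a1, -a2], [-a1, 0, a3], [a2, -a3, 0]]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def bullet(A, B):
    """The product of Example 1.4.4: AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(AB, BA)]


# Arbitrary integer sample vectors.
a = (1, 2, 3)
b = (-2, 0, 5)

print("T(a x b)    =", T(cross(a, b)))
print("T(a) . T(b) =", bullet(T(a), T(b)))
```

Linearity and bijectivity of T are evident from the form of the matrix, so agreement of the two printed matrices for all pairs of vectors is what makes T an algebra isomorphism.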



Given an algebra $\mathcal{A}$ and a basis $B = \{e_i\}_{i=1}^{N}$ for the underlying vector space,
one can write

$$e_i e_j = \sum_{k=1}^{N} c_{ij}^{k}\, e_k, \qquad c_{ij}^{k} \in \mathbb{C}. \tag{1.15}$$

The complex numbers $c_{ij}^{k}$, the components of the vector $e_i e_j$ in the basis B, are
called the structure constants of $\mathcal{A}$. These constants determine the product of any
two vectors once they are expressed in terms of the basis vectors of B. Conversely,

1.4.7. Box. Given any N-dimensional vector space V, one can turn it into
an algebra by choosing a basis and a set of $N^3$ numbers $\{c_{ij}^{k}\}$ and defining
the product of basis vectors by Equation (1.15).
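The recipe of Box 1.4.7 can be sketched for the product of Example 1.4.2 on $\mathbb{R}^2$. The nested-list layout `c[i][j][k]` for the structure constants and the function name `product` are assumptions made for this illustration; indices start at 0 rather than 1.

```python
def product(c, x, y):
    """Product of x and y determined by structure constants c[i][j][k].

    Uses bilinearity: (sum_i x_i e_i)(sum_j y_j e_j) = sum_{i,j,k} x_i y_j c[i][j][k] e_k.
    """
    n = len(x)
    return [
        sum(c[i][j][k] * x[i] * y[j] for i in range(n) for j in range(n))
        for k in range(n)
    ]


# Structure constants reproducing Example 1.4.2 in the standard basis of R^2:
# e1 e1 = e1, e1 e2 = e2 e1 = e2, e2 e2 = -e1.
c = [
    [[1, 0], [0, 1]],    # e1 e1, e1 e2
    [[0, 1], [-1, 0]],   # e2 e1, e2 e2
]

x, y = (1, 2), (3, 4)
print(product(c, x, y))                          # components of (1, 2)(3, 4)
print(product(c, x, y) == product(c, y, x))      # the product is commutative
```

Choosing a different set of $N^3$ numbers gives a different algebra on the same vector space.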


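1.4.8. Example. Consider the vector space of n × n matrices with its standard basis
$\{e_{ij}\}_{i,j=1}^{n}$, where $e_{ij}$ has a 1 at the ij-th position and zero everywhere else. This means that
$(e_{ij})_{lk} = \delta_{il}\delta_{jk}$, and

$$(e_{ij} e_{kl})_{mn} = \sum_{r=1}^{n} (e_{ij})_{mr} (e_{kl})_{rn} = \sum_{r=1}^{n} \delta_{im}\delta_{jr}\delta_{kr}\delta_{ln} = \delta_{im}\delta_{jk}\delta_{ln} = \delta_{jk}(e_{il})_{mn},$$

or

$$e_{ij} e_{kl} = \delta_{jk}\, e_{il}.$$

The structure constants are $c_{ij,kl}^{mn} = \delta_{im}\delta_{jk}\delta_{ln}$. Note that one needs a double index to label
these constants. ■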

1.4.9. Example. In the standard basis $\{e_i\}$ of $\mathbb{R}^4$, choose the structure constants as follows:

$$e_1^2 = -e_2^2 = -e_3^2 = -e_4^2 = e_1,$$
$$e_1 e_i = e_i e_1 = e_i \quad\text{for } i = 2, 3, 4,$$
$$e_i e_j = \sum_{k} \epsilon_{ijk}\, e_k \quad\text{for } i, j = 2, 3, 4,\; i \neq j$$

[$\epsilon_{ijk}$ is defined in Equation (3.19)]. The reader may verify that these relations turn $\mathbb{R}^4$
into an associative, but noncommutative, algebra, called the algebra of quaternions and
denoted by $\mathbb{H}$. In this context, $e_1$ is usually denoted by 1, and $e_2$, $e_3$, and $e_4$ by i, j, and k,
respectively, and one writes q = x + iy + jz + kw for an element of $\mathbb{H}$. It then becomes
evident that $\mathbb{H}$ is a generalization of $\mathbb{C}$. In analogy with $\mathbb{C}$, x is called the real part of q, and
(y, z, w) the pure part of q. Similarly, the conjugate of q is q* = x − iy − jz − kw. ■

Algebras have a surprisingly rich structure and are used extensively in many
branches of mathematics and physics. We shall see their usefulness in our discussion
of group theory in Part VII. To close this section, and to complete this
introductory discussion of algebras, we cite a few more useful notions.

1.4.10. Definition. Let $\mathcal{A}$ be an algebra. A subspace $\mathcal{B}$ of $\mathcal{A}$ is called a subalgebra
of $\mathcal{A}$ if $\mathcal{B}$ contains the products of all its members. If $\mathcal{B}$ has the extra property that
it contains ab for all $a \in \mathcal{A}$ and $b \in \mathcal{B}$, then $\mathcal{B}$ is called a left ideal of $\mathcal{A}$. A right
ideal and a two-sided ideal are defined similarly.

It is clear from the definition that an ideal is automatically a subalgebra, and
that

1.4.11. Box. No proper ideal of an algebra with identity can contain the
identity element.

In fact, no proper left (right) ideal can contain an element that has a left (right)
inverse. An ideal can itself contain a proper (sub)ideal. If an ideal does not contain
any proper subideal, it is called a minimal ideal.

1.4.12. Example. The vector space $\mathcal{C}^0(a, b)$ of all continuous real-valued functions on
an interval (a, b) is turned into a commutative algebra by pointwise multiplication: If
$f, g \in \mathcal{C}^0(a, b)$, then the product fg is defined by $fg(x) \equiv f(x)g(x)$ for all $x \in (a, b)$.
The set of functions that vanish at a given fixed point $c \in (a, b)$ constitutes an ideal in
$\mathcal{C}^0(a, b)$. Since the algebra is commutative, the ideal is two-sided. ■

One can easily construct left ideals for an algebra $\mathcal{A}$: Take any element $x \in \mathcal{A}$
and consider the set

$$\mathcal{A}x \equiv \{ax \mid a \in \mathcal{A}\}.$$






The reader may check that $\mathcal{A}x$ is a left ideal. A right ideal can be constructed
similarly. To construct a two-sided ideal, consider the set

$$\mathcal{A}x\mathcal{A} \equiv \{axb \mid a, b \in \mathcal{A}\}.$$

These are all called ideals generated by x.
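As an illustration of a left ideal generated by an element, consider the algebra of 2 × 2 matrices and take x to be the matrix with a single 1 in the upper left corner. This choice of algebra and element is an assumption made here, not an example from the text. Every product ax then has a vanishing second column, and multiplying such a matrix on the left by anything keeps the second column zero.

```python
def matmul(A, B):
    """Ordinary product of 2 x 2 matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


# Generator of the ideal (hypothetical choice): x = e_11.
x = [[1, 0], [0, 0]]


def in_left_ideal(m):
    """Membership test for Ax with x = e_11: the second column must vanish."""
    return m[0][1] == 0 and m[1][1] == 0


# Arbitrary sample elements a, b of the algebra.
a = [[2, 7], [-1, 4]]
b = [[0, 3], [5, 6]]

ax = matmul(a, x)
print("ax       =", ax, in_left_ideal(ax))
print("b(ax)    =", matmul(b, ax), in_left_ideal(matmul(b, ax)))
print("(ax)b    =", matmul(ax, b), in_left_ideal(matmul(ax, b)))
print("identity in Ax:", in_left_ideal([[1, 0], [0, 1]]))
```

With these samples the product (ax)b leaves the set, so this left ideal is not a right ideal, and the identity matrix is not in it, in agreement with Box 1.4.11.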

1.5 Problems 

1.1. Let $\mathbb{R}^+$ denote the set of positive real numbers. Define the "sum" of two
elements of $\mathbb{R}^+$ to be their usual product, and define scalar multiplication by
elements of $\mathbb{R}$ as being given by $r \cdot p = p^r$ where $r \in \mathbb{R}$ and $p \in \mathbb{R}^+$. With these
operations, show that $\mathbb{R}^+$ is a vector space over $\mathbb{R}$.

1.2. Show that the intersection of two subspaces is also a subspace.

1.3. For each of the following subsets of $\mathbb{R}^3$ determine whether it is a subspace
of $\mathbb{R}^3$:

(a) $\{(x, y, z) \in \mathbb{R}^3 \mid x + y - 2z = 0\}$;

(b) $\{(x, y, z) \in \mathbb{R}^3 \mid x + y - 2z = 3\}$;

(c) $\{(x, y, z) \in \mathbb{R}^3 \mid xyz = 0\}$.

1.4. Prove that the components of a vector in a given basis are unique.

1.5. Show that the following vectors form a basis for $\mathbb{C}^n$ (or $\mathbb{R}^n$).



1.6. Let W be a subspace of $\mathbb{R}^5$ defined by

$$W = \{(x_1, \ldots, x_5) \in \mathbb{R}^5 \mid x_1 = 3x_2 + x_3,\; x_2 = x_5,\; \text{and } x_4 = 2x_3\}.$$

Find a basis for W.

1.7. Show that the inner product of any vector with $|0\rangle$ is zero.

1.8. Prove Theorem 1.1.5.

1.9. Find $a_0$, $b_0$, $b_1$, $c_0$, $c_1$, and $c_2$ such that the polynomials $a_0$, $b_0 + b_1 t$, and
$c_0 + c_1 t + c_2 t^2$ are mutually orthonormal in the interval [0, 1]. The inner product
is as defined for polynomials in Example 1.2.3 with w(t) = 1.





1.10. Given the linearly independent vectors $x(t) = t^n$, for n = 0, 1, 2, ...
in $\mathcal{P}^c[t]$, use the Gram-Schmidt process to find the orthonormal polynomials
$e_0(t)$, $e_1(t)$, and $e_2(t)$

(a) when the inner product is defined as $\langle x|y\rangle = \int_{-1}^{1} x^*(t)\,y(t)\,dt$.

(b) when the inner product is defined with a nontrivial weight function:

$$\langle x|y\rangle = \int_{-\infty}^{\infty} e^{-t^2} x^*(t)\,y(t)\,dt.$$

Hint: Use the following result:

$$\int_{-\infty}^{\infty} e^{-t^2} t^n\,dt = \begin{cases} \sqrt{\pi} & \text{if } n = 0, \\ 0 & \text{if } n \text{ is odd}, \\ \sqrt{\pi}\,\dfrac{1 \cdot 3 \cdot 5 \cdots (n-1)}{2^{n/2}} & \text{if } n \text{ is even}. \end{cases}$$

1.11. (a) Use the Gram-Schmidt process to find an orthonormal set of vectors
out of (1, −1, 1), (−1, 0, 1), and (2, −1, 2).

(b) Are these three vectors linearly independent? If not, find a zero linear combination
of them by using part (a).

1.12. (a) Use the Gram-Schmidt process to find an orthonormal set of vectors
out of (1, −1, 2), (−2, 1, −1), and (−1, −1, 4).

(b) Are these three vectors linearly independent? If not, find a zero linear combination
of them by using part (a).

1.13. Show that

$$\int_{-\infty}^{\infty} (t^{10} - t^6 + 5t^4 - 5)\,e^{-t^4}\,dt \le \sqrt{\int_{-\infty}^{\infty} (t^4 - 1)^2 e^{-t^4}\,dt}\;\sqrt{\int_{-\infty}^{\infty} (t^6 + 5)^2 e^{-t^4}\,dt}.$$

1.14. Show that

$$\int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\,(x^5 - x^3 + 2x^2 - 2)(y^5 - y^3 + 2y^2 - 2)\,e^{-(x^4 + y^4)}$$
$$\le \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\,(x^4 - 2x^2 + 1)(y^6 + 4y^3 + 4)\,e^{-(x^4 + y^4)}.$$

Hint: Define an appropriate inner product and use the Schwarz inequality.

1.15. Show that for any set of n complex numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$, we have

$$|\alpha_1 + \alpha_2 + \cdots + \alpha_n|^2 \le n\left(|\alpha_1|^2 + |\alpha_2|^2 + \cdots + |\alpha_n|^2\right).$$

Hint: Apply the Schwarz inequality to (1, 1, ..., 1) and $(\alpha_1, \alpha_2, \ldots, \alpha_n)$.





1.16. Using the Schwarz inequality show that if $\{\alpha_i\}_{i=1}^{\infty}$ and $\{\beta_i\}_{i=1}^{\infty}$ are in $\mathbb{C}^\infty$,
then $\sum_{i=1}^{\infty} \alpha_i^* \beta_i$ is convergent.

1.17. Show that $T : \mathbb{R}^2 \to \mathbb{R}^3$ given by $T(x, y) = (x^2 + y^2, x + y, 2x - y)$ is not
a linear mapping.

1.18. Verify that all the transformations of Example 1.3.4 are linear.

1.19. Let $\pi$ be the permutation that takes (1, 2, 3) to (3, 1, 2). Find

$$A_\pi|e_i\rangle, \quad i = 1, 2, 3,$$

where $\{|e_i\rangle\}_{i=1}^{3}$ is the standard basis of $\mathbb{R}^3$ (or $\mathbb{C}^3$), and $A_\pi$ is as defined in
Example 1.3.4.

1.20. Show that if $T \in \mathcal{L}(\mathbb{C}, \mathbb{C})$, then there exists $\alpha \in \mathbb{C}$ such that $T|a\rangle = \alpha|a\rangle$
for all $|a\rangle \in \mathbb{C}$.

1.21. Show that if $\{|a_i\rangle\}_{i=1}^{n}$ spans V and $T \in \mathcal{L}(V, W)$ is surjective, then $\{T|a_i\rangle\}_{i=1}^{n}$
spans W.

1.22. Give an example of a function $f : \mathbb{R}^2 \to \mathbb{R}$ such that

$$f(\alpha|a\rangle) = \alpha f(|a\rangle) \quad \forall\, \alpha \in \mathbb{R} \text{ and } |a\rangle \in \mathbb{R}^2$$

but f is not linear. Hint: Consider a homogeneous function of degree 1.

1.23. Show that the following transformations are linear:

(a) V is $\mathbb{C}$ over the reals and $C|z\rangle = |z^*\rangle$. Is C linear if instead of real numbers,
complex numbers are used as scalars?

(b) V is $\mathcal{P}^c[t]$ and $T|x(t)\rangle = |x(t + 1)\rangle - |x(t)\rangle$.

1.24. Verify that the kernel of a transformation $T : V \to W$ is a subspace of V,
and that T(V) is a subspace of W.

1.25. Let V and W be finite-dimensional vector spaces. Show that if $T \in \mathcal{L}(V, W)$
is surjective, then dim W ≤ dim V.

1.26. Suppose that V is finite dimensional and $T \in \mathcal{L}(V, W)$ is not zero. Prove
that there exists a subspace U of V such that $\ker T \cap U = \{0\}$ and T(V) = T(U).

1.27. Show that $W^0$ is a subspace of $V^*$ and

$$\dim V = \dim W + \dim W^0.$$

1.28. Show that T and $T^*$ have the same rank. In particular, show that if T is
injective, then $T^*$ is surjective. Hint: Use the dimension theorem for T and $T^*$ and
Equation (1.11).





1.29. Show that ⑻ the product on E 2 defined by 

(^i, x 2 )(yi, yi) = - x 2 yi, x\yi + x 2 y\) 

turns E 2 into an associative and commutative algebra, and (b) the cross product 
on R 3 turns it into a nonassociative, noncommutative algebra. 

1.30. Fix a vector a 6 R 3 and define the linear transformation D a : R 3 M 3 
by D a (b) = a x b. Show that D a is a derivation of R 3 with the cross product as 
multiplication. 

1.31. Show that the linear transformation of Example 1.4.6 is an isomorphism of 
the two algebras A and®. 

1.32. Write down all the structure constants for the algebra of quaternions. Show 
that this algebra is associative. 

1.33. Show that a quaternion is real iff it commutes with every quaternion and 
that it is pure iff its square is a nonpositive real number. 

1.34. Let $p$ and $q$ be two quaternions. Show that
(a) $(pq)^* = q^* p^*$.

(b) $q \in \mathbb{R}$ iff $q^* = q$, and $q \in \mathbb{R}^3$ iff $q^* = -q$.

(c) $qq^* = q^*q$ is a nonnegative real number.

1.35. Show that no proper left (right) ideal of an algebra with identity can contain an element that has a left (right) inverse.

1.36. Let $\mathcal{A}$ be an algebra, and $x \in \mathcal{A}$. Show that $\mathcal{A}x$ is a left ideal, $x\mathcal{A}$ is a right ideal, and $\mathcal{A}x\mathcal{A}$ is a two-sided ideal.

Additional Reading 

1. Axler, S. Linear Algebra Done Right, Springer-Verlag, 1996. A small text 
packed with information. Lots of marginal notes and historical remarks. 

2. Greub, W. Linear Algebra, 4th ed., Springer-Verlag, 1975. Chapter V has a 
good discussion of algebras and their properties. 

3. Halmos, P. Finite Dimensional Vector Spaces, 2nd ed., Van Nostrand, 1958.
Another great book by the master expositor. 


Operator Algebra 




Recall that a vector space in which one can multiply two vectors to obtain a third vector is called an algebra. In this chapter, we want to investigate the algebra of linear transformations. We have already established that the set of linear transformations $\mathcal{L}(V, W)$ from $V$ to $W$ is a vector space. Let us attempt to define a multiplication as well. The best candidate is the composition of linear transformations. If $\mathbf{T} : V \to U$ and $\mathbf{S} : U \to W$ are linear operators, then the composition $\mathbf{S} \circ \mathbf{T} : V \to W$ is also a linear operator, as can easily be verified.

This product, however, is not defined on a single vector space: it takes an element of $\mathcal{L}(V, U)$ and an element of a second vector space $\mathcal{L}(U, W)$ to give an element of yet another vector space $\mathcal{L}(V, W)$. An algebra requires a single vector space. We can accomplish this by letting $V = U = W$. Then the three spaces of linear transformations collapse to the single space $\mathcal{L}(V, V)$, the set of endomorphisms of $V$, which we abbreviate as $\mathcal{L}(V)$ and to which $\mathbf{T}$, $\mathbf{S}$, $\mathbf{ST} \equiv \mathbf{S} \circ \mathbf{T}$, and $\mathbf{TS} \equiv \mathbf{T} \circ \mathbf{S}$ belong. The space $\mathcal{L}(V)$ is the algebra of the linear operators on $V$.

2.1 Algebra of $\mathcal{L}(V)$

Operator algebra encompasses various operations on, and relations among, operators. One of these relations is the equality of operators, which is intuitively obvious; nevertheless, we make it explicit in the following definition (see also Box 1.3.3).

2.1.1. Definition. Two linear operators $\mathbf{T}, \mathbf{U} \in \mathcal{L}(V)$ are equal if $\mathbf{T}|a\rangle = \mathbf{U}|a\rangle$ for all $|a\rangle \in V$.

Because of the linearity of $\mathbf{T}$ and $\mathbf{U}$, we have




2.1.2. Box. Two endomorphisms $\mathbf{T}, \mathbf{U} \in \mathcal{L}(V)$ are equal if $\mathbf{T}|a_i\rangle = \mathbf{U}|a_i\rangle$ for all $|a_i\rangle \in B$, where $B$ is a basis of $V$. Therefore, an endomorphism is uniquely determined by its action on the vectors of a basis.


The equality of operators can also be established by other, more convenient, 
methods when an inner product is defined on the vector space. The following two 
theorems contain the essence of these alternatives. 

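2.1.3. Theorem. An endomorphism $\mathbf{T}$ of an inner product space is $\mathbf{0}$ if and only if$^1$ $\langle b|\mathbf{T}|a\rangle \equiv \langle b|\mathbf{T}a\rangle = 0$ for all $|a\rangle$ and $|b\rangle$.

Proof. Clearly, if $\mathbf{T} = \mathbf{0}$ then $\langle b|\mathbf{T}|a\rangle = 0$. Conversely, if $\langle b|\mathbf{T}|a\rangle = 0$ for all $|a\rangle$ and $|b\rangle$, then, choosing $|b\rangle = \mathbf{T}|a\rangle = |\mathbf{T}a\rangle$, we obtain
\[ \langle \mathbf{T}a|\mathbf{T}a\rangle = 0 \;\; \forall\, |a\rangle \iff \mathbf{T}|a\rangle = 0 \;\; \forall\, |a\rangle \iff \mathbf{T} = \mathbf{0} \]
by positive definiteness of the inner product. □

2.1.4. Theorem. A linear operator $\mathbf{T}$ on an inner product space is $\mathbf{0}$ if and only if $\langle a|\mathbf{T}|a\rangle = 0$ for all $|a\rangle$.

Proof. Obviously, if $\mathbf{T} = \mathbf{0}$, then $\langle a|\mathbf{T}|a\rangle = 0$. Conversely, choose a vector $\alpha|a\rangle + \beta|b\rangle$, sandwich $\mathbf{T}$ between this vector and its dual, and rearrange terms to obtain what is known as the polarization identity:
\[ \alpha^*\beta\, \langle a|\mathbf{T}|b\rangle + \alpha\beta^*\, \langle b|\mathbf{T}|a\rangle = \langle \alpha a + \beta b|\mathbf{T}|\alpha a + \beta b\rangle - |\alpha|^2 \langle a|\mathbf{T}|a\rangle - |\beta|^2 \langle b|\mathbf{T}|b\rangle. \]
According to the assumption of the theorem, the RHS is zero. Thus, if we let $\alpha = \beta = 1$ we obtain $\langle a|\mathbf{T}|b\rangle + \langle b|\mathbf{T}|a\rangle = 0$. Similarly, with $\alpha = 1$ and $\beta = i$ we get $i\langle a|\mathbf{T}|b\rangle - i\langle b|\mathbf{T}|a\rangle = 0$. These two equations give $\langle a|\mathbf{T}|b\rangle = 0$ for all $|a\rangle$, $|b\rangle$. By Theorem 2.1.3, $\mathbf{T} = \mathbf{0}$. □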

To show that two operators $\mathbf{U}$ and $\mathbf{T}$ are equal, one can either have them act on an arbitrary vector and show that they give the same result, or verify that $\mathbf{U} - \mathbf{T}$ is the zero operator by means of one of the theorems above. Equivalently, one shows that $\langle a|\mathbf{T}|b\rangle = \langle a|\mathbf{U}|b\rangle$ or $\langle a|\mathbf{T}|a\rangle = \langle a|\mathbf{U}|a\rangle$ for all $|a\rangle$, $|b\rangle$. In addition to the zero element, which is present in all algebras, $\mathcal{L}(V)$ has an identity element, $\mathbf{1}$, which satisfies the relation $\mathbf{1}|a\rangle = |a\rangle$ for all $|a\rangle \in V$. With $\mathbf{1}$ in our possession, we can ask whether it is possible to find an operator $\mathbf{T}^{-1}$ with the property that $\mathbf{T}^{-1}\mathbf{T} = \mathbf{T}\mathbf{T}^{-1} = \mathbf{1}$. Generally speaking, only bijective mappings have inverses. Therefore, only automorphisms of a vector space are invertible.


1 It is convenient here to use the notation $|\mathbf{T}a\rangle$ for $\mathbf{T}|a\rangle$. This would then allow us to write the dual of the vector as $\langle \mathbf{T}a|$, emphasizing that it is indeed the bra associated with $\mathbf{T}|a\rangle$.






2.1.5. Example. Let the linear operator $\mathbf{T} : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by
\[ \mathbf{T}(x_1, x_2, x_3) = (x_1 + x_2,\; x_2 + x_3,\; x_1 + x_3). \]
We want to see whether $\mathbf{T}$ is invertible and, if so, find its inverse. $\mathbf{T}$ has an inverse if and only if it is bijective. By the comments after Theorem 1.3.8 this is the case if and only if $\mathbf{T}$ is either surjective or injective. The latter is equivalent to $\ker \mathbf{T} = \{|0\rangle\}$. But $\ker \mathbf{T}$ is the set of all vectors satisfying $\mathbf{T}(x_1, x_2, x_3) = (0, 0, 0)$, or
\[ x_1 + x_2 = 0, \qquad x_2 + x_3 = 0, \qquad x_1 + x_3 = 0. \]
The reader may check that the unique solution to these equations is $x_1 = x_2 = x_3 = 0$. Thus, the only vector belonging to $\ker \mathbf{T}$ is the zero vector. Therefore, $\mathbf{T}$ has an inverse.

To find $\mathbf{T}^{-1}$ apply $\mathbf{T}^{-1}\mathbf{T} = \mathbf{1}$ to $(x_1, x_2, x_3)$:
\[ (x_1, x_2, x_3) = \mathbf{T}^{-1}\mathbf{T}(x_1, x_2, x_3) = \mathbf{T}^{-1}(x_1 + x_2,\; x_2 + x_3,\; x_1 + x_3). \]
This equation demonstrates how $\mathbf{T}^{-1}$ acts on vectors. To make this more apparent, we let $x_1 + x_2 = x$, $x_2 + x_3 = y$, $x_1 + x_3 = z$, solve for $x_1$, $x_2$, and $x_3$ in terms of $x$, $y$, and $z$, and substitute in the preceding equation to obtain
\[ \mathbf{T}^{-1}(x, y, z) = \tfrac{1}{2}(x - y + z,\; x + y - z,\; -x + y + z). \]
Rewriting this equation in terms of $x_1$, $x_2$, and $x_3$ gives
\[ \mathbf{T}^{-1}(x_1, x_2, x_3) = \tfrac{1}{2}(x_1 - x_2 + x_3,\; x_1 + x_2 - x_3,\; -x_1 + x_2 + x_3). \]
We can easily verify that $\mathbf{T}^{-1}\mathbf{T} = \mathbf{1}$ and that $\mathbf{T}\mathbf{T}^{-1} = \mathbf{1}$. ■
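The verification mentioned at the end of Example 2.1.5 can be carried out mechanically. The following Python sketch is only an illustration: the function names `T` and `T_inv` and the sample vectors are assumptions introduced here, not part of the text. It uses exact rational arithmetic so that both $\mathbf{T}^{-1}\mathbf{T} = \mathbf{1}$ and $\mathbf{T}\mathbf{T}^{-1} = \mathbf{1}$ are tested without rounding error.

```python
from fractions import Fraction

def T(v):
    """The operator of Example 2.1.5 acting on a triple (x1, x2, x3)."""
    x1, x2, x3 = v
    return (x1 + x2, x2 + x3, x1 + x3)

def T_inv(v):
    """The candidate inverse derived in Example 2.1.5."""
    x1, x2, x3 = v
    half = Fraction(1, 2)
    return (half * (x1 - x2 + x3),
            half * (x1 + x2 - x3),
            half * (-x1 + x2 + x3))

# By linearity it is enough to test a basis; one extra vector is added as a spot check.
samples = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, -3, 5)]
left_inverse_ok = all(T_inv(T(v)) == v for v in samples)
right_inverse_ok = all(T(T_inv(v)) == v for v in samples)
print(left_inverse_ok, right_inverse_ok)
```

Testing the three standard basis vectors is sufficient in principle, in the spirit of Box 2.1.2: an endomorphism is determined by its action on a basis.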

The following theorem, whose proof is left as an exercise, describes some 
properties of the inverse operator. 

2.1.6. Theorem. The inverse of a linear operator is unique. If $\mathbf{T}$ and $\mathbf{S}$ are two invertible linear operators, then $\mathbf{TS}$ is also invertible, and
\[ (\mathbf{TS})^{-1} = \mathbf{S}^{-1}\mathbf{T}^{-1}. \]
An endomorphism $\mathbf{T} : V \to V$ is invertible if and only if it carries a basis of $V$ onto another basis of $V$.

2.1.1 Polynomials of Operators 

With products and sums of operators defined, we can construct polynomials of operators. We define powers of $\mathbf{T}$ inductively as $\mathbf{T}^m = \mathbf{T}\mathbf{T}^{m-1} = \mathbf{T}^{m-1}\mathbf{T}$ for all positive integers $m \ge 1$. The consistency of this equation (for $m = 1$) demands that $\mathbf{T}^0 = \mathbf{1}$. It follows that polynomials such as
\[ p(\mathbf{T}) = \alpha_0 \mathbf{1} + \alpha_1 \mathbf{T} + \alpha_2 \mathbf{T}^2 + \cdots + \alpha_n \mathbf{T}^n \]
can be defined.




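2.1.7. Example. Let $\mathbf{T}_\theta : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear operator that rotates vectors in the $xy$-plane through the angle $\theta$, that is,
\[ \mathbf{T}_\theta(x, y) = (x\cos\theta - y\sin\theta,\; x\sin\theta + y\cos\theta). \]
We are interested in powers of $\mathbf{T}_\theta$. Writing $x' = x\cos\theta - y\sin\theta$ and $y' = x\sin\theta + y\cos\theta$, we have
\[
\begin{aligned}
\mathbf{T}_\theta^2(x, y) &= \mathbf{T}_\theta(x', y') = (x'\cos\theta - y'\sin\theta,\; x'\sin\theta + y'\cos\theta) \\
&= \bigl((x\cos\theta - y\sin\theta)\cos\theta - (x\sin\theta + y\cos\theta)\sin\theta, \\
&\qquad (x\cos\theta - y\sin\theta)\sin\theta + (x\sin\theta + y\cos\theta)\cos\theta\bigr) \\
&= (x\cos 2\theta - y\sin 2\theta,\; x\sin 2\theta + y\cos 2\theta).
\end{aligned}
\]
Thus, $\mathbf{T}_\theta^2$ rotates $(x, y)$ by $2\theta$. Similarly, one can show that
\[ \mathbf{T}_\theta^3(x, y) = (x\cos 3\theta - y\sin 3\theta,\; x\sin 3\theta + y\cos 3\theta), \]
and in general, $\mathbf{T}_\theta^n(x, y) = (x\cos n\theta - y\sin n\theta,\; x\sin n\theta + y\cos n\theta)$, which shows that $\mathbf{T}_\theta^n$ is a rotation of $(x, y)$ through the angle $n\theta$, that is, $\mathbf{T}_\theta^n = \mathbf{T}_{n\theta}$. This result could have been guessed because $\mathbf{T}_\theta^n$ is equivalent to rotating $(x, y)$ $n$ times, each time by an angle $\theta$. ■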

Negative powers of an invertible linear operator $\mathbf{T}$ are defined by $\mathbf{T}^{-m} = (\mathbf{T}^{-1})^m$. The exponents of $\mathbf{T}$ satisfy the usual rules. In particular, for any two integers $m$ and $n$ (positive or negative), $\mathbf{T}^m\mathbf{T}^n = \mathbf{T}^{m+n}$ and $(\mathbf{T}^m)^n = \mathbf{T}^{mn}$. The first relation implies that the inverse of $\mathbf{T}^m$ is $\mathbf{T}^{-m}$. One can further generalize the exponent to include fractions and ultimately all real numbers; but we need to wait until Chapter 4, in which we discuss the spectral decomposition theorem.

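2.1.8. Example. Let us evaluate $\mathbf{T}_\theta^{-n}$ for the operator of the previous example. First, let us find $\mathbf{T}_\theta^{-1}$ (see Figure 2.1). We are looking for an operator such that $\mathbf{T}_\theta^{-1}\mathbf{T}_\theta(x, y) = (x, y)$, or
\[ \mathbf{T}_\theta^{-1}(x\cos\theta - y\sin\theta,\; x\sin\theta + y\cos\theta) = (x, y). \tag{2.1} \]
We define $x' = x\cos\theta - y\sin\theta$ and $y' = x\sin\theta + y\cos\theta$ and solve for $x$ and $y$ in terms of $x'$ and $y'$ to obtain $x = x'\cos\theta + y'\sin\theta$ and $y = -x'\sin\theta + y'\cos\theta$. Substituting for $x$ and $y$ in Equation (2.1) yields
\[ \mathbf{T}_\theta^{-1}(x', y') = (x'\cos\theta + y'\sin\theta,\; -x'\sin\theta + y'\cos\theta). \]
Comparing this with the action of $\mathbf{T}_\theta$ in the previous example, we discover that the only difference between the two operators is the sign of the $\sin\theta$ term. We conclude that $\mathbf{T}_\theta^{-1}$ has the same effect as $\mathbf{T}_{-\theta}$. So we have
\[ \mathbf{T}_\theta^{-1} = \mathbf{T}_{-\theta} \quad\text{and}\quad \mathbf{T}_\theta^{-n} = (\mathbf{T}_\theta^{-1})^n = (\mathbf{T}_{-\theta})^n = \mathbf{T}_{-n\theta}. \]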




Figure 2.1 



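It is instructive to verify that $\mathbf{T}_\theta^{-n}\mathbf{T}_\theta^{n} = \mathbf{1}$. Writing $x' = x\cos n\theta - y\sin n\theta$ and $y' = x\sin n\theta + y\cos n\theta$, we have
\[
\begin{aligned}
\mathbf{T}_\theta^{-n}\mathbf{T}_\theta^{n}(x, y) &= \mathbf{T}_\theta^{-n}(x', y') = (x'\cos n\theta + y'\sin n\theta,\; -x'\sin n\theta + y'\cos n\theta) \\
&= \bigl((x\cos n\theta - y\sin n\theta)\cos n\theta + (x\sin n\theta + y\cos n\theta)\sin n\theta, \\
&\qquad -(x\cos n\theta - y\sin n\theta)\sin n\theta + (x\sin n\theta + y\cos n\theta)\cos n\theta\bigr) \\
&= \bigl(x(\cos^2 n\theta + \sin^2 n\theta),\; y(\sin^2 n\theta + \cos^2 n\theta)\bigr) = (x, y).
\end{aligned}
\]
Similarly, we can show that $\mathbf{T}_\theta^{n}\mathbf{T}_\theta^{-n}(x, y) = (x, y)$. ■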
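The relations $\mathbf{T}_\theta^n = \mathbf{T}_{n\theta}$ and $\mathbf{T}_\theta^{-n} = \mathbf{T}_{-n\theta}$ of Examples 2.1.7 and 2.1.8 lend themselves to a quick numerical spot check. The sketch below is illustrative only; the names `rotate`, `power`, and `close`, the angle, and the test vector are assumptions made here. Negative powers are implemented by repeated application of $\mathbf{T}_{-\theta}$, which is the inverse found in Example 2.1.8.

```python
import math

def rotate(theta, v):
    """T_theta acting on v = (x, y)."""
    x, y = v
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def power(theta, n, v):
    """Apply T_theta n times if n >= 0, or T_{-theta} |n| times if n < 0."""
    step = theta if n >= 0 else -theta
    for _ in range(abs(n)):
        v = rotate(step, v)
    return v

def close(u, w, tol=1e-12):
    """Componentwise comparison up to floating-point rounding."""
    return all(abs(a - b) < tol for a, b in zip(u, w))

theta, v = 0.3, (1.5, -2.0)
print(close(power(theta, 5, v), rotate(5 * theta, v)))     # T^5 against T_{5 theta}
print(close(power(theta, -3, v), rotate(-3 * theta, v)))   # T^{-3} against T_{-3 theta}
print(close(power(theta, -4, power(theta, 4, v)), v))      # T^{-4} T^4 against 1
```

A floating-point check of this kind can only support the identities, not prove them; the trigonometric argument in the text is what establishes them for all $\theta$ and $n$.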

One has to keep in mind that p(T) is not, in general, invertible, even if T is. In 
fact, the sum of two invertible operators is not necessarily invertible. For example, 
although $\mathbf{T}$ and $-\mathbf{T}$ are invertible, their sum, the zero operator, is not.


2.1.2 Functions of Operators 

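We can go one step beyond polynomials of operators and, via Taylor expansion, define functions of them. Consider an ordinary function $f(x)$, which has the Taylor expansion
\[ f(x) = \sum_{k=0}^{\infty} \frac{(x - x_0)^k}{k!} \left.\frac{d^k f}{dx^k}\right|_{x=x_0} \]
in which $x_0$ is a point where $f(x)$ and all its derivatives are defined. To this function, there corresponds a function of the operator $\mathbf{T}$, defined as
\[ f(\mathbf{T}) = \sum_{k=0}^{\infty} \left.\frac{d^k f}{dx^k}\right|_{x=x_0} \frac{(\mathbf{T} - x_0\mathbf{1})^k}{k!}. \tag{2.2} \]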



Because this series is an infinite sum of operators, difficulties may arise concerning its convergence. However, as will be shown in Chapter 4, $f(\mathbf{T})$ is always defined for finite-dimensional vector spaces. In fact, it is always a polynomial in $\mathbf{T}$. For the time being, we shall think of $f(\mathbf{T})$ as a formal infinite series. A simplification results when the function can be expanded about $x = 0$. In this case we obtain
\[ f(\mathbf{T}) = \sum_{k=0}^{\infty} \left.\frac{d^k f}{dx^k}\right|_{x=0} \frac{\mathbf{T}^k}{k!}. \tag{2.3} \]
A widely used function is the exponential, whose expansion is easily found to be
\[ e^{\mathbf{T}} \equiv \exp(\mathbf{T}) = \sum_{k=0}^{\infty} \frac{\mathbf{T}^k}{k!}. \tag{2.4} \]

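2.1.9. Example. Let us evaluate $\exp(\alpha\mathbf{T})$ when $\mathbf{T} : \mathbb{R}^2 \to \mathbb{R}^2$ is given by
\[ \mathbf{T}(x, y) = (-y, x). \]
We can find a general formula for the action of $\mathbf{T}^n$ on $(x, y)$. Start with $n = 2$:
\[ \mathbf{T}^2(x, y) = \mathbf{T}(-y, x) = (-x, -y) = -(x, y) = -\mathbf{1}(x, y). \]
Thus, $\mathbf{T}^2 = -\mathbf{1}$. From $\mathbf{T}$ and $\mathbf{T}^2$ we can easily obtain higher powers of $\mathbf{T}$. For example, $\mathbf{T}^3 = \mathbf{T}(\mathbf{T}^2) = -\mathbf{T}$, $\mathbf{T}^4 = \mathbf{T}^2\mathbf{T}^2 = \mathbf{1}$, and in general,
\[
\begin{aligned}
\mathbf{T}^{2n} &= (-1)^n \mathbf{1} && \text{for } n = 0, 1, 2, \ldots \\
\mathbf{T}^{2n+1} &= (-1)^n \mathbf{T} && \text{for } n = 0, 1, 2, \ldots
\end{aligned}
\]
Thus,
\[
\begin{aligned}
\exp(\alpha\mathbf{T}) &= \sum_{n\ \mathrm{odd}} \frac{(\alpha\mathbf{T})^n}{n!} + \sum_{n\ \mathrm{even}} \frac{(\alpha\mathbf{T})^n}{n!}
= \sum_{k=0}^{\infty} \frac{(\alpha\mathbf{T})^{2k+1}}{(2k+1)!} + \sum_{k=0}^{\infty} \frac{(\alpha\mathbf{T})^{2k}}{(2k)!} \\
&= \sum_{k=0}^{\infty} \frac{\alpha^{2k+1}\mathbf{T}^{2k+1}}{(2k+1)!} + \sum_{k=0}^{\infty} \frac{\alpha^{2k}\mathbf{T}^{2k}}{(2k)!}
= \sum_{k=0}^{\infty} \frac{(-1)^k\alpha^{2k+1}}{(2k+1)!}\,\mathbf{T} + \sum_{k=0}^{\infty} \frac{(-1)^k\alpha^{2k}}{(2k)!}\,\mathbf{1} \\
&= \mathbf{T} \sum_{k=0}^{\infty} \frac{(-1)^k\alpha^{2k+1}}{(2k+1)!} + \mathbf{1} \sum_{k=0}^{\infty} \frac{(-1)^k\alpha^{2k}}{(2k)!}.
\end{aligned}
\]
The two series are recognized as $\sin\alpha$ and $\cos\alpha$, respectively. Therefore, we get
\[ e^{\alpha\mathbf{T}} = \mathbf{T}\sin\alpha + \mathbf{1}\cos\alpha, \]
which shows that $e^{\alpha\mathbf{T}}$ is a polynomial (of first degree) in $\mathbf{T}$.

The action of $e^{\alpha\mathbf{T}}$ on $(x, y)$ is given by
\[
\begin{aligned}
e^{\alpha\mathbf{T}}(x, y) &= (\sin\alpha\,\mathbf{T} + \cos\alpha\,\mathbf{1})(x, y) = \sin\alpha\,\mathbf{T}(x, y) + \cos\alpha\,\mathbf{1}(x, y) \\
&= (\sin\alpha)(-y, x) + (\cos\alpha)(x, y) \\
&= (-y\sin\alpha,\; x\sin\alpha) + (x\cos\alpha,\; y\cos\alpha) \\
&= (x\cos\alpha - y\sin\alpha,\; x\sin\alpha + y\cos\alpha).
\end{aligned}
\]
The reader will recognize the final expression as a rotation in the $xy$-plane through an angle $\alpha$. Thus, we can think of $e^{\alpha\mathbf{T}}$ as a rotation operator of angle $\alpha$ about the $z$-axis. In this context $\mathbf{T}$ is called the generator of the rotation. ■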
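The closed form $e^{\alpha\mathbf{T}} = \mathbf{T}\sin\alpha + \mathbf{1}\cos\alpha$ of Example 2.1.9 can be compared numerically with a truncated version of the series (2.4). The sketch below is a minimal illustration under assumptions made here: $\mathbf{T}$ is represented by its $2 \times 2$ matrix in the standard basis acting on column vectors, and the helper names `matmul` and `exp_series`, the value of `alpha`, and the truncation at 30 terms are all arbitrary choices.

```python
import math

def matmul(A, B):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_series(M, terms=30):
    """Partial sum of exp(M) = sum_k M^k / k!, keeping the first `terms` terms."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]          # holds M^k / k!
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

alpha = 0.7
T = [[0.0, -1.0], [1.0, 0.0]]                # matrix of T(x, y) = (-y, x)
E = exp_series([[alpha * entry for entry in row] for row in T])
R = [[math.cos(alpha), -math.sin(alpha)],    # T sin(alpha) + 1 cos(alpha)
     [math.sin(alpha), math.cos(alpha)]]
max_error = max(abs(E[i][j] - R[i][j]) for i in range(2) for j in range(2))
print(max_error < 1e-12)
```

For a moderate value of $\alpha$ the factorial in the denominator makes the tail of the series negligible long before 30 terms, which is why a fixed truncation suffices here.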


2.1.3 Commutators 


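The result of multiplication of two operators depends on the order in which the operators appear. This means that if $\mathbf{T}, \mathbf{U} \in \mathcal{L}(V)$, then $\mathbf{TU} \in \mathcal{L}(V)$ and $\mathbf{UT} \in \mathcal{L}(V)$; however, in general $\mathbf{UT} \neq \mathbf{TU}$. When this is the case, we say that $\mathbf{U}$ and $\mathbf{T}$ do not commute. The extent to which two operators fail to commute is given in the following definition.

2.1.10. Definition. The commutator $[\mathbf{U}, \mathbf{T}]$ of the two operators $\mathbf{U}$ and $\mathbf{T}$ in $\mathcal{L}(V)$ is another operator in $\mathcal{L}(V)$, defined as
\[ [\mathbf{U}, \mathbf{T}] \equiv \mathbf{UT} - \mathbf{TU}. \]
An immediate consequence of this definition is the following:

2.1.11. Proposition. For $\mathbf{S}, \mathbf{T}, \mathbf{U} \in \mathcal{L}(V)$ and $\alpha, \beta \in \mathbb{C}$ (or $\mathbb{R}$), we have
\[
\begin{aligned}
&[\mathbf{U}, \mathbf{T}] = -[\mathbf{T}, \mathbf{U}], && \text{antisymmetry} \\
&[\alpha\mathbf{U}, \beta\mathbf{T}] = \alpha\beta[\mathbf{U}, \mathbf{T}], && \text{linearity} \\
&[\mathbf{S}, \mathbf{T} + \mathbf{U}] = [\mathbf{S}, \mathbf{T}] + [\mathbf{S}, \mathbf{U}], && \text{linearity in the right entry} \\
&[\mathbf{S} + \mathbf{T}, \mathbf{U}] = [\mathbf{S}, \mathbf{U}] + [\mathbf{T}, \mathbf{U}], && \text{linearity in the left entry} \\
&[\mathbf{ST}, \mathbf{U}] = \mathbf{S}[\mathbf{T}, \mathbf{U}] + [\mathbf{S}, \mathbf{U}]\mathbf{T}, && \text{right derivation property} \\
&[\mathbf{S}, \mathbf{TU}] = [\mathbf{S}, \mathbf{T}]\mathbf{U} + \mathbf{T}[\mathbf{S}, \mathbf{U}], && \text{left derivation property} \\
&[[\mathbf{S}, \mathbf{T}], \mathbf{U}] + [[\mathbf{U}, \mathbf{S}], \mathbf{T}] + [[\mathbf{T}, \mathbf{U}], \mathbf{S}] = 0. && \text{Jacobi identity}
\end{aligned}
\]

Proof. In almost all cases the proof follows immediately from the definition. The only minor exceptions are the derivation properties. We prove the left derivation property:
\[
\begin{aligned}
[\mathbf{S}, \mathbf{TU}] &= \mathbf{S}(\mathbf{TU}) - (\mathbf{TU})\mathbf{S} = \mathbf{STU} - \mathbf{TUS} + \underbrace{\mathbf{TSU} - \mathbf{TSU}}_{=0} \\
&= (\mathbf{ST} - \mathbf{TS})\mathbf{U} + \mathbf{T}(\mathbf{SU} - \mathbf{US}) = [\mathbf{S}, \mathbf{T}]\mathbf{U} + \mathbf{T}[\mathbf{S}, \mathbf{U}].
\end{aligned}
\]
The right derivation property is proved in exactly the same way. □

A useful consequence of the definition and Proposition 2.1.11 is
\[ [\mathbf{A}, \mathbf{A}^m] = 0 \quad\text{for } m = 0, \pm 1, \pm 2, \ldots. \]
In particular, $[\mathbf{A}, \mathbf{1}] = 0$ and $[\mathbf{A}, \mathbf{A}^{-1}] = 0$.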
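The identities of Proposition 2.1.11 can be spot-checked on concrete matrices. The sketch below is an illustration under assumptions made here: operators are represented by $2 \times 2$ integer matrices (so all arithmetic is exact), and the helper names `matmul`, `add`, `sub`, `comm` and the particular matrices `S`, `T`, `U` are arbitrary choices, not taken from the text.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def comm(A, B):
    """Commutator [A, B] = AB - BA."""
    return sub(matmul(A, B), matmul(B, A))

S = [[1, 2], [0, 1]]
T = [[0, 1], [1, 0]]
U = [[2, 0], [1, 3]]
ZERO = [[0, 0], [0, 0]]

do_not_commute = comm(U, T) != ZERO
antisymmetry = comm(U, T) == sub(ZERO, comm(T, U))
right_derivation = comm(matmul(S, T), U) == add(matmul(S, comm(T, U)),
                                                matmul(comm(S, U), T))
left_derivation = comm(S, matmul(T, U)) == add(matmul(comm(S, T), U),
                                               matmul(T, comm(S, U)))
jacobi = add(add(comm(comm(S, T), U), comm(comm(U, S), T)),
             comm(comm(T, U), S)) == ZERO
print(do_not_commute, antisymmetry, right_derivation, left_derivation, jacobi)
```

Since the identities hold for all operators, any other choice of square matrices of equal size should pass the same checks; note that the order of the factors in the two derivation properties matters.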




2.2 Derivatives of Functions of Operators 

Up to this point we have been discussing the algebraic properties of operators, 
static objects that obey certain algebraic rules and fulfill the static needs of some 
applications. However, physical quantities are dynamic, and if we want operators 
to represent physical quantities, we must allow them to change with time. This 
dynamism is best illustrated in quantum mechanics, where physical observables 
are represented by operators. 

Let us consider a mapping $\mathbf{H} : \mathbb{R} \to \mathcal{L}(V)$, which$^2$ takes in a real number and gives out a linear operator on the vector space $V$. We denote the image of $t \in \mathbb{R}$ by $\mathbf{H}(t)$, which acts on the underlying vector space $V$. The physical meaning of this is that as $t$ (usually time) varies, its image $\mathbf{H}(t)$ also varies. Therefore, for different values of $t$, we have different operators. In particular, $[\mathbf{H}(t), \mathbf{H}(t')] \neq 0$ for $t \neq t'$ in general: a time-dependent operator need not commute with itself at different times. A concrete example is an operator that is a linear combination of the operators $\mathbf{D}$ and $\mathbf{T}$ introduced in Example 1.3.4, with time-dependent scalars. To be specific, let $\mathbf{H}(t) = \mathbf{D}\cos\omega t + \mathbf{T}\sin\omega t$, where $\omega$ is a constant. As time passes, $\mathbf{H}(t)$ changes its identity from $\mathbf{D}$ to $\mathbf{T}$ and back to $\mathbf{D}$. Most of the time it has a hybrid identity! Since $\mathbf{D}$ and $\mathbf{T}$ do not commute, values of $\mathbf{H}(t)$ for different times do not necessarily commute.

Of particular interest are operators that can be written as $\exp\mathbf{H}(t)$, where $\mathbf{H}(t)$ is a "simple" operator; i.e., the dependence of $\mathbf{H}(t)$ on $t$ is simpler than the corresponding dependence of $\exp\mathbf{H}(t)$. We have already encountered such a situation in Example 2.1.9, where it was shown that the operation of rotation around the $z$-axis could be written as $\exp\alpha\mathbf{T}$, and the action of $\mathbf{T}$ on $(x, y)$ was a great deal simpler than the corresponding action of $\exp\alpha\mathbf{T}$.

Such a state of affairs is very common in physics. In fact, it can be shown 
that many operators of physical interest can be written as a product of simpler 
operators, each being of the form exp aT. For example, we know from Euler’s 
theorem in mechanics that an arbitrary rotation in three dimensions can be written 
as a product of three simpler rotations, each being a rotation through a so-called 
Euler angle about an axis. 

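2.2.1. Definition. For the mapping $\mathbf{H} : \mathbb{R} \to \mathcal{L}(V)$, we define the derivative as
\[ \frac{d\mathbf{H}}{dt} = \lim_{\Delta t \to 0} \frac{\mathbf{H}(t + \Delta t) - \mathbf{H}(t)}{\Delta t}. \]
This derivative also belongs to $\mathcal{L}(V)$.

As long as we keep track of the order, practically all the rules of differentiation apply to operators. For example,
\[ \frac{d}{dt}(\mathbf{UT}) = \frac{d\mathbf{U}}{dt}\mathbf{T} + \mathbf{U}\frac{d\mathbf{T}}{dt}. \]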

2 Strictly speaking, the domain of $\mathbf{H}$ must be an interval $[a, b]$ of the real line, because $\mathbf{H}$ may not be defined for all of $\mathbb{R}$. However, for our purposes, such a fine distinction is not necessary.




We are not allowed to change the order of multiplication on the RHS, not even when both operators being multiplied are the same on the LHS. For instance, if we let $\mathbf{U} = \mathbf{T} = \mathbf{H}$ in the preceding equation, we obtain
\[ \frac{d}{dt}(\mathbf{H}^2) = \frac{d\mathbf{H}}{dt}\mathbf{H} + \mathbf{H}\frac{d\mathbf{H}}{dt}. \]
This is not, in general, equal to $2\mathbf{H}\,\dfrac{d\mathbf{H}}{dt}$.


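2.2.2. Example. Let us find the derivative of $\exp(t\mathbf{H})$, where $\mathbf{H}$ is independent of $t$. Using Definition 2.2.1, we have
\[ \frac{d}{dt}\exp(t\mathbf{H}) = \lim_{\Delta t \to 0} \frac{\exp[(t + \Delta t)\mathbf{H}] - \exp(t\mathbf{H})}{\Delta t}. \]
However, for infinitesimal $\Delta t$ we have
\[
\begin{aligned}
\exp[(t + \Delta t)\mathbf{H}] - \exp(t\mathbf{H}) &= e^{t\mathbf{H}}e^{\Delta t\mathbf{H}} - e^{t\mathbf{H}} \\
&= e^{t\mathbf{H}}(\mathbf{1} + \mathbf{H}\Delta t) - e^{t\mathbf{H}} = e^{t\mathbf{H}}\mathbf{H}\Delta t.
\end{aligned}
\]
Therefore,
\[ \frac{d}{dt}\exp(t\mathbf{H}) = \lim_{\Delta t \to 0} \frac{e^{t\mathbf{H}}\mathbf{H}\Delta t}{\Delta t} = e^{t\mathbf{H}}\mathbf{H}. \]
Since $\mathbf{H}$ and $e^{t\mathbf{H}}$ commute,$^3$ we also have
\[ \frac{d}{dt}\exp(t\mathbf{H}) = \mathbf{H}e^{t\mathbf{H}}. \]
Note that in deriving the equation for the derivative of $e^{t\mathbf{H}}$, we have used the relation $e^{t\mathbf{H}}e^{\Delta t\mathbf{H}} = e^{(t + \Delta t)\mathbf{H}}$. This may seem trivial, but it will be shown later that in general, $e^{\mathbf{S}+\mathbf{T}} \neq e^{\mathbf{S}}e^{\mathbf{T}}$. ■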

Now let us evaluate the derivative of a more general time-dependent operator, $\exp[\mathbf{H}(t)]$:
\[ \frac{d}{dt}\exp[\mathbf{H}(t)] = \lim_{\Delta t \to 0} \frac{\exp[\mathbf{H}(t + \Delta t)] - \exp[\mathbf{H}(t)]}{\Delta t}. \]
If $\mathbf{H}(t)$ possesses a derivative, we have, to the first order in $\Delta t$,
\[ \mathbf{H}(t + \Delta t) = \mathbf{H}(t) + \Delta t\,\frac{d\mathbf{H}}{dt}, \]
and we can write $\exp[\mathbf{H}(t + \Delta t)] = \exp[\mathbf{H}(t) + \Delta t\, d\mathbf{H}/dt]$. It is very tempting to factor out the $\exp[\mathbf{H}(t)]$ and expand the remaining part. However, as we will see presently, this is not possible in general. As preparation, consider the following example, which concerns the integration of an operator.




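2.2.3. Example. The Schrödinger equation $i\,\dfrac{d}{dt}|\psi(t)\rangle = \mathbf{H}|\psi(t)\rangle$ can be turned into an operator differential equation as follows. Define the so-called evolution operator $\mathbf{U}(t)$ by $|\psi(t)\rangle = \mathbf{U}(t)|\psi(0)\rangle$, and substitute in the Schrödinger equation to obtain
\[ i\,\frac{d}{dt}\mathbf{U}(t)|\psi(0)\rangle = \mathbf{H}\mathbf{U}(t)|\psi(0)\rangle. \]
Ignoring the arbitrary vector $|\psi(0)\rangle$ results in a differential equation in $\mathbf{U}(t)$. For the purposes of this example, let us consider an operator differential equation of the form $d\mathbf{U}/dt = \mathbf{H}\mathbf{U}(t)$, where $\mathbf{H}$ is not dependent on $t$. We can find a solution to such an equation by repeated differentiation followed by Taylor series expansion. Thus,
\[
\begin{aligned}
\frac{d^2\mathbf{U}}{dt^2} &= \frac{d}{dt}[\mathbf{H}\mathbf{U}(t)] = \mathbf{H}\frac{d\mathbf{U}}{dt} = \mathbf{H}[\mathbf{H}\mathbf{U}(t)] = \mathbf{H}^2\mathbf{U}(t), \\
\frac{d^3\mathbf{U}}{dt^3} &= \frac{d}{dt}[\mathbf{H}^2\mathbf{U}(t)] = \mathbf{H}^2\frac{d\mathbf{U}}{dt} = \mathbf{H}^3\mathbf{U}(t).
\end{aligned}
\]
In general $d^n\mathbf{U}/dt^n = \mathbf{H}^n\mathbf{U}(t)$. Assuming that $\mathbf{U}(t)$ is well-defined at $t = 0$, the above relations say that all derivatives of $\mathbf{U}(t)$ are also well-defined at $t = 0$. Therefore, we can expand $\mathbf{U}(t)$ around $t = 0$ to obtain
\[ \mathbf{U}(t) = \sum_{n=0}^{\infty} \frac{t^n}{n!} \left.\frac{d^n\mathbf{U}}{dt^n}\right|_{t=0} = \sum_{n=0}^{\infty} \frac{t^n}{n!}\mathbf{H}^n\mathbf{U}(0) = e^{t\mathbf{H}}\mathbf{U}(0). \]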



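Let us see under what conditions we have $\exp(\mathbf{S} + \mathbf{T}) = \exp(\mathbf{S})\exp(\mathbf{T})$. We consider only the case where the commutator of the two operators commutes with both of them: $[\mathbf{T}, [\mathbf{S}, \mathbf{T}]] = 0 = [\mathbf{S}, [\mathbf{S}, \mathbf{T}]]$. Now consider the operator $\mathbf{U}(t) = e^{t\mathbf{S}}e^{t\mathbf{T}}e^{-t(\mathbf{S}+\mathbf{T})}$ and differentiate it using the result of Example 2.2.2 and the product rule for differentiation:
\[
\begin{aligned}
\frac{d\mathbf{U}}{dt} &= \mathbf{S}e^{t\mathbf{S}}e^{t\mathbf{T}}e^{-t(\mathbf{S}+\mathbf{T})} + e^{t\mathbf{S}}\mathbf{T}e^{t\mathbf{T}}e^{-t(\mathbf{S}+\mathbf{T})} - e^{t\mathbf{S}}e^{t\mathbf{T}}(\mathbf{S} + \mathbf{T})e^{-t(\mathbf{S}+\mathbf{T})} \\
&= \mathbf{S}e^{t\mathbf{S}}e^{t\mathbf{T}}e^{-t(\mathbf{S}+\mathbf{T})} - e^{t\mathbf{S}}e^{t\mathbf{T}}\mathbf{S}e^{-t(\mathbf{S}+\mathbf{T})}. \qquad (2.5)
\end{aligned}
\]
The three factors of $\mathbf{U}(t)$ are present in all terms; however, they are not always next to one another. We can switch the operators if we introduce a commutator. For instance, $e^{t\mathbf{T}}\mathbf{S} = \mathbf{S}e^{t\mathbf{T}} + [e^{t\mathbf{T}}, \mathbf{S}]$.

It is left as a problem for the reader to show that if $[\mathbf{S}, \mathbf{T}]$ commutes with $\mathbf{S}$ and $\mathbf{T}$, then $[e^{t\mathbf{T}}, \mathbf{S}] = -t[\mathbf{S}, \mathbf{T}]e^{t\mathbf{T}}$, and therefore, $e^{t\mathbf{T}}\mathbf{S} = \mathbf{S}e^{t\mathbf{T}} - t[\mathbf{S}, \mathbf{T}]e^{t\mathbf{T}}$. Substituting this in Equation (2.5) and noting that $e^{t\mathbf{S}}\mathbf{S} = \mathbf{S}e^{t\mathbf{S}}$ yields $d\mathbf{U}/dt = t[\mathbf{S}, \mathbf{T}]\mathbf{U}(t)$. The solution to this equation is
\[ \mathbf{U}(t) = \exp\left(\frac{t^2}{2}[\mathbf{S}, \mathbf{T}]\right) \;\Longrightarrow\; e^{t\mathbf{S}}e^{t\mathbf{T}}e^{-t(\mathbf{S}+\mathbf{T})} = e^{(t^2/2)[\mathbf{S}, \mathbf{T}]} \]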

3 This is a consequence of a more general result that if two operators commute, any pair of functions of those operators also 
commute (see Problem 2.14). 





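because $\mathbf{U}(0) = \mathbf{1}$. We thus have the following:

2.2.4. Proposition. Let $\mathbf{S}, \mathbf{T} \in \mathcal{L}(V)$. If $[\mathbf{S}, [\mathbf{S}, \mathbf{T}]] = 0 = [\mathbf{T}, [\mathbf{S}, \mathbf{T}]]$, then the Baker-Campbell-Hausdorff formula holds:
\[ e^{t\mathbf{S}}e^{t\mathbf{T}}e^{-(t^2/2)[\mathbf{S}, \mathbf{T}]} = e^{t(\mathbf{S}+\mathbf{T})}. \tag{2.6} \]
In particular, $e^{t\mathbf{S}}e^{t\mathbf{T}} = e^{t(\mathbf{S}+\mathbf{T})}$ if and only if $[\mathbf{S}, \mathbf{T}] = 0$.

If $t = 1$, Equation (2.6) reduces to
\[ e^{\mathbf{S}}e^{\mathbf{T}}e^{-(1/2)[\mathbf{S}, \mathbf{T}]} = e^{\mathbf{S}+\mathbf{T}}. \tag{2.7} \]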
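Equation (2.7) can be tested exactly on a pair of matrices whose commutator commutes with both of them. The sketch below is an illustration under assumptions made here: `S` and `T` are strictly upper triangular $3 \times 3$ matrices chosen for this purpose (they are not operators from the text), all helper names are hypothetical, and the exponential series is summed only up to the third power because every matrix involved is nilpotent with vanishing cube.

```python
from fractions import Fraction

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def comm(A, B):
    return add(matmul(A, B), scale(-1, matmul(B, A)))

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def exp_nilpotent(M):
    """exp(M) for a strictly upper triangular n x n matrix; M**n = 0, so the series stops."""
    n = len(M)
    result, term = identity(n), identity(n)
    for k in range(1, n + 1):
        term = scale(Fraction(1, k), matmul(term, M))   # M^k / k!
        result = add(result, term)
    return result

S = [[0, 2, 0], [0, 0, 0], [0, 0, 0]]
T = [[0, 0, 0], [0, 0, 3], [0, 0, 0]]
C = comm(S, T)                                          # [S, T]
ZERO = [[0] * 3 for _ in range(3)]

hypothesis_ok = comm(S, C) == ZERO and comm(T, C) == ZERO
lhs = matmul(matmul(exp_nilpotent(S), exp_nilpotent(T)),
             exp_nilpotent(scale(Fraction(-1, 2), C)))
rhs = exp_nilpotent(add(S, T))
naive = matmul(exp_nilpotent(S), exp_nilpotent(T))      # omits the commutator factor
print(hypothesis_ok, lhs == rhs, naive == rhs)
```

Because $[\mathbf{S}, \mathbf{T}] \neq 0$ for this pair, the last comparison is expected to fail, in line with the "if and only if" statement of Proposition 2.2.4.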


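Now assume that both $\mathbf{H}(t)$ and its derivative commute with $[\mathbf{H}, d\mathbf{H}/dt]$. Letting $\mathbf{S} = \mathbf{H}(t)$ and $\mathbf{T} = \Delta t\, d\mathbf{H}/dt$ in (2.7), we obtain
\[
\begin{aligned}
e^{\mathbf{H}(t + \Delta t)} &= e^{\mathbf{H}(t) + \Delta t\, d\mathbf{H}/dt} \\
&= e^{\mathbf{H}(t)}\, e^{\Delta t (d\mathbf{H}/dt)}\, e^{-[\mathbf{H}(t),\, \Delta t\, d\mathbf{H}/dt]/2}.
\end{aligned}
\]
For infinitesimal $\Delta t$, this yields
\[
\begin{aligned}
e^{\mathbf{H}(t + \Delta t)} &= e^{\mathbf{H}(t)}\left(\mathbf{1} + \Delta t\,\frac{d\mathbf{H}}{dt}\right)\left(\mathbf{1} - \frac{1}{2}\Delta t\left[\mathbf{H}(t), \frac{d\mathbf{H}}{dt}\right]\right) \\
&= e^{\mathbf{H}(t)}\left\{\mathbf{1} + \Delta t\,\frac{d\mathbf{H}}{dt} - \frac{1}{2}\Delta t\left[\mathbf{H}(t), \frac{d\mathbf{H}}{dt}\right]\right\},
\end{aligned}
\]
and we have
\[ \frac{d}{dt}e^{\mathbf{H}(t)} = e^{\mathbf{H}}\frac{d\mathbf{H}}{dt} - \frac{1}{2}e^{\mathbf{H}}\left[\mathbf{H}, \frac{d\mathbf{H}}{dt}\right]. \]
We can also write
\[ e^{\mathbf{H}(t + \Delta t)} = e^{[\mathbf{H}(t) + \Delta t\, d\mathbf{H}/dt]} = e^{[\Delta t\, d\mathbf{H}/dt + \mathbf{H}(t)]} = e^{\Delta t\, d\mathbf{H}/dt}\, e^{\mathbf{H}(t)}\, e^{-[\Delta t\, d\mathbf{H}/dt,\, \mathbf{H}(t)]/2}, \]
which yields
\[ \frac{d}{dt}e^{\mathbf{H}(t)} = \frac{d\mathbf{H}}{dt}e^{\mathbf{H}} + \frac{1}{2}e^{\mathbf{H}}\left[\mathbf{H}, \frac{d\mathbf{H}}{dt}\right]. \]
Adding the above two expressions and dividing by 2 yields the following symmetric expression for the derivative:
\[ \frac{d}{dt}e^{\mathbf{H}(t)} = \frac{1}{2}\left(\frac{d\mathbf{H}}{dt}e^{\mathbf{H}} + e^{\mathbf{H}}\frac{d\mathbf{H}}{dt}\right) = \frac{1}{2}\left\{\frac{d\mathbf{H}}{dt}, e^{\mathbf{H}}\right\}, \]
where $\{\mathbf{S}, \mathbf{T}\} \equiv \mathbf{ST} + \mathbf{TS}$ is called the anticommutator of the operators $\mathbf{S}$ and $\mathbf{T}$. We, therefore, have the following proposition.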



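2.2.5. Proposition. Let $\mathbf{H} : \mathbb{R} \to \mathcal{L}(V)$ and assume that $\mathbf{H}$ and its derivative commute with $[\mathbf{H}, d\mathbf{H}/dt]$. Then
\[ \frac{d}{dt}e^{\mathbf{H}(t)} = \frac{1}{2}\left\{\frac{d\mathbf{H}}{dt}, e^{\mathbf{H}}\right\}. \]
In particular, if $[\mathbf{H}, d\mathbf{H}/dt] = 0$, then
\[ \frac{d}{dt}e^{\mathbf{H}(t)} = \frac{d\mathbf{H}}{dt}e^{\mathbf{H}} = e^{\mathbf{H}}\frac{d\mathbf{H}}{dt}. \]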


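A frequently encountered operator is $\mathbf{F}(t) = e^{t\mathbf{A}}\mathbf{B}e^{-t\mathbf{A}}$, where $\mathbf{A}$ and $\mathbf{B}$ are $t$-independent. It is straightforward to show that
\[ \frac{d\mathbf{F}}{dt} = [\mathbf{A}, \mathbf{F}(t)] \quad\text{and}\quad \frac{d}{dt}[\mathbf{A}, \mathbf{F}(t)] = \left[\mathbf{A}, \frac{d\mathbf{F}}{dt}\right]. \]
Using these results, we can write
\[ \frac{d^2\mathbf{F}}{dt^2} = \frac{d}{dt}[\mathbf{A}, \mathbf{F}(t)] = [\mathbf{A}, [\mathbf{A}, \mathbf{F}(t)]] \equiv \mathbf{A}^2[\mathbf{F}(t)], \]
and in general, $d^n\mathbf{F}/dt^n = \mathbf{A}^n[\mathbf{F}(t)]$, where $\mathbf{A}^n[\mathbf{F}(t)]$ is defined inductively as $\mathbf{A}^n[\mathbf{F}(t)] = [\mathbf{A}, \mathbf{A}^{n-1}[\mathbf{F}(t)]]$, with $\mathbf{A}^0[\mathbf{F}(t)] \equiv \mathbf{F}(t)$. For example,
\[ \mathbf{A}^3[\mathbf{F}(t)] = [\mathbf{A}, \mathbf{A}^2[\mathbf{F}(t)]] = [\mathbf{A}, [\mathbf{A}, \mathbf{A}[\mathbf{F}(t)]]] = [\mathbf{A}, [\mathbf{A}, [\mathbf{A}, \mathbf{F}(t)]]]. \]
Evaluating $\mathbf{F}(t)$ and all its derivatives at $t = 0$ and substituting in the Taylor expansion about $t = 0$, we get
\[ \mathbf{F}(t) = \sum_{n=0}^{\infty} \frac{t^n}{n!} \left.\frac{d^n\mathbf{F}}{dt^n}\right|_{t=0} = \sum_{n=0}^{\infty} \frac{t^n}{n!}\mathbf{A}^n[\mathbf{F}(0)] = \sum_{n=0}^{\infty} \frac{t^n}{n!}\mathbf{A}^n[\mathbf{B}]. \]
That is,
\[ e^{t\mathbf{A}}\mathbf{B}e^{-t\mathbf{A}} = \sum_{n=0}^{\infty} \frac{t^n}{n!}\mathbf{A}^n[\mathbf{B}] = \mathbf{B} + t[\mathbf{A}, \mathbf{B}] + \frac{t^2}{2!}[\mathbf{A}, [\mathbf{A}, \mathbf{B}]] + \cdots. \]
Sometimes this is written symbolically as
\[ e^{t\mathbf{A}}\mathbf{B}e^{-t\mathbf{A}} = \left(\sum_{n=0}^{\infty} \frac{t^n}{n!}\mathbf{A}^n\right)[\mathbf{B}] \equiv e^{t\mathbf{A}}[\mathbf{B}], \]
where the RHS is merely an abbreviation of the infinite sum in the middle. For $t = 1$ we obtain a widely used formula:
\[ e^{\mathbf{A}}\mathbf{B}e^{-\mathbf{A}} = e^{\mathbf{A}}[\mathbf{B}] = \left(\sum_{n=0}^{\infty} \frac{1}{n!}\mathbf{A}^n\right)[\mathbf{B}] \equiv \mathbf{B} + [\mathbf{A}, \mathbf{B}] + \frac{1}{2!}[\mathbf{A}, [\mathbf{A}, \mathbf{B}]] + \cdots. \]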



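If $\mathbf{A}$ commutes with $[\mathbf{A}, \mathbf{B}]$, then the infinite series truncates at the second term, and we have
\[ e^{t\mathbf{A}}\mathbf{B}e^{-t\mathbf{A}} = \mathbf{B} + t[\mathbf{A}, \mathbf{B}]. \]
For instance, if $\mathbf{A}$ and $\mathbf{B}$ are replaced by $\mathbf{D}$ and $\mathbf{T}$ of Example 1.3.4, we get (see Problem 2.3)
\[ e^{t\mathbf{D}}\mathbf{T}e^{-t\mathbf{D}} = \mathbf{T} + t[\mathbf{D}, \mathbf{T}] = \mathbf{T} + t\mathbf{1}. \]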



The RHS shows that the operator $\mathbf{T}$ has been translated by an amount $t$ (more precisely, by $t$ times the unit operator). We therefore call $\exp(t\mathbf{D})$ the translation operator of $\mathbf{T}$ by $t$, and we call $\mathbf{D}$ the generator of translation. With a little modification $\mathbf{T}$ and $\mathbf{D}$ become, respectively, the position and momentum operators in quantum mechanics. Thus,


2.2.6. Box. Momentum is the generator of translation in quantum mechanics.


But more of this later! 
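The translation property $e^{t\mathbf{D}}\mathbf{T}e^{-t\mathbf{D}} = \mathbf{T} + t\mathbf{1}$ can be tried out on polynomials. The sketch below rests on an assumption about Example 1.3.4, which is not reproduced in this section: $\mathbf{D}$ is taken to be differentiation and $\mathbf{T}$ multiplication by the variable. Polynomials are stored as coefficient lists `[c0, c1, c2, ...]`; the helper names (`D`, `X`, `exp_tD`, and so on), the sample polynomial, and the value of `t` are hypothetical choices made here. On a polynomial the series for $e^{t\mathbf{D}}$ terminates, so exact rational arithmetic can be used.

```python
from fractions import Fraction
from math import factorial

def D(p):
    """Derivative of the polynomial with coefficients p = [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def X(p):
    """Multiplication by the variable (assumed here to play the role of T)."""
    return [0] + list(p)

def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def scale(c, p):
    return [c * a for a in p]

def exp_tD(t, p):
    """exp(t D) p = sum_k (t^k / k!) D^k p; the sum stops once D^k p vanishes."""
    result, term, k = [0], list(p), 0
    while any(term):
        result = add(result, scale(Fraction(t) ** k / factorial(k), term))
        term, k = D(term), k + 1
    return result

def trim(p):
    """Drop trailing zero coefficients so that lists can be compared."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

t = Fraction(1, 2)
p = [1, 2, 3]                                   # 1 + 2x + 3x^2
lhs = trim(exp_tD(t, X(exp_tD(-t, p))))         # exp(tD) T exp(-tD) acting on p
rhs = trim(add(X(p), scale(t, p)))              # (T + t 1) acting on p
print(lhs == rhs)
```

The check reflects the fact that $e^{t\mathbf{D}}$ shifts the argument of a polynomial by $t$, so sandwiching multiplication by $x$ between $e^{t\mathbf{D}}$ and $e^{-t\mathbf{D}}$ turns it into multiplication by $x + t$.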


2.3 Conjugation of Operators 

We have discussed the notion of the dual of a vector in conjunction with inner products. We now incorporate linear operators into this notion. Let $|b\rangle, |c\rangle \in V$ and assume that $|c\rangle = \mathbf{T}|b\rangle$. We know that there are linear functionals in the dual space $V^*$ that are associated with $(|b\rangle)^\dagger = \langle b|$ and $(|c\rangle)^\dagger = \langle c|$. Is there a linear operator belonging to $\mathcal{L}(V^*)$ that somehow corresponds to $\mathbf{T}$? In other words, can we find a linear operator that relates $\langle b|$ and $\langle c|$ just as $\mathbf{T}$ relates $|b\rangle$ and $|c\rangle$? The answer comes in the following definition.

2.3.1. Definition. Let $\mathbf{T} \in \mathcal{L}(V)$ and $|a\rangle, |b\rangle \in V$. The adjoint, or hermitian conjugate, of $\mathbf{T}$ is denoted by $\mathbf{T}^\dagger$ and defined by
\[ \langle a|\mathbf{T}|b\rangle^* = \langle b|\mathbf{T}^\dagger|a\rangle. \tag{2.8} \]
The LHS of Equation (2.8) can be written as $\langle a|c\rangle^*$ or $\langle c|a\rangle$, in which case we can identify
\[ \langle c| = \langle b|\mathbf{T}^\dagger \;\Longrightarrow\; (\mathbf{T}|b\rangle)^\dagger = \langle b|\mathbf{T}^\dagger. \tag{2.9} \]
This equation is sometimes used as the definition of the hermitian conjugate. From Equation (2.8), the reader may easily verify that $\mathbf{1}^\dagger = \mathbf{1}$. Thus, using the unit operator for $\mathbf{T}$, (2.9) justifies Equation (1.12).

Some of the properties of conjugation are listed in the following theorem, 
whose proof is left as an exercise. 





2.3.2. Theorem. Let $\mathbf{U}, \mathbf{T} \in \mathcal{L}(V)$ and $\alpha \in \mathbb{C}$. Then

1. $(\mathbf{U} + \mathbf{T})^\dagger = \mathbf{U}^\dagger + \mathbf{T}^\dagger$.
2. $(\mathbf{UT})^\dagger = \mathbf{T}^\dagger\mathbf{U}^\dagger$.
3. $(\alpha\mathbf{T})^\dagger = \alpha^*\mathbf{T}^\dagger$.
4. $((\mathbf{T})^\dagger)^\dagger = \mathbf{T}$.

The last identity holds for finite-dimensional vector spaces; it does not apply to infinite-dimensional vector spaces in general.

In previous examples dealing with linear operators $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$, an element of $\mathbb{R}^n$ was denoted by a row vector, such as $(x, y)$ for $\mathbb{R}^2$ and $(x, y, z)$ for $\mathbb{R}^3$. There was no confusion, because we were operating only in $V$. However, since elements of both $V$ and $V^*$ are required when discussing $\mathbf{T}$, $\mathbf{T}^*$, and $\mathbf{T}^\dagger$, it is helpful to make a distinction between them. We therefore resort to the convention introduced in Example 1.2.3 by which

2.3.3. Box. Kets are represented as column vectors and bras as row vectors.


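2.3.4. Example. Let us find the hermitian conjugate of the operator $\mathbf{T} : \mathbb{C}^3 \to \mathbb{C}^3$ given by
\[ \mathbf{T}\begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \alpha_3\end{pmatrix} = \begin{pmatrix}\alpha_1 - i\alpha_2 + \alpha_3 \\ i\alpha_1 - \alpha_3 \\ \alpha_1 - \alpha_2 + i\alpha_3\end{pmatrix}. \]
Introduce
\[ |a\rangle = \begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \alpha_3\end{pmatrix} \quad\text{and}\quad |b\rangle = \begin{pmatrix}\beta_1 \\ \beta_2 \\ \beta_3\end{pmatrix} \]
with dual vectors $\langle a| = (\alpha_1^* \;\; \alpha_2^* \;\; \alpha_3^*)$ and $\langle b| = (\beta_1^* \;\; \beta_2^* \;\; \beta_3^*)$, respectively. We use Equation (2.8) to find $\mathbf{T}^\dagger$:
\[
\begin{aligned}
\langle b|\mathbf{T}^\dagger|a\rangle &= \langle a|\mathbf{T}|b\rangle^* = \left[(\alpha_1^* \;\; \alpha_2^* \;\; \alpha_3^*)\begin{pmatrix}\beta_1 - i\beta_2 + \beta_3 \\ i\beta_1 - \beta_3 \\ \beta_1 - \beta_2 + i\beta_3\end{pmatrix}\right]^* \\
&= \left[\alpha_1^*\beta_1 - i\alpha_1^*\beta_2 + \alpha_1^*\beta_3 + i\alpha_2^*\beta_1 - \alpha_2^*\beta_3 + \alpha_3^*\beta_1 - \alpha_3^*\beta_2 + i\alpha_3^*\beta_3\right]^* \\
&= \alpha_1\beta_1^* + i\alpha_1\beta_2^* + \alpha_1\beta_3^* - i\alpha_2\beta_1^* - \alpha_2\beta_3^* + \alpha_3\beta_1^* - \alpha_3\beta_2^* - i\alpha_3\beta_3^* \\
&= \beta_1^*(\alpha_1 - i\alpha_2 + \alpha_3) + \beta_2^*(i\alpha_1 - \alpha_3) + \beta_3^*(\alpha_1 - \alpha_2 - i\alpha_3) \\
&= (\beta_1^* \;\; \beta_2^* \;\; \beta_3^*)\begin{pmatrix}\alpha_1 - i\alpha_2 + \alpha_3 \\ i\alpha_1 - \alpha_3 \\ \alpha_1 - \alpha_2 - i\alpha_3\end{pmatrix}.
\end{aligned}
\]
Therefore, we obtain
\[ \mathbf{T}^\dagger\begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \alpha_3\end{pmatrix} = \begin{pmatrix}\alpha_1 - i\alpha_2 + \alpha_3 \\ i\alpha_1 - \alpha_3 \\ \alpha_1 - \alpha_2 - i\alpha_3\end{pmatrix}. \]
■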
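The defining relation (2.8) offers a direct numerical test of the adjoint found in Example 2.3.4. The sketch below is an illustration only: the function names `T`, `T_dagger`, and `inner`, as well as the two sample vectors, are assumptions made here. The inner product is taken to be conjugate-linear in its first argument, matching the bra-ket convention used in the text.

```python
def T(v):
    """The operator of Example 2.3.4 acting on (alpha1, alpha2, alpha3)."""
    a1, a2, a3 = v
    return (a1 - 1j * a2 + a3, 1j * a1 - a3, a1 - a2 + 1j * a3)

def T_dagger(v):
    """The hermitian conjugate obtained in Example 2.3.4."""
    a1, a2, a3 = v
    return (a1 - 1j * a2 + a3, 1j * a1 - a3, a1 - a2 - 1j * a3)

def inner(u, w):
    """<u|w>, conjugate-linear in the first slot."""
    return sum(x.conjugate() * y for x, y in zip(u, w))

a = (1 + 2j, -1j, 3 + 0j)
b = (2 - 1j, 0.5 + 0j, 1 + 1j)

lhs = inner(a, T(b)).conjugate()      # <a|T|b>*
rhs = inner(b, T_dagger(a))           # <b|T^dagger|a>
print(abs(lhs - rhs) < 1e-12)
```

In matrix language the same result says that $\mathbf{T}^\dagger$ is represented by the complex conjugate of the transpose of the matrix of $\mathbf{T}$; only the entry multiplying $\alpha_3$ in the third component changes here, because the remaining entries already have that symmetry.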


2.4 Hermitian and Unitary Operators 

The process of conjugation of linear operators looks much like conjugation of complex numbers. Equation (2.8) alludes to this fact, and Theorem 2.3.2 provides further evidence. It is therefore natural to look for operators that are counterparts of real numbers. One can define complex conjugation for operators and thereby construct real operators. However, these real operators will not be interesting because, as it turns out, they completely ignore the complex character of the vector space. The following alternative definition makes use of hermitian conjugation, and the result will have much wider application than is allowed by a mere complex conjugation.

2.4.1. Definition. A linear operator $\mathbf{H} \in \mathcal{L}(V)$ is called hermitian, or self-adjoint, if $\mathbf{H}^\dagger = \mathbf{H}$. Similarly, $\mathbf{A} \in \mathcal{L}(V)$ is called anti-hermitian if $\mathbf{A}^\dagger = -\mathbf{A}$.


Charles Hermite (1822-1901), one of the most eminent French mathematicians of the nineteenth century, was particularly distinguished for the clean elegance and high artistic quality of his work. As a student, he courted disaster by neglecting his routine assigned work to study the classic masters of mathematics; and though he nearly failed his examinations, he became a first-rate creative mathematician while still in his early twenties. In 1870 he was appointed to a professorship at the Sorbonne, where he trained a whole generation of well-known French mathematicians, including Picard, Borel, and Poincaré.

The character of his mind is suggested by a remark of Poincaré: "Talk with M. Hermite. He never evokes a concrete image, yet you soon perceive that the most abstract entities are to him like living creatures." He disliked geometry, but was strongly attracted to number theory and analysis, and his favorite subject was elliptic functions, where these two fields touch in many remarkable ways. Earlier in the century the Norwegian genius Abel had proved that the general equation of the fifth degree cannot be solved by functions involving only rational operations and root extractions. One of Hermite's most surprising achievements (in 1858) was to show that this equation can be solved by elliptic functions.

His 1873 proof of the transcendence of $e$ was another high point of his career.$^4$ If he had been willing to dig even deeper into this vein, he could probably have disposed of $\pi$ as well, but apparently he had had enough of a good thing. As he wrote to a friend, "I shall risk nothing on an attempt to prove the transcendence of the number $\pi$. If others undertake this enterprise, no one will be happier than I at their success, but believe me, my dear friend, it will not fail to cost them some efforts." As it turned out, Lindemann's proof nine years later rested on extending Hermite's method.

Several of his purely mathematical discoveries had unexpected applications many years later to mathematical physics. For example, the Hermitian forms and matrices that he invented in connection with certain problems of number theory turned out to be crucial for Heisenberg's 1925 formulation of quantum mechanics, and Hermite polynomials (see Chapter 7) are useful in solving Schrödinger's wave equation.


4 Transcendental numbers are those that are not roots of polynomials with integer coefficients.


The following observations strengthen the above conjecture that conjugation 
of complex numbers and hermitian conjugation of operators are somehow related. 

2.4.2. Definition. The expectation value $\langle\mathbf{T}\rangle_a$ of an operator $\mathbf{T}$ in the "state" $|a\rangle$ is a complex number defined by $\langle\mathbf{T}\rangle_a = \langle a|\mathbf{T}|a\rangle$.

The complex conjugate of the expectation value is$^5$
\[ \langle\mathbf{T}\rangle^* = \langle a|\mathbf{T}|a\rangle^* = \langle a|\mathbf{T}^\dagger|a\rangle. \]
In words, $\mathbf{T}^\dagger$, the hermitian conjugate of $\mathbf{T}$, has an expectation value that is the complex conjugate of the latter's expectation value. In particular, if $\mathbf{T}$ is hermitian (that is, equal to its hermitian conjugate), its expectation value will be real.

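What is the analogue of the known fact that a complex number is the sum of a real number and a pure imaginary one? The decomposition
\[ \mathbf{T} = \tfrac{1}{2}(\mathbf{T} + \mathbf{T}^\dagger) + \tfrac{1}{2}(\mathbf{T} - \mathbf{T}^\dagger) \equiv \mathbf{H} + \mathbf{A} \]
shows that any operator can be written as a sum of a hermitian operator $\mathbf{H} = \tfrac{1}{2}(\mathbf{T} + \mathbf{T}^\dagger)$ and an anti-hermitian operator $\mathbf{A} = \tfrac{1}{2}(\mathbf{T} - \mathbf{T}^\dagger)$.

We can go even further, because any anti-hermitian operator $\mathbf{A}$ can be written as $\mathbf{A} = i(-i\mathbf{A})$ in which $-i\mathbf{A}$ is hermitian: $(-i\mathbf{A})^\dagger = (-i)^*\mathbf{A}^\dagger = i(-\mathbf{A}) = -i\mathbf{A}$. Denoting $-i\mathbf{A}$ by $\mathbf{H}'$, we write $\mathbf{T} = \mathbf{H} + i\mathbf{H}'$, where both $\mathbf{H}$ and $\mathbf{H}'$ are hermitian. This is the analogue of the decomposition $z = x + iy$ in which both $x$ and $y$ are real.

Clearly, we should expect some departures from a perfect correspondence. This is due to a lack of commutativity among operators. For instance, although the product of two real numbers is real, the product of two hermitian operators is not, in general, hermitian:
\[ (\mathbf{TU})^\dagger = \mathbf{U}^\dagger\mathbf{T}^\dagger = \mathbf{UT} \neq \mathbf{TU}. \]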


5 When no risk of confusion exists, it is common to drop the subscript "a" and write $\langle\mathbf{T}\rangle$ for the expectation value of $\mathbf{T}$.






We have seen the relation between expectation values and conjugation properties 
of operators. The following theorem completely characterizes hermitian operators 
in terms of their expectation values: 

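2.4.3. Theorem. A linear transformation $\mathbf{H}$ on a complex inner product space is hermitian if and only if $\langle a|\mathbf{H}|a\rangle$ is real for all $|a\rangle$.

Proof. We have already pointed out that a hermitian operator has real expectation values. Conversely, assume that $\langle a|\mathbf{H}|a\rangle$ is real for all $|a\rangle$. Then
\[ \langle a|\mathbf{H}|a\rangle = \langle a|\mathbf{H}|a\rangle^* = \langle a|\mathbf{H}^\dagger|a\rangle \iff \langle a|\mathbf{H} - \mathbf{H}^\dagger|a\rangle = 0 \quad \forall\, |a\rangle. \]
By Theorem 2.1.4 we must have $\mathbf{H} - \mathbf{H}^\dagger = \mathbf{0}$. □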

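2.4.4. Example. In this example, we illustrate the result of the above theorem with 2 × 2
matrices. The matrix $\mathbf{H} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$ is hermitian 6 and acts on $\mathbb{C}^2$. Let us take an arbitrary
vector $|a\rangle = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}$ and evaluate $\langle a|\mathbf{H}|a\rangle$. We have

$$\mathbf{H}|a\rangle = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} -i\alpha_2 \\ i\alpha_1 \end{pmatrix}.$$

Therefore,

$$\langle a|\mathbf{H}|a\rangle = (\alpha_1^* \;\; \alpha_2^*)\begin{pmatrix} -i\alpha_2 \\ i\alpha_1 \end{pmatrix} = -i\alpha_1^*\alpha_2 + i\alpha_2^*\alpha_1 = i\alpha_2^*\alpha_1 + (i\alpha_2^*\alpha_1)^* = 2\,\mathrm{Re}(i\alpha_2^*\alpha_1),$$

and $\langle a|\mathbf{H}|a\rangle$ is real.

For the most general 2 × 2 hermitian matrix $\mathbf{H} = \begin{pmatrix} \alpha & \beta \\ \beta^* & \gamma \end{pmatrix}$, where $\alpha$ and $\gamma$ are real, we
have

$$\mathbf{H}|a\rangle = \begin{pmatrix} \alpha & \beta \\ \beta^* & \gamma \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \alpha\alpha_1 + \beta\alpha_2 \\ \beta^*\alpha_1 + \gamma\alpha_2 \end{pmatrix}$$

and

$$\begin{aligned}
\langle a|\mathbf{H}|a\rangle &= (\alpha_1^* \;\; \alpha_2^*)\begin{pmatrix} \alpha\alpha_1 + \beta\alpha_2 \\ \beta^*\alpha_1 + \gamma\alpha_2 \end{pmatrix} = \alpha_1^*(\alpha\alpha_1 + \beta\alpha_2) + \alpha_2^*(\beta^*\alpha_1 + \gamma\alpha_2) \\
&= \alpha|\alpha_1|^2 + \alpha_1^*\beta\alpha_2 + \alpha_2^*\beta^*\alpha_1 + \gamma|\alpha_2|^2 \\
&= \alpha|\alpha_1|^2 + \gamma|\alpha_2|^2 + 2\,\mathrm{Re}(\alpha_1^*\beta\alpha_2).
\end{aligned}$$

Again $\langle a|\mathbf{H}|a\rangle$ is real. ■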
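As a quick numerical illustration of the first part of this example, the sketch below evaluates $\langle a|\mathbf{H}|a\rangle$ for the matrix with rows $(0, -i)$ and $(i, 0)$. The particular vector `a` is an arbitrary choice for illustration, and the helper name `expectation` is hypothetical.

```python
def expectation(h, a):
    """Return <a|H|a> for a square matrix h (list of rows) and a vector a (list of complex)."""
    ha = [sum(h[i][k] * a[k] for k in range(len(a))) for i in range(len(h))]
    return sum(a[i].conjugate() * ha[i] for i in range(len(a)))

H = [[0, -1j],
     [1j, 0]]                 # the hermitian matrix of the example
a = [1 + 2j, 3 - 1j]          # arbitrary test vector (an assumption)

value = expectation(H, a)
# Closed form derived in the example: 2 Re(i * conj(alpha_2) * alpha_1)
closed_form = 2 * (1j * a[1].conjugate() * a[0]).real
```

For this vector both `value` and `closed_form` equal −14, and the imaginary part of `value` vanishes, as the theorem requires.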

2.4.5. Definition. An operator $\mathbf{A}$ on an inner product space is called positive
(written $\mathbf{A} \ge 0$) if $\mathbf{A}$ is hermitian and $\langle a|\mathbf{A}|a\rangle \ge 0$ for all $|a\rangle$.

6 We assume that the reader has a casual familiarity with hermitian matrices. Think of an $n \times n$ matrix as a linear operator that
acts on column vectors whose elements are components of vectors defined in the standard basis of $\mathbb{C}^n$ or $\mathbb{R}^n$. A hermitian matrix
then becomes a hermitian operator.




2.4.6. Example. An example of a positive operator is the square of a hermitian operator. 7
We note that for any hermitian operator $\mathbf{H}$ and any vector $|a\rangle$, we have
$\langle a|\mathbf{H}^2|a\rangle = \langle a|\mathbf{H}^\dagger\mathbf{H}|a\rangle = \langle \mathbf{H}a|\mathbf{H}a\rangle \ge 0$ because of the positive definiteness of the inner product. ■

An operator $\mathbf{T}$ satisfying the extra condition that $\langle a|\mathbf{T}|a\rangle = 0$ implies $|a\rangle = 0$
is called positive definite. From the discussion of the example above, we conclude
that the square of an invertible hermitian operator is positive definite.

The reader may be familiar with two- and three-dimensional rigid rotations and
the fact that they preserve distances and the scalar product. Can this be generalized
to complex inner product spaces? Let $|a\rangle, |b\rangle \in \mathcal{V}$, and let $\mathbf{U}$ be an operator on $\mathcal{V}$
that preserves the scalar product; that is, given $|b'\rangle = \mathbf{U}|b\rangle$ and $|a'\rangle = \mathbf{U}|a\rangle$, then
$\langle a'|b'\rangle = \langle a|b\rangle$. This yields

$$\langle a'|b'\rangle = (\langle a|\mathbf{U}^\dagger)(\mathbf{U}|b\rangle) = \langle a|\mathbf{U}^\dagger\mathbf{U}|b\rangle = \langle a|b\rangle = \langle a|\mathbf{1}|b\rangle.$$

Since this is true for arbitrary $|a\rangle$ and $|b\rangle$, we obtain $\mathbf{U}^\dagger\mathbf{U} = \mathbf{1}$. In the next chapter,
when we introduce the concept of the determinant of operators, we shall see that
this relation implies that $\mathbf{U}$ and $\mathbf{U}^\dagger$ are both invertible, 8 with each one being the
inverse of the other.




2.4.7. Definition. Let $\mathcal{V}$ be a finite-dimensional inner product space. An operator
$\mathbf{U}$ is called a unitary operator if $\mathbf{U}^\dagger = \mathbf{U}^{-1}$. Unitary operators preserve the inner
product of $\mathcal{V}$.

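2.4.8. Example. The linear transformation $\mathbf{T} : \mathbb{C}^3 \to \mathbb{C}^3$ given by

$$\mathbf{T}\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} (\alpha_1 - i\alpha_2)/\sqrt{2} \\ (\alpha_1 + i\alpha_2 - 2\alpha_3)/\sqrt{6} \\ \{\alpha_1 - \alpha_2 + \alpha_3 + i(\alpha_1 + \alpha_2 + \alpha_3)\}/\sqrt{6} \end{pmatrix}$$

is unitary. In fact, let

$$|a\rangle = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} \quad\text{and}\quad |b\rangle = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}$$

with dual vectors $\langle a| = (\alpha_1^* \;\; \alpha_2^* \;\; \alpha_3^*)$ and $\langle b| = (\beta_1^* \;\; \beta_2^* \;\; \beta_3^*)$, respectively. We use
Equation (2.8) and the procedure of Example 2.3.4 to find $\mathbf{T}^\dagger$. The result is

$$\mathbf{T}^\dagger\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} \dfrac{\alpha_1}{\sqrt{2}} + \dfrac{\alpha_2}{\sqrt{6}} + \dfrac{\alpha_3(1 - i)}{\sqrt{6}} \\[2mm] \dfrac{i\alpha_1}{\sqrt{2}} - \dfrac{i\alpha_2}{\sqrt{6}} - \dfrac{\alpha_3(1 + i)}{\sqrt{6}} \\[2mm] -\dfrac{2\alpha_2}{\sqrt{6}} + \dfrac{\alpha_3(1 - i)}{\sqrt{6}} \end{pmatrix},$$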

7 This is further evidence that hermitian operators are analogues of real numbers: The square of any real number is positive.
8 This implication holds only for finite-dimensional vector spaces.




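and we can verify that

$$\mathbf{T}\mathbf{T}^\dagger\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix}.$$

Thus $\mathbf{T}\mathbf{T}^\dagger = \mathbf{1}$. Similarly, we can show that $\mathbf{T}^\dagger\mathbf{T} = \mathbf{1}$ and therefore that $\mathbf{T}$ is unitary. ■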
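The unitarity claimed in this example can be checked numerically. The sketch below assumes that, in the standard basis of $\mathbb{C}^3$, the transformation has the matrix with rows $(1, -i, 0)/\sqrt{2}$, $(1, i, -2)/\sqrt{6}$, and $(1+i, -1+i, 1+i)/\sqrt{6}$; the third row is a reading of a partly illegible formula and should be treated as an assumption.

```python
import math

def dagger(m):
    """Conjugate transpose of a matrix stored as a list of rows."""
    rows, cols = len(m), len(m[0])
    return [[complex(m[j][i]).conjugate() for j in range(rows)] for i in range(cols)]

def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def is_identity(m, tol=1e-12):
    """True if m equals the identity matrix within tol."""
    return all(abs(m[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(m)) for j in range(len(m)))

s2, s6 = math.sqrt(2), math.sqrt(6)
T = [
    [1 / s2, -1j / s2, 0],                            # (a1 - i a2)/sqrt(2)
    [1 / s6, 1j / s6, -2 / s6],                       # (a1 + i a2 - 2 a3)/sqrt(6)
    [(1 + 1j) / s6, (-1 + 1j) / s6, (1 + 1j) / s6],   # assumed third component
]

T_dag = dagger(T)
tt_dag_is_one = is_identity(matmul(T, T_dag))   # T T^dagger = 1
t_dag_t_is_one = is_identity(matmul(T_dag, T))  # T^dagger T = 1
```

Both checks succeed for this matrix, which is what unitarity requires in finite dimensions.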


2.5 Projection Operators 

We have already considered subspaces briefly. The significance of subspaces is
that physics frequently takes place not inside the whole vector space, but in one
of its subspaces. For instance, although motion generally takes place in a three-dimensional
space, it may restrict itself to a plane either because of constraints or
due to the nature of the force responsible for the motion. An example is planetary
motion, which is confined to a plane because the force of gravity is central. Furthermore,
the example of projectile motion teaches us that it is very convenient
to "project" the motion onto the horizontal and vertical axes and to study these
projections separately. It is, therefore, appropriate to ask how we can go from a
full space to one of its subspaces in the context of linear operators. Let us first
consider a simple example. A point in the plane is designated by the coordinates
$(x, y)$. A subspace of the plane is the $x$-axis. Is there a linear operator, 9 say $\mathbf{P}_x$,
that acts on such a point and somehow sends it into that subspace? Of course, there
are many operators from $\mathbb{R}^2$ to $\mathbb{R}$. However, we are looking for a specific one.
We want $\mathbf{P}_x$ to project the point onto the $x$-axis. Such an operator has to act on
$(x, y)$ and produce $(x, 0)$: $\mathbf{P}_x(x, y) = (x, 0)$. Note that if the point already lies on
the $x$-axis, $\mathbf{P}_x$ does not change it. In particular, if we apply $\mathbf{P}_x$ twice, we get the
same result as if we apply it only once. And this is true for any point in the plane.
Therefore, our operator must have the property $\mathbf{P}_x^2 = \mathbf{P}_x$. We can generalize the
above discussion in the following definition. 10

2.5.1. Definition. A hermitian operator $\mathbf{P} \in \mathcal{L}(\mathcal{V})$ is called a projection operator
if $\mathbf{P}^2 = \mathbf{P}$.

From this definition it immediately follows that the only projection operator 
with an inverse is the identity operator. (Show this!) 

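Consider two projection operators $\mathbf{P}_1$ and $\mathbf{P}_2$. We want to investigate conditions
under which $\mathbf{P}_1 + \mathbf{P}_2$ becomes a projection operator. By definition, $\mathbf{P}_1 + \mathbf{P}_2 =
(\mathbf{P}_1 + \mathbf{P}_2)^2 = \mathbf{P}_1^2 + \mathbf{P}_1\mathbf{P}_2 + \mathbf{P}_2\mathbf{P}_1 + \mathbf{P}_2^2$. So $\mathbf{P}_1 + \mathbf{P}_2$ is a projection operator if and
only if

$$\mathbf{P}_1\mathbf{P}_2 + \mathbf{P}_2\mathbf{P}_1 = \mathbf{0}. \tag{2.10}$$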


9 We want this operator to preserve the vector-space structure of the plane and the axis. 

10 It is sometimes useful to relax the condition of hermiticity. However, in this part of the book, we demand that $\mathbf{P}$ be hermitian.





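Multiply this on the left by $\mathbf{P}_1$ to get

$$\mathbf{P}_1^2\mathbf{P}_2 + \mathbf{P}_1\mathbf{P}_2\mathbf{P}_1 = \mathbf{0} \;\Rightarrow\; \mathbf{P}_1\mathbf{P}_2 + \mathbf{P}_1\mathbf{P}_2\mathbf{P}_1 = \mathbf{0}.$$

Now multiply the same equation on the right by $\mathbf{P}_1$ to get

$$\mathbf{P}_1\mathbf{P}_2\mathbf{P}_1 + \mathbf{P}_2\mathbf{P}_1^2 = \mathbf{0} \;\Rightarrow\; \mathbf{P}_1\mathbf{P}_2\mathbf{P}_1 + \mathbf{P}_2\mathbf{P}_1 = \mathbf{0}.$$

These last two equations yield

$$\mathbf{P}_1\mathbf{P}_2 - \mathbf{P}_2\mathbf{P}_1 = \mathbf{0}. \tag{2.11}$$

The solution to Equations (2.10) and (2.11) is $\mathbf{P}_1\mathbf{P}_2 = \mathbf{P}_2\mathbf{P}_1 = \mathbf{0}$. We therefore
have the following result.

2.5.2. Proposition. Let $\mathbf{P}_1, \mathbf{P}_2 \in \mathcal{L}(\mathcal{V})$ be projection operators. Then $\mathbf{P}_1 + \mathbf{P}_2$
is a projection operator if and only if $\mathbf{P}_1\mathbf{P}_2 = \mathbf{P}_2\mathbf{P}_1 = \mathbf{0}$. Projection operators
satisfying this condition are called orthogonal projection operators.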

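More generally, if there is a set $\{\mathbf{P}_i\}_{i=1}^m$ of projection operators satisfying

$$\mathbf{P}_i\mathbf{P}_j = \begin{cases} \mathbf{P}_i & \text{if } i = j, \\ \mathbf{0} & \text{if } i \neq j, \end{cases}$$

then $\mathbf{P} = \sum_{i=1}^m \mathbf{P}_i$ is also a projection operator. Given a normal vector $|e\rangle$, one can
show easily that $\mathbf{P} = |e\rangle\langle e|$ is a projection operator:

• $\mathbf{P}$ is hermitian: $\mathbf{P}^\dagger = (|e\rangle\langle e|)^\dagger = (\langle e|)^\dagger(|e\rangle)^\dagger = |e\rangle\langle e|$.

• $\mathbf{P}$ equals its square: $\mathbf{P}^2 = (|e\rangle\langle e|)(|e\rangle\langle e|) = |e\rangle\underbrace{\langle e|e\rangle}_{=1}\langle e| = |e\rangle\langle e|$.

In fact, we can take an orthonormal basis $B = \{|e_i\rangle\}_{i=1}^N$ and construct a set of
projection operators $\{\mathbf{P}_i = |e_i\rangle\langle e_i|\}_{i=1}^N$. The operators $\mathbf{P}_i$ are mutually orthogonal.
Thus, their sum $\sum_{i=1}^N \mathbf{P}_i$ is also a projection operator.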

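2.5.3. Proposition. Let $B = \{|e_i\rangle\}_{i=1}^N$ be an orthonormal basis for $\mathcal{V}_N$. Then the
set $\{\mathbf{P}_i = |e_i\rangle\langle e_i|\}_{i=1}^N$ consists of mutually orthogonal projection operators, and
$\sum_{i=1}^N \mathbf{P}_i = \sum_{i=1}^N |e_i\rangle\langle e_i| = \mathbf{1}$. This relation is called the completeness relation.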

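Proof. The mutual orthogonality of the $\mathbf{P}_i$ is an immediate consequence of the
orthonormality of the $|e_i\rangle$. To show the second part, consider an arbitrary vector
$|a\rangle$, written in terms of $|e_i\rangle$: $|a\rangle = \sum_{j=1}^N \alpha_j|e_j\rangle$. Apply $\mathbf{P}_i$ to both sides to obtain

$$\mathbf{P}_i|a\rangle = \sum_{j=1}^N \alpha_j\mathbf{P}_i|e_j\rangle = \sum_{j=1}^N \alpha_j|e_i\rangle\langle e_i|e_j\rangle = \alpha_i|e_i\rangle.$$

Therefore, we have

$$\mathbf{1}|a\rangle = \sum_{i=1}^N \alpha_i|e_i\rangle = \sum_{i=1}^N \mathbf{P}_i|a\rangle = \left(\sum_{i=1}^N \mathbf{P}_i\right)|a\rangle.$$

Since this holds for an arbitrary $|a\rangle$, the two operators must be equal. □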

If we choose only the first $m < N$ vectors instead of the entire basis, then
the projection operator $\mathbf{P}^{(m)} \equiv \sum_{i=1}^m |e_i\rangle\langle e_i|$ projects arbitrary vectors into the
subspace spanned by the first $m$ basis vectors $\{|e_i\rangle\}_{i=1}^m$. In other words, when $\mathbf{P}^{(m)}$
acts on any vector $|a\rangle \in \mathcal{V}$, the result will be a linear combination of only the first
$m$ vectors. The simple proof of this fact is left as an exercise. These points are
illustrated in the following example.

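2.5.4. Example. Consider three orthonormal vectors $\{|e_i\rangle\}_{i=1}^3 \in \mathbb{R}^3$ given by

$$|e_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \qquad |e_2\rangle = \frac{1}{\sqrt{6}}\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}, \qquad |e_3\rangle = \frac{1}{\sqrt{3}}\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}.$$

The projection operators associated with each of these can be obtained by noting that $\langle e_i|$
is a row vector. Therefore,

$$\mathbf{P}_1 = |e_1\rangle\langle e_1| = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}(1 \;\; 1 \;\; 0) = \frac{1}{2}\begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Similarly,

$$\mathbf{P}_2 = \frac{1}{6}\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}(1 \;\; -1 \;\; 2) = \frac{1}{6}\begin{pmatrix} 1 & -1 & 2 \\ -1 & 1 & -2 \\ 2 & -2 & 4 \end{pmatrix}$$

and

$$\mathbf{P}_3 = \frac{1}{3}\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}(-1 \;\; 1 \;\; 1) = \frac{1}{3}\begin{pmatrix} 1 & -1 & -1 \\ -1 & 1 & 1 \\ -1 & 1 & 1 \end{pmatrix}.$$

Note that $\mathbf{P}_i$ projects onto the line along $|e_i\rangle$. This can be tested by letting $\mathbf{P}_i$ act on
an arbitrary vector and showing that the resulting vector is perpendicular to the other two
vectors. For example, let $\mathbf{P}_2$ act on an arbitrary column vector:

$$|a\rangle \equiv \mathbf{P}_2\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 1 & -1 & 2 \\ -1 & 1 & -2 \\ 2 & -2 & 4 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{1}{6}\begin{pmatrix} x - y + 2z \\ -x + y - 2z \\ 2x - 2y + 4z \end{pmatrix}.$$

We verify that $|a\rangle$ is perpendicular to both $|e_1\rangle$ and $|e_3\rangle$:

$$\langle e_1|a\rangle = \frac{1}{\sqrt{2}}(1 \;\; 1 \;\; 0)\,\frac{1}{6}\begin{pmatrix} x - y + 2z \\ -x + y - 2z \\ 2x - 2y + 4z \end{pmatrix} = 0.$$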





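Similarly, $\langle e_3|a\rangle = 0$. So indeed, $|a\rangle$ is along $|e_2\rangle$.

We can find the operator that projects onto the plane formed by $|e_1\rangle$ and $|e_2\rangle$. This is

$$\mathbf{P}_1 + \mathbf{P}_2 = \frac{1}{3}\begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & -1 \\ 1 & -1 & 2 \end{pmatrix}.$$

When this operator acts on an arbitrary column vector, it produces a vector lying in the
plane of $|e_1\rangle$ and $|e_2\rangle$, or perpendicular to $|e_3\rangle$:

$$|b\rangle \equiv (\mathbf{P}_1 + \mathbf{P}_2)\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 2x + y + z \\ x + 2y - z \\ x - y + 2z \end{pmatrix}.$$

It is easy to show that $\langle e_3|b\rangle = 0$. The operators that project onto the other two planes are
obtained similarly. Finally, we verify easily that

$$\mathbf{P}_1 + \mathbf{P}_2 + \mathbf{P}_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \mathbf{1}. \qquad ■$$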
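The properties used in this example (idempotence, mutual orthogonality, and completeness) can be confirmed numerically. The sketch below assumes the three orthonormal vectors are $(1,1,0)/\sqrt{2}$, $(1,-1,2)/\sqrt{6}$, and $(-1,1,1)/\sqrt{3}$; the helper names are hypothetical.

```python
import math

def outer(e):
    """Matrix |e><e| for a real vector e given as a list."""
    return [[x * y for y in e] for x in e]

def add(a, b):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def close(a, b, tol=1e-12):
    """True if two matrices agree entrywise within tol."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

e1 = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
e2 = [1 / math.sqrt(6), -1 / math.sqrt(6), 2 / math.sqrt(6)]
e3 = [-1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3)]

P1, P2, P3 = outer(e1), outer(e2), outer(e3)
zero = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

idempotent = all(close(matmul(P, P), P) for P in (P1, P2, P3))      # P_i^2 = P_i
orthogonal = (close(matmul(P1, P2), zero) and close(matmul(P1, P3), zero)
              and close(matmul(P2, P3), zero))                      # P_i P_j = 0, i != j
complete = close(add(add(P1, P2), P3), identity)                    # completeness relation
plane = add(P1, P2)                                                 # projector onto the e1-e2 plane
```

With these vectors, `plane` agrees with one third of the matrix with rows (2, 1, 1), (1, 2, −1), (1, −1, 2).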



2.6 Operators in Numerical Analysis 

In numerical calculations, limiting operations involving infinities and zeros are 
replaced with finite values. The most natural setting for the discussion of such 
operations is the ideas developed in this chapter. In this section, we shall assume 
that all operators are invertible, and (rather sloppily) manipulate them with no 
mathematical justification. 

2.6.1 Finite-Difference Operators 

In all numerical manipulations, a function is considered as a table with two
columns. The first column lists the (discrete) values of the independent variable
$x_i$, and the second column lists the value of the function $f$ at $x_i$. We often write
$f_i$ for $f(x_i)$.

Three operators that are in use in numerical analysis are the forward difference
operator $\Delta$, the backward difference operator $\nabla$ (not to be confused with the
gradient), and the central difference operator $\delta$. These are defined as follows:

$$\Delta f_i \equiv f_{i+1} - f_i, \qquad \nabla f_i \equiv f_i - f_{i-1}, \qquad \delta f_i \equiv f_{i+1/2} - f_{i-1/2}. \tag{2.12}$$

The last equation has only theoretical significance, because a half-step is not used
in the tabulation of functions or in computer calculations. Typically, the data are
equally spaced, so $x_{i+1} - x_i = h$ is the same for all $i$. Then $f_{i\pm 1} = f(x_i \pm h)$,
and we define $f_{i\pm 1/2} \equiv f(x_i \pm h/2)$.



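We can define products (composition) of the three operators. In particular, $\Delta^2$
is given by

$$\Delta^2 f_i = \Delta(f_{i+1} - f_i) = f_{i+2} - 2f_{i+1} + f_i. \tag{2.13}$$

Similarly,

$$\nabla^2 f_i = \nabla(f_i - f_{i-1}) = f_i - 2f_{i-1} + f_{i-2}, \qquad \delta^2 f_i = f_{i+1} - 2f_i + f_{i-1}. \tag{2.14}$$

We note that

$$\delta^2 f_i = \underbrace{f_{i+1} - f_i}_{\Delta f_i} - \underbrace{(f_i - f_{i-1})}_{\nabla f_i} = (\Delta - \nabla)f_i \;\Rightarrow\; \delta^2 = \Delta - \nabla.$$

This shows that the three operators are related.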
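The relations among the difference operators are easy to try out on a tabulated function. The sketch below uses an integer table of $f(x) = x^3$ with $h = 1$ (an arbitrary choice made so that all arithmetic is exact); the function names are hypothetical.

```python
def forward(f, i):
    """Forward difference: f[i+1] - f[i]."""
    return f[i + 1] - f[i]

def backward(f, i):
    """Backward difference: f[i] - f[i-1]."""
    return f[i] - f[i - 1]

def central_squared(f, i):
    """Second central difference: f[i+1] - 2 f[i] + f[i-1]."""
    return f[i + 1] - 2 * f[i] + f[i - 1]

def delta_table(values):
    """Forward difference applied to a whole table; the result is one entry shorter."""
    return [b - a for a, b in zip(values, values[1:])]

f = [k ** 3 for k in range(6)]   # table of x^3 at x = 0, 1, ..., 5 (assumed sample data)
i = 2

# delta^2 = Delta - nabla, checked at one interior point
relation_holds = central_squared(f, i) == forward(f, i) - backward(f, i)

# Repeated forward differences of a cubic: the fourth difference vanishes
d1 = delta_table(f)
d2 = delta_table(d1)
d3 = delta_table(d2)
d4 = delta_table(d3)
```

For this table the third differences are constant and the fourth differences are zero, in line with the statement that a sufficiently high power of a difference operator annihilates polynomials.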

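It is convenient to introduce the shifting and averaging operators, respectively
$\mathbf{E}$ and $\mu$, as

$$\mathbf{E}f(x) = f(x + h), \qquad \mu f(x) = \frac{1}{2}\left[f\!\left(x + \frac{h}{2}\right) + f\!\left(x - \frac{h}{2}\right)\right]. \tag{2.15}$$

Note that for any positive integer $n$, $\mathbf{E}^n f(x) = f(x + nh)$. We generalize this to
any real number $\alpha$:

$$\mathbf{E}^\alpha f(x) = f(x + \alpha h). \tag{2.16}$$

All the other finite-difference operators can be written in terms of $\mathbf{E}$:

$$\Delta = \mathbf{E} - \mathbf{1}, \qquad \nabla = \mathbf{1} - \mathbf{E}^{-1}, \qquad \delta = \mathbf{E}^{1/2} - \mathbf{E}^{-1/2}, \qquad \mu = \tfrac{1}{2}(\mathbf{E}^{1/2} + \mathbf{E}^{-1/2}). \tag{2.17}$$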

The first two equations of (2.17) can be rewritten as

$$\mathbf{E} = \mathbf{1} + \Delta, \qquad \mathbf{E} = (\mathbf{1} - \nabla)^{-1}. \tag{2.18}$$

We can obtain a useful formula for the shifting operator when it acts on polynomials
of degree $n$ or less. First note that

$$\mathbf{1} - \nabla^{n+1} = (\mathbf{1} - \nabla)(\mathbf{1} + \nabla + \cdots + \nabla^n).$$

But $\nabla^{n+1}$ annihilates all polynomials of degree $n$ or less (see Problem 2.33).
Therefore, for such polynomials, we have

$$\mathbf{1} = (\mathbf{1} - \nabla)(\mathbf{1} + \nabla + \cdots + \nabla^n),$$

which shows that $\mathbf{E} = (\mathbf{1} - \nabla)^{-1} = \mathbf{1} + \nabla + \cdots + \nabla^n$. Now let $n \to \infty$ and
obtain

$$\mathbf{E} = (\mathbf{1} - \nabla)^{-1} = \sum_{k=0}^{\infty} \nabla^k \tag{2.19}$$

for polynomials of any degree and, by Taylor expansion, for any (well-behaved)
function.

2.6.1. Example. Numerical interpolation illustrates the use of the formulas derived
above. Suppose that we are given a table of the values of a function, and we want the
value of the function for an $x$ located between two entries. We cannot use the table directly,
but we may use the following procedure.

Assume that the values of the function $f$ are given for $x_1, x_2, \ldots, x_i, \ldots$, and we are
interested in the value of the function for $x$ such that $x_i < x < x_{i+1}$. This corresponds to
$f_{i+r}$, where $0 < r < 1$. We have

$$f_{i+r} = \mathbf{E}^r f_i = (\mathbf{1} + \Delta)^r f_i = \left(1 + r\Delta + \frac{r(r-1)}{2}\Delta^2 + \cdots\right) f_i. \tag{2.20}$$

In practice, the infinite sum is truncated after a finite number of terms.

If only two terms are kept, we have

$$f_{i+r} \approx (1 + r\Delta) f_i = f_i + r(f_{i+1} - f_i) = (1 - r) f_i + r f_{i+1}. \tag{2.21}$$

In particular, for $r = \tfrac{1}{2}$, Equation (2.21) yields $f_{i+1/2} \approx \tfrac{1}{2}(f_i + f_{i+1})$, which states the
reasonable result that the value at the midpoint is approximately equal to the average of the
values at the end points.

If the third term of the series in Equation (2.20) is also retained, then

$$\begin{aligned}
f_{i+r} &\approx \left[1 + r\Delta + \frac{r(r-1)}{2}\Delta^2\right] f_i = f_i + r\Delta f_i + \frac{r(r-1)}{2}\Delta^2 f_i \\
&= f_i + r(f_{i+1} - f_i) + \frac{r(r-1)}{2}(f_{i+2} - 2f_{i+1} + f_i) \\
&= \frac{(2-r)(1-r)}{2} f_i + r(2-r) f_{i+1} + \frac{r(r-1)}{2} f_{i+2}.
\end{aligned} \tag{2.22}$$

For $r = \tfrac{1}{2}$, that is, at the midpoint between $x_i$ and $x_{i+1}$, Equation (2.22) yields

$$f_{i+1/2} \approx \tfrac{3}{8} f_i + \tfrac{3}{4} f_{i+1} - \tfrac{1}{8} f_{i+2},$$

which turns out to be a better approximation than Equation (2.21). However, it involves not
only the two points on either side of $x$ but also a relatively distant point, $x_{i+2}$. If we were to
retain terms up to $\Delta^k$ for $k > 2$, then $f_{i+r}$ would be given in terms of $f_i, f_{i+1}, \ldots, f_{i+k}$,
and the result would be more accurate than (2.22). Thus, the more information we have
about the behavior of a function at distant points, the better we can approximate it at
$x \in (x_i, x_{i+1})$.

The foregoing analysis was based on forward interpolation. We may want to use backward
interpolation, where $f_{i-r}$ is sought for $0 < r < 1$. In such a case we use the backward
difference operator:

$$f_{i-r} = (\mathbf{E}^{-1})^r f_i = (\mathbf{1} - \nabla)^r f_i = \left(1 - r\nabla + \frac{r(r-1)}{2}\nabla^2 + \cdots\right) f_i. \qquad ■$$




2.6.2. Example. Let us check the conclusion made in Example 2.6.1 using a calculator
and a specific function, say $\sin x$. A calculator gives $\sin(0.1) = 0.0998334$, $\sin(0.2) =
0.1986693$, $\sin(0.3) = 0.2955202$. Suppose that we want to find $\sin(0.15)$ by interpolation.
Using Equation (2.21) with $r = \tfrac{1}{2}$, we obtain

$$\sin(0.15) \approx \tfrac{1}{2}[\sin(0.1) + \sin(0.2)] = 0.1492514.$$

On the other hand, using (2.22) with $r = \tfrac{1}{2}$ yields

$$\sin(0.15) \approx \tfrac{3}{8}\sin(0.1) + \tfrac{3}{4}\sin(0.2) - \tfrac{1}{8}\sin(0.3) = 0.1494995.$$

The value of $\sin(0.15)$ obtained using a calculator is 0.1494381. It is clear that (2.22) gives
a better estimate than (2.21). ■
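The two interpolation formulas of this example can be reproduced in a few lines. The sketch below implements the two-term and three-term truncations of the forward interpolation series; the function names are hypothetical, and the values are computed with Python's `math.sin` rather than read from a calculator.

```python
import math

def interp_two_terms(f0, f1, r):
    """Two-term truncation: (1 - r) f_i + r f_{i+1}."""
    return (1 - r) * f0 + r * f1

def interp_three_terms(f0, f1, f2, r):
    """Three-term truncation, with the coefficients of f_i, f_{i+1}, f_{i+2} collected."""
    return ((2 - r) * (1 - r) / 2) * f0 + r * (2 - r) * f1 + (r * (r - 1) / 2) * f2

f0, f1, f2 = math.sin(0.1), math.sin(0.2), math.sin(0.3)
r = 0.5                                   # midpoint between 0.1 and 0.2

linear = interp_two_terms(f0, f1, r)
quadratic = interp_three_terms(f0, f1, f2, r)
exact = math.sin(0.15)

linear_error = abs(linear - exact)
quadratic_error = abs(quadratic - exact)
```

The results agree with the values 0.1492514 and 0.1494995 quoted in the example to the digits shown, and the three-term error is the smaller of the two.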


2.6.2 Differentiation and Integration Operators 

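The two most important operations of mathematical physics can be written in terms
of the finite-difference operators. Define the differentiation operator $\mathbf{D}$ and the
integration operator $\mathbf{J}$ by

$$\mathbf{D}f(x) \equiv \frac{df}{dx}, \qquad \mathbf{J}f(x) \equiv \int_x^{x+h} f(t)\,dt. \tag{2.23}$$

Assuming that $\mathbf{D}^{-1}$ exists, we note that $f(x) = \mathbf{D}^{-1}[df/dx]$. This shows that $\mathbf{D}^{-1}$ is
the operation of antidifferentiation: $\mathbf{D}^{-1}f(x) = F(x)$, where $F$ is any primitive of
$f$. On the other hand, $\Delta F(x) = F(x + h) - F(x) = \mathbf{J}f(x)$. These two equations
and the fact that $\mathbf{J}$ and $\mathbf{D}$ commute (reader, verify!) show that

$$\Delta\mathbf{D}^{-1} = \mathbf{J} \;\Rightarrow\; \mathbf{J}\mathbf{D} = \mathbf{D}\mathbf{J} = \Delta = \mathbf{E} - \mathbf{1}. \tag{2.24}$$

Using the Taylor expansion, we can write

$$f(x + h) = \sum_{n=0}^{\infty} \frac{h^n}{n!}\frac{d^n f}{dx^n} = \left[\sum_{n=0}^{\infty} \frac{(h\mathbf{D})^n}{n!}\right] f(x) = e^{h\mathbf{D}} f(x), \tag{2.25}$$

or $\mathbf{E} = e^{h\mathbf{D}}$. This yields $h\mathbf{D} = \ln\mathbf{E}$, or

$$h\mathbf{D} = \ln(\mathbf{1} + \Delta) = \Delta - \frac{\Delta^2}{2} + \frac{\Delta^3}{3} - \cdots. \tag{2.26}$$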

2.6.3. Example. Let us calculate $\cos(0.1)$, considering $\cos x$ to be $(d/dx)(\sin x)$ and
using the values given in Example 2.6.2. Using Equation (2.26) to second order, we get

$$\mathbf{D}f_i \approx \frac{1}{h}\left(\Delta - \frac{\Delta^2}{2}\right) f_i = \frac{1}{h}\left[f_{i+1} - f_i - \tfrac{1}{2}(f_{i+2} - 2f_{i+1} + f_i)\right] = \frac{1}{2h}(-f_{i+2} + 4f_{i+1} - 3f_i).$$

This gives

$$\cos(0.1) \approx \frac{1}{0.2}\left[-0.295520 + 4(0.198669) - 3(0.099833)\right] = 0.998284.$$

In comparison, the value obtained directly from a calculator is 0.995004. ■
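The second-order forward-difference estimate of the derivative used in this example can be sketched as follows. The function name is hypothetical, and the sine values are computed with `math.sin` instead of being taken from a calculator.

```python
import math

def derivative_second_order(f0, f1, f2, h):
    """Estimate f'(x_i) from f_i, f_{i+1}, f_{i+2}: (-f_{i+2} + 4 f_{i+1} - 3 f_i) / (2h)."""
    return (-f2 + 4 * f1 - 3 * f0) / (2 * h)

h = 0.1
estimate = derivative_second_order(math.sin(0.1), math.sin(0.2), math.sin(0.3), h)
exact = math.cos(0.1)
error = abs(estimate - exact)
```

The estimate reproduces the value 0.998284 found above and differs from $\cos(0.1)$ by about 0.003; a smaller step $h$ or a higher-order truncation of the series would reduce that gap.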

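The operator $\mathbf{J}$ is a special case of a more general operator defined by

$$\mathbf{J}_\alpha f(x) \equiv \int_x^{x+\alpha h} f(t)\,dt = F(x + \alpha h) - F(x) = (\mathbf{E}^\alpha - \mathbf{1})F(x) = (\mathbf{E}^\alpha - \mathbf{1})\mathbf{D}^{-1} f(x),$$

or

$$\mathbf{J}_\alpha = (\mathbf{E}^\alpha - \mathbf{1})\mathbf{D}^{-1} = h\,\frac{\mathbf{E}^\alpha - \mathbf{1}}{\ln\mathbf{E}} = h\,\frac{(\mathbf{1} + \Delta)^\alpha - \mathbf{1}}{\ln(\mathbf{1} + \Delta)}, \tag{2.27}$$

where we used Equation (2.26).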


2.6.3 Numerical Integration 

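Suppose that we are interested in the numerical value of $\int_a^b f(x)\,dx$. Let $x_0 \equiv a$
and $x_N \equiv b$, and divide the interval $[a, b]$ into $N$ equal parts, each of length
$h = (b - a)/N$. The method commonly used in calculating integrals is to find
the integral $\int_{x_i}^{x_i + \alpha h} f(x)\,dx$, where $\alpha$ is a suitable number, and then add all such
integrals. More specifically,

$$\int_a^b f(x)\,dx = \int_{x_0}^{x_0 + \alpha h} f(x)\,dx + \int_{x_0 + \alpha h}^{x_0 + 2\alpha h} f(x)\,dx + \cdots + \int_{x_0 + (M-1)\alpha h}^{x_0 + M\alpha h} f(x)\,dx,$$

where $M$ is a suitably chosen number. In fact, since $x_N = x_0 + Nh$, we have
$M\alpha = N$. We next employ Equation (2.27) to get

$$\int_a^b f(x)\,dx = \mathbf{J}_\alpha f_0 + \mathbf{J}_\alpha f_\alpha + \cdots + \mathbf{J}_\alpha f_{(M-1)\alpha} = \mathbf{J}_\alpha\left(\sum_{k=0}^{M-1} f_{k\alpha}\right), \tag{2.28}$$

where $f_{k\alpha} \equiv f(x_0 + k\alpha h)$. We thus need an expression for $\mathbf{J}_\alpha$ to evaluate the
integral. Such an expression can be derived elegantly, by noting that

$$\int_0^\alpha \mathbf{E}^s\,ds = \left.\frac{\mathbf{E}^s}{\ln\mathbf{E}}\right|_0^\alpha = \frac{\mathbf{E}^\alpha - \mathbf{1}}{\ln\mathbf{E}} = \frac{1}{h}\mathbf{J}_\alpha \qquad \text{[by Equation (2.27)]},$$

so that

$$\mathbf{J}_\alpha = h\int_0^\alpha \mathbf{E}^s\,ds = h\int_0^\alpha (\mathbf{1} + \Delta)^s\,ds = h\sum_{k=0}^{\infty} \frac{\Delta^k}{k!}\int_0^\alpha s(s-1)\cdots(s-k+1)\,ds, \tag{2.29}$$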



where we expanded $(\mathbf{1} + \Delta)^s$ using the binomial infinite series. Equations (2.28)
and (2.29) give the desired evaluation of the integral.

Let us make a few remarks before developing any commonly used rules of
approximation. First, once $h$ is set, the function can be evaluated only at $x_0 + nh$,
where $n$ is a positive integer. This means that $f_n$ is given only for positive integers
$n$. Thus, in the sum in (2.28) $k\alpha$ must be an integer. Since $k$ is an integer, we
conclude that $\alpha$ must be an integer. Second, since $N = M\alpha$ for some integer $M$,
we must choose $N$ to be a multiple of $\alpha$. Third, if we are to be able to evaluate
$\mathbf{J}_\alpha f_{(M-1)\alpha}$ [the last term in (2.28)], $\mathbf{J}_\alpha$ cannot have powers of $\Delta$ higher than $\alpha$,
because $\Delta^n f_{(M-1)\alpha}$ contains a term of the form

$$f(x_0 + (M-1)\alpha h + nh) = f(x_N + (n - \alpha)h),$$

which for $n > \alpha$ gives $f$ at a point beyond the upper limit. Thus, in the power-series
expansion of $\mathbf{J}_\alpha$, we must make sure that no power of $\Delta$ beyond $\alpha$ is retained.
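The coefficients of the powers of $\Delta$ in the series (2.29) can be generated exactly with rational arithmetic. The sketch below is one possible way to do this: it expands the product $s(s-1)\cdots(s-k+1)$ as a polynomial, integrates it from 0 to $\alpha$, and divides by $k!$. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def falling_factorial_poly(k):
    """Coefficients (lowest degree first) of s(s-1)...(s-k+1); the empty product is 1."""
    poly = [Fraction(1)]
    for j in range(k):
        new = [Fraction(0)] * (len(poly) + 1)
        for d, c in enumerate(poly):
            new[d + 1] += c        # contribution of c * s^d * s
            new[d] -= j * c        # contribution of c * s^d * (-j)
        poly = new
    return poly

def j_coefficient(alpha, k):
    """Coefficient of Delta^k in J_alpha / h according to the series (2.29)."""
    poly = falling_factorial_poly(k)
    integral = sum(c * Fraction(alpha) ** (d + 1) / (d + 1) for d, c in enumerate(poly))
    return integral / factorial(k)

trapezoid_coeffs = [j_coefficient(1, k) for k in range(2)]      # alpha = 1
simpson13_coeffs = [j_coefficient(2, k) for k in range(4)]      # alpha = 2
simpson38_coeffs = [j_coefficient(3, k) for k in range(4)]      # alpha = 3
```

For $\alpha = 1$ this gives 1 and 1/2; for $\alpha = 2$ it gives 2, 2, 1/3, and 0 (the cubic term vanishes); for $\alpha = 3$ it gives 3, 9/2, 9/4, and 3/8.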

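There are several specific $\mathbf{J}_\alpha$'s commonly used in numerical integration. We
will consider these next. The trapezoidal rule sets $\alpha = 1$. According to the
remarks above, we therefore retain terms up to the first power in the expansion of
$\mathbf{J}_\alpha$. Then (2.29) gives $\mathbf{J}_1 = \mathbf{J} = h(\mathbf{1} + \tfrac{1}{2}\Delta)$. Substituting this in Equation (2.28),
we obtain

$$\int_a^b f(x)\,dx \approx h\left(\mathbf{1} + \tfrac{1}{2}\Delta\right)\left(\sum_{k=0}^{N-1} f_k\right) = \frac{h}{2}\sum_{k=0}^{N-1}(f_k + f_{k+1}) = \frac{h}{2}(f_0 + 2f_1 + \cdots + 2f_{N-1} + f_N). \tag{2.30}$$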

Simpson's one-third rule sets $\alpha = 2$. Thus, we have to retain all terms
up to the $\Delta^2$ term. However, for $\alpha = 2$, the third power of $\Delta$ disappears in
Equation (2.29), and we get an extra "power" of accuracy for free! Because of
this, Simpson's one-third rule is popular for numerical integrations. Equation (2.29)
yields $\mathbf{J}_2 = 2h(\mathbf{1} + \Delta + \tfrac{1}{6}\Delta^2)$. Substituting this in (2.28) yields

$$\begin{aligned}
\int_a^b f(x)\,dx &\approx \frac{h}{3}\sum_{k=0}^{N/2-1}(6\cdot\mathbf{1} + 6\Delta + \Delta^2) f_{2k} = \frac{h}{3}\sum_{k=0}^{N/2-1}(f_{2k+2} + 4f_{2k+1} + f_{2k}) \\
&= \frac{h}{3}(f_0 + 4f_1 + 2f_2 + 4f_3 + \cdots + 4f_{N-1} + f_N).
\end{aligned} \tag{2.31}$$

It is understood, of course, that $N$ is an even integer. The factor $\tfrac{1}{3}$ gives this method
its name.

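For Simpson's three-eighths rule, we set $\alpha = 3$, retain terms up to $\Delta^3$, and
use Equation (2.29) to obtain

$$\mathbf{J}_3 = 3h\left(\mathbf{1} + \tfrac{3}{2}\Delta + \tfrac{3}{4}\Delta^2 + \tfrac{1}{8}\Delta^3\right) = \frac{3h}{8}(8\cdot\mathbf{1} + 12\Delta + 6\Delta^2 + \Delta^3).$$

Substituting in (2.28), we get

$$\begin{aligned}
\int_a^b f(x)\,dx &\approx \frac{3h}{8}\sum_{k=0}^{N/3-1}(8\cdot\mathbf{1} + 12\Delta + 6\Delta^2 + \Delta^3) f_{3k} \\
&= \frac{3h}{8}\sum_{k=0}^{N/3-1}(f_{3k+3} + 3f_{3k+2} + 3f_{3k+1} + f_{3k}).
\end{aligned} \tag{2.32}$$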

2.6.4. Example. Let us use Simpson's one-third rule with four intervals to evaluate the
familiar integral $I = \int_0^1 e^x\,dx$. With $h = 0.25$ and $N = 4$, Equation (2.31) yields

$$I \approx \frac{0.25}{3}\left(1 + 4e^{0.25} + 2e^{0.5} + 4e^{0.75} + e\right) = 1.71832.$$

This is very close to the "exact" result $e - 1 \approx 1.71828$. ■
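The three rules derived in this section translate directly into code. The following is a minimal sketch of the composite trapezoidal, one-third, and three-eighths rules; the function names and the argument order `(f, a, b, n)` are choices made here for illustration.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (b - a) / n
    interior = sum(f(a + k * h) for k in range(1, n))
    return h / 2 * (f(a) + 2 * interior + f(b))

def simpson_one_third(f, a, b, n):
    """Composite Simpson one-third rule; n must be even."""
    if n % 2 != 0:
        raise ValueError("n must be a multiple of 2")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 == 1 else 2) * f(a + k * h)   # weights 4, 2, 4, ..., 4
    return h / 3 * total

def simpson_three_eighths(f, a, b, n):
    """Composite Simpson three-eighths rule; n must be a multiple of 3."""
    if n % 3 != 0:
        raise ValueError("n must be a multiple of 3")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (2 if k % 3 == 0 else 3) * f(a + k * h)   # weights 3, 3, 2, 3, 3, ...
    return 3 * h / 8 * total

exact = math.e - 1
one_third = simpson_one_third(math.exp, 0.0, 1.0, 4)       # the case worked in the example
three_eighths = simpson_three_eighths(math.exp, 0.0, 1.0, 6)
trap = trapezoid(math.exp, 0.0, 1.0, 6)
```

With four intervals the one-third rule reproduces the value 1.71832 of the example. The requirement that `n` be a multiple of 2 or 3 mirrors the remark that $N$ must be a multiple of $\alpha$.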


2.7 Problems 

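2.1. Consider a linear operator $\mathbf{T}$ on a finite-dimensional vector space $\mathcal{V}$. Show that
there exists a polynomial $p$ such that $p(\mathbf{T}) = \mathbf{0}$. Hint: Take a basis $B = \{|a_i\rangle\}_{i=1}^N$
and consider the vectors $\{\mathbf{T}^k|a_1\rangle\}_{k=0}^M$ for large enough $M$ and conclude that there
exists a polynomial $p_1(\mathbf{T})$ such that $p_1(\mathbf{T})|a_1\rangle = 0$. Do the same for $|a_2\rangle$, etc. Now
take the product of all such polynomials.

2.2. Use mathematical induction to show that $[\mathbf{A}, \mathbf{A}^m] = \mathbf{0}$.

2.3. For $\mathbf{D}$ and $\mathbf{T}$ defined in Example 1.3.4:

(a) Show that $[\mathbf{D}, \mathbf{T}] = \mathbf{1}$.

(b) Calculate the linear transformations $\mathbf{D}^3\mathbf{T}^3$ and $\mathbf{T}^3\mathbf{D}^3$.

2.4. Consider three linear operators $\mathbf{L}_1$, $\mathbf{L}_2$, and $\mathbf{L}_3$ satisfying the commutation relations
$[\mathbf{L}_1, \mathbf{L}_2] = \mathbf{L}_3$, $[\mathbf{L}_3, \mathbf{L}_1] = \mathbf{L}_2$, $[\mathbf{L}_2, \mathbf{L}_3] = \mathbf{L}_1$, and define the new operators
$\mathbf{L}_\pm = \mathbf{L}_1 \pm i\mathbf{L}_2$.

(a) Show that the operator $\mathbf{L}^2 \equiv \mathbf{L}_1^2 + \mathbf{L}_2^2 + \mathbf{L}_3^2$ commutes with $\mathbf{L}_k$, $k = 1, 2, 3$.

(b) Show that the set $\{\mathbf{L}_+, \mathbf{L}_-, \mathbf{L}_3\}$ is closed under commutation, i.e., the commutator
of any two of them can be written as a linear combination of the set. Determine
these commutators.

(c) Write $\mathbf{L}^2$ in terms of $\mathbf{L}_+$, $\mathbf{L}_-$, and $\mathbf{L}_3$.

2.5. Prove the rest of Proposition 2.1.11.

2.6. Show that if $[[\mathbf{A}, \mathbf{B}], \mathbf{A}] = \mathbf{0}$, then for every positive integer $k$,

$$[\mathbf{A}^k, \mathbf{B}] = k\mathbf{A}^{k-1}[\mathbf{A}, \mathbf{B}].$$

Hint: First prove the relation for low values of $k$; then use mathematical induction.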










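2.7. Show that for $\mathbf{D}$ and $\mathbf{T}$ defined in Example 1.3.4, $[\mathbf{D}^k, \mathbf{T}] = k\mathbf{D}^{k-1}$ and
$[\mathbf{T}^k, \mathbf{D}] = -k\mathbf{T}^{k-1}$.

2.8. Evaluate the derivative of $\mathbf{H}^{-1}(t)$ in terms of the derivative of $\mathbf{H}(t)$.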

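2.9. Show that for any $\alpha, \beta \in \mathbb{R}$ and any $\mathbf{H} \in \mathcal{L}(\mathcal{V})$, we have

$$e^{\alpha\mathbf{H}}e^{\beta\mathbf{H}} = e^{(\alpha + \beta)\mathbf{H}}.$$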

2.10. Show that (U + T)(U — T) = U 2 — T 2 if and only if [U, T] = 0. 

2.11. Prove that if A and B are hermitian, then i [A, B] is also hermitian. 


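2.12. Find the solution to the operator differential equation

$$\frac{d\mathbf{U}}{dt} = t\mathbf{H}\mathbf{U}(t).$$

Hint: Make the change of variable $y = t^2$ and use the result of Example 2.2.3.

2.13. Verify that

$$\frac{d}{dt}\mathbf{H}^3 = \left(\frac{d\mathbf{H}}{dt}\right)\mathbf{H}^2 + \mathbf{H}\left(\frac{d\mathbf{H}}{dt}\right)\mathbf{H} + \mathbf{H}^2\left(\frac{d\mathbf{H}}{dt}\right).$$

2.14. Show that if $\mathbf{A}$ and $\mathbf{B}$ commute, and $f$ and $g$ are arbitrary functions, then
$f(\mathbf{A})$ and $g(\mathbf{B})$ also commute.

2.15. Assuming that $[[\mathbf{S}, \mathbf{T}], \mathbf{T}] = \mathbf{0} = [[\mathbf{S}, \mathbf{T}], \mathbf{S}]$, show that

$$[\mathbf{S}, \exp(t\mathbf{T})] = t[\mathbf{S}, \mathbf{T}]\exp(t\mathbf{T}).$$

Hint: Expand the exponential and use Problem 2.6.

2.16. Prove that

$$\exp(\mathbf{H}_1 + \mathbf{H}_2 + \mathbf{H}_3) = \exp(\mathbf{H}_1)\exp(\mathbf{H}_2)\exp(\mathbf{H}_3)\cdot\exp\{-\tfrac{1}{2}([\mathbf{H}_1, \mathbf{H}_2] + [\mathbf{H}_1, \mathbf{H}_3] + [\mathbf{H}_2, \mathbf{H}_3])\}$$

provided that $\mathbf{H}_1$, $\mathbf{H}_2$, and $\mathbf{H}_3$ commute with all the commutators. What is the
generalization to $\mathbf{H}_1 + \mathbf{H}_2 + \cdots + \mathbf{H}_n$?

2.17. Denoting the derivative of $\mathbf{A}(t)$ by $\dot{\mathbf{A}}$, show that

$$\frac{d}{dt}[\mathbf{A}, \mathbf{B}] = [\dot{\mathbf{A}}, \mathbf{B}] + [\mathbf{A}, \dot{\mathbf{B}}].$$

2.18. Prove Theorem 2.3.2. Hint: Use Equation (2.8) and Theorem 2.1.3.

2.19. Let $\mathbf{A}(t) \equiv \exp(t\mathbf{H})\mathbf{A}_0\exp(-t\mathbf{H})$, where $\mathbf{H}$ and $\mathbf{A}_0$ are constant operators.
Show that $d\mathbf{A}/dt = [\mathbf{H}, \mathbf{A}(t)]$. What happens when $\mathbf{H}$ commutes with $\mathbf{A}(t)$?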




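2.20. Let $|f\rangle, |g\rangle \in \mathcal{C}(a, b)$ with the additional property that $f(a) = g(a) =
f(b) = g(b) = 0$. Show that for such functions, the derivative operator $\mathbf{D}$ is
anti-hermitian. The inner product is defined as usual:

$$\langle f|g\rangle \equiv \int_a^b f^*(t)g(t)\,dt.$$

2.21. In this problem, you will go through the steps of proving the rigorous statement
of the Heisenberg uncertainty principle. Denote the expectation (average)
value of an operator $\mathbf{A}$ in a state $|\psi\rangle$ by $A_{\text{avg}}$. Thus, $A_{\text{avg}} = \langle\mathbf{A}\rangle = \langle\psi|\mathbf{A}|\psi\rangle$. The
uncertainty (deviation from the mean) in state $|\psi\rangle$ of the operator $\mathbf{A}$ is given by

$$\Delta A = \sqrt{\langle(\mathbf{A} - A_{\text{avg}})^2\rangle} = \sqrt{\langle\psi|(\mathbf{A} - A_{\text{avg}}\mathbf{1})^2|\psi\rangle}.$$

(a) Show that for any two hermitian operators $\mathbf{A}$ and $\mathbf{B}$, we have

$$|\langle\psi|\mathbf{A}\mathbf{B}|\psi\rangle|^2 \le \langle\psi|\mathbf{A}^2|\psi\rangle\,\langle\psi|\mathbf{B}^2|\psi\rangle.$$

Hint: Apply the Schwarz inequality to an appropriate pair of vectors.

(b) Using the above and the triangle inequality for complex numbers, show that

$$|\langle\psi|[\mathbf{A}, \mathbf{B}]|\psi\rangle|^2 \le 4\,\langle\psi|\mathbf{A}^2|\psi\rangle\,\langle\psi|\mathbf{B}^2|\psi\rangle.$$

(c) Define the operators $\mathbf{A}' = \mathbf{A} - \alpha\mathbf{1}$, $\mathbf{B}' = \mathbf{B} - \beta\mathbf{1}$, where $\alpha$ and $\beta$ are real
numbers. Show that $\mathbf{A}'$ and $\mathbf{B}'$ are hermitian and $[\mathbf{A}', \mathbf{B}'] = [\mathbf{A}, \mathbf{B}]$.

(d) Now use all the results above to show the celebrated uncertainty relation

$$(\Delta A)(\Delta B) \ge \tfrac{1}{2}\,|\langle\psi|[\mathbf{A}, \mathbf{B}]|\psi\rangle|.$$

What does this reduce to for position operator $\mathbf{x}$ and momentum operator $\mathbf{p}$ if
$[\mathbf{x}, \mathbf{p}] = i\hbar$?

2.22. Show that $\mathbf{U} = \exp\mathbf{A}$ is unitary if and only if $\mathbf{A}$ is anti-hermitian.

2.23. Find $\mathbf{T}^\dagger$ for each of the following linear operators.

(a) $\mathbf{T} : \mathbb{R}^2 \to \mathbb{R}^2$ given by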

WO 

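(b) $\mathbf{T} : \mathbb{R}^3 \to \mathbb{R}^3$ given by

$$\mathbf{T}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x + 2y - z \\ 3x - y + 2z \\ -x + 2y + 3z \end{pmatrix}.$$

(c) $\mathbf{T} : \mathbb{R}^2 \to \mathbb{R}^2$ given by

$$\mathbf{T}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix},$$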




where $\theta$ is a real number. What is $\mathbf{T}^\dagger\mathbf{T}$?

(d) $\mathbf{T} : \mathbb{C}^2 \to \mathbb{C}^2$ given by


<) = (=)• 
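(e) $\mathbf{T} : \mathbb{C}^3 \to \mathbb{C}^3$ given by

$$\mathbf{T}\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} \alpha_1 + i\alpha_2 - 2i\alpha_3 \\ -2i\alpha_1 + \alpha_2 + i\alpha_3 \\ i\alpha_1 - 2i\alpha_2 + \alpha_3 \end{pmatrix}.$$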



2.24. Show that if $\mathbf{P}$ is a (hermitian) projection operator, so are (a) $\mathbf{1} - \mathbf{P}$ and (b)
$\mathbf{U}^\dagger\mathbf{P}\mathbf{U}$ for any unitary operator $\mathbf{U}$.

2.25. For the vector

$$|a\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ 1 \\ -1 \\ 0 \end{pmatrix},$$

(a) Find the associated projection matrix, $\mathbf{P}_a$.

(b) Verify that $\mathbf{P}_a$ does project an arbitrary vector in $\mathbb{C}^4$ along $|a\rangle$.

(c) Verify directly that the matrix $\mathbf{1} - \mathbf{P}_a$ is also a projection operator.

2.26. Let $|a_1\rangle \equiv \mathbf{a}_1 = (1, 1, -1)$ and $|a_2\rangle \equiv \mathbf{a}_2 = (-2, 1, -1)$.

(a) Construct (in the form of a matrix) the projection operators $\mathbf{P}_1$ and $\mathbf{P}_2$ that
project onto the directions of $|a_1\rangle$ and $|a_2\rangle$, respectively. Verify that they are indeed
projection operators.

(b) Construct (in the form of a matrix) the operator $\mathbf{P} = \mathbf{P}_1 + \mathbf{P}_2$ and verify
directly that it is a projection operator.

(c) Let $\mathbf{P}$ act on an arbitrary vector $(x, y, z)$. What is the dot product of the resulting
vector with the vector $\mathbf{a}_1 \times \mathbf{a}_2$? What can you say about $\mathbf{P}$ and your conclusion in
(b)?

2.27. Let $\mathbf{P}^{(m)} = \sum_{i=1}^m |e_i\rangle\langle e_i|$ be a projection operator constructed out of the
first $m$ orthonormal vectors of the basis $B = \{|e_i\rangle\}_{i=1}^N$ of $\mathcal{V}$. Show that $\mathbf{P}^{(m)}$ projects
into the subspace spanned by the first $m$ vectors in $B$.

2.28. What is the length of the projection of the vector $(3, 4, -4)$ onto a line whose

parametric equation is x = + 1， y + 3, z = £ _ 1? Hint: Find a unit vector 

in the direction of the line and construct its projection operator. 

2.29. The parametric equation of a line $L$ in a coordinate system with origin $O$ is


a: = + 1 ， 


y = 


z = —2t + 2 . 





A point $P$ has coordinates $(3, -2, 1)$.

(a) Using the projection operators, find the length of the projection of $OP$ on the
line $L$.

(b) Find the vector whose beginning is $P$ and ends on the line $L$ and perpendicular
to $L$.

(c) From this vector calculate the distance from $P$ to the line $L$.

2.30. Let the operator $\mathbf{U} : \mathbb{C}^2 \to \mathbb{C}^2$ be given by

Is U unitary? 

2.31. Show that the product of two unitary operators is always unitary, but the 
product of two hermitian operators is hermitian if and only if they commute. 

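2.32. Let $\mathbf{S}$ be an operator that is both unitary and hermitian. Show that
(a) $\mathbf{S}$ is involutive (i.e., $\mathbf{S}^2 = \mathbf{1}$).

(b) $\mathbf{S} = \mathbf{P}^+ - \mathbf{P}^-$, where $\mathbf{P}^+$ and $\mathbf{P}^-$ are hermitian projection operators.

2.33. Show that when the forward difference operator is applied to a polynomial,
the degree of the polynomial is reduced by 1. (Hint: Consider $x^n$ first.) Then show
that $\Delta^{n+1}$ annihilates all polynomials of degree $n$ or less.

2.34. Show that $\delta^{n+1}$ and $\nabla^{n+1}$ annihilate any polynomial of degree $n$.

2.35. Show that all of the finite-difference operators commute with one another.

2.36. Verify the identities

$$\nabla\mathbf{E} = \delta\mathbf{E}^{1/2} = \Delta,$$

$$\nabla + \Delta = 2\mu\delta = \mathbf{E} - \mathbf{E}^{-1},$$

$$\mathbf{E}^{-1/2} = \mu - (\delta/2).$$

2.37. By writing everything in terms of $\mathbf{E}$, show that $\delta^2 = \Delta - \nabla = \Delta\nabla$.

2.38. Write expressions for $\mathbf{E}^{1/2}$, $\Delta$, $\nabla$, and $\mu$ in terms of $\delta$.

2.39. Show that

$$\mathbf{D} = \frac{2}{h}\sinh^{-1}\left(\frac{\delta}{2}\right).$$

2.40. Show that

$$\mathbf{D}^2 = \frac{1}{h^2}\left(\Delta^2 - \Delta^3 + \tfrac{11}{12}\Delta^4 - \tfrac{5}{6}\Delta^5 + \cdots\right)$$

and derive Equation (2.27).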




2.41. Find an expression for $\mathbf{J}_\alpha$ in powers of $\Delta$. Retain all terms up to the fourth
power.

2.42. Show that for $\alpha = 2$, the third power of $\Delta$ disappears in Equation (2.29).

2.43. Evaluate the following integrals numerically, using six subintervals with
the trapezoidal rule, Simpson's one-third rule, and Simpson's three-eighths rule.
Compare with the exact result when possible.


•5 


(a) I x^dx. 

Jo 

〆 1 

(d) 


(g) 


—dx. 

x 

1 dx 


1+x 2 , 





⑻ 

⑻ 



f 12 y 

(c) / xe co&xdx. 

Jo 

(/) J e x% sin^: 


(0 



e x tanxdx. 


Additional Reading 

1. Axler, S. Linear Algebra Done Right, Springer-Verlag, 1996.

2. Greub, W. Linear Algebra, 4th ed., Springer-Verlag, 1975.

3. Hildebrand, F. Introduction to Numerical Analysis, 2nd ed., Dover, 1987.
Uses operator techniques in numerical analysis. It has a detailed discussion
of error analysis, a topic completely ignored in our text.


Matrices: Operator Representations 


So far, our theoretical investigation has been dealing mostly with abstract vectors
and abstract operators. As we have seen in examples and problems, concrete
representations of vectors and operators are necessary in most applications. Such
representations are obtained by choosing a basis and expressing all operations in
terms of components of vectors and matrix representations of operators.


3.1 Matrices 


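Let us choose a basis $B_V = \{|a_i\rangle\}_{i=1}^N$ of a vector space $\mathcal{V}_N$, and express an arbitrary
vector $|x\rangle$ in this basis: $|x\rangle = \sum_{i=1}^N \xi_i|a_i\rangle$. We write

$$\mathsf{x} = \begin{pmatrix} \xi_1 \\ \xi_2 \\ \vdots \\ \xi_N \end{pmatrix} \tag{3.1}$$

and say that the column vector $\mathsf{x}$ represents $|x\rangle$ in $B_V$. We can also have a linear
transformation $\mathbf{A} \in \mathcal{L}(\mathcal{V}_N, \mathcal{W}_M)$ act on the basis vectors in $B_V$ to give vectors in
the $M$-dimensional vector space $\mathcal{W}_M$: $|w_k\rangle = \mathbf{A}|a_k\rangle$. The latter can be written as
a linear combination of basis vectors $B_W = \{|b_j\rangle\}_{j=1}^M$ in $\mathcal{W}_M$:

$$|w_1\rangle = \sum_{j=1}^M \alpha_{j1}|b_j\rangle, \quad |w_2\rangle = \sum_{j=1}^M \alpha_{j2}|b_j\rangle, \quad \ldots, \quad |w_N\rangle = \sum_{j=1}^M \alpha_{jN}|b_j\rangle.$$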

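Note that the components have an extra subscript to denote which of the $N$ vectors
$\{|w_i\rangle\}_{i=1}^N$ they are representing. The components can be arranged in a column as
before to give a representation of the corresponding vectors:

$$\mathsf{w}_1 = \begin{pmatrix} \alpha_{11} \\ \alpha_{21} \\ \vdots \\ \alpha_{M1} \end{pmatrix}, \quad \mathsf{w}_2 = \begin{pmatrix} \alpha_{12} \\ \alpha_{22} \\ \vdots \\ \alpha_{M2} \end{pmatrix}, \quad \ldots, \quad \mathsf{w}_N = \begin{pmatrix} \alpha_{1N} \\ \alpha_{2N} \\ \vdots \\ \alpha_{MN} \end{pmatrix}.$$

The operator itself is determined by the collection of all these vectors, i.e., by a
matrix. We write this as

$$\mathsf{A} = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1N} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2N} \\ \vdots & \vdots & & \vdots \\ \alpha_{M1} & \alpha_{M2} & \cdots & \alpha_{MN} \end{pmatrix} \tag{3.2}$$

and call $\mathsf{A}$ the matrix representing $\mathbf{A}$ in bases $B_V$ and $B_W$. This statement is also
summarized symbolically as

$$\mathbf{A}|a_i\rangle = \sum_{j=1}^M \alpha_{ji}|b_j\rangle, \qquad i = 1, 2, \ldots, N. \tag{3.3}$$

We thus have the following rule: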

We thus have the following rule: 


3.1.1. Box. To find the matrix $\mathsf{A}$ representing $\mathbf{A}$ in bases $B_V = \{|a_i\rangle\}_{i=1}^N$
and $B_W = \{|b_j\rangle\}_{j=1}^M$, express $\mathbf{A}|a_i\rangle$ as a linear combination of the vectors
in $B_W$. The components form the $i$th column of $\mathsf{A}$.
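The rule in this box can be sketched in code for the simplest situation, in which both bases are the standard basis of $\mathbb{R}^3$, so that the components of $\mathbf{A}|a_i\rangle$ are just the entries of the output vector. The operator `apply_A` below is a hypothetical example chosen for illustration; with a different basis the columns would have to be re-expanded in that basis and the matrix would change.

```python
def apply_A(v):
    """A hypothetical linear operator on R^3, acting on a list [x, y, z]."""
    x, y, z = v
    return [x - y + 2 * z, 3 * x - z, 2 * y + z]

standard_basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Images of the basis vectors; the i-th image supplies the i-th COLUMN of the matrix.
columns = [apply_A(e) for e in standard_basis]
A = [[columns[i][j] for i in range(3)] for j in range(3)]   # A[j][i] = j-th component of A|a_i>

def mat_vec(m, v):
    """Matrix times column vector."""
    return [sum(m[j][i] * v[i] for i in range(len(v))) for j in range(len(m))]

x = [1, 2, 3]
y_from_operator = apply_A(x)
y_from_matrix = mat_vec(A, x)
```

In the standard basis the resulting matrix has rows (1, −1, 2), (3, 0, −1), (0, 2, 1), and applying it to a column vector reproduces the action of the operator.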


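Now consider the vector $|y\rangle = \mathbf{A}|x\rangle$ in $\mathcal{W}_M$. This vector can be written in two
ways: On the one hand, $|y\rangle = \sum_{j=1}^M \eta_j|b_j\rangle$. On the other hand,

$$|y\rangle = \mathbf{A}|x\rangle = \mathbf{A}\sum_{i=1}^N \xi_i|a_i\rangle = \sum_{i=1}^N \xi_i\mathbf{A}|a_i\rangle = \sum_{i=1}^N \xi_i\left(\sum_{j=1}^M \alpha_{ji}|b_j\rangle\right) = \sum_{j=1}^M\left(\sum_{i=1}^N \xi_i\alpha_{ji}\right)|b_j\rangle.$$

Since $|y\rangle$ has a unique set of components in the basis $B_W$, we conclude that

$$\eta_j = \sum_{i=1}^N \alpha_{ji}\xi_i, \qquad j = 1, 2, \ldots, M. \tag{3.4}$$

This is written as

$$\begin{pmatrix} \eta_1 \\ \eta_2 \\ \vdots \\ \eta_M \end{pmatrix} = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1N} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2N} \\ \vdots & \vdots & & \vdots \\ \alpha_{M1} & \alpha_{M2} & \cdots & \alpha_{MN} \end{pmatrix}\begin{pmatrix} \xi_1 \\ \xi_2 \\ \vdots \\ \xi_N \end{pmatrix} \;\Rightarrow\; \mathsf{y} = \mathsf{A}\mathsf{x}, \tag{3.5}$$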


in which the matrix multiplication rule is understood. This matrix equation is the
representation of the operator equation $|y\rangle = \mathbf{A}|x\rangle$ in the bases $B_V$ and $B_W$.

The construction above indicates that, once the bases are fixed in the two
vector spaces, to every operator there corresponds a unique matrix. This uniqueness
is the result of the uniqueness of the components of vectors in a basis. On
the other hand, given an $M \times N$ matrix $\mathsf{A}$ with elements $\alpha_{ij}$, one can construct a
unique linear operator $\mathbf{T}_\mathsf{A}$ defined by its action on the basis vectors (see Box 1.3.3):
$\mathbf{T}_\mathsf{A}|a_i\rangle \equiv \sum_{j=1}^M \alpha_{ji}|b_j\rangle$. Thus, there is a one-to-one correspondence between operators
and matrices. This correspondence is in fact a linear isomorphism:

3.1.2. Proposition. The two vector spaces $\mathcal{L}(V_N, W_M)$ and $\mathcal{M}^{M\times N}$ are isomorphic. An explicit isomorphism is established only when a basis is chosen for each vector space, in which case an operator is identified with its matrix representation.


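Given the linear transformations $\mathbf{A}: V_N \to W_M$ and $\mathbf{B}: W_M \to U_K$, we can form the composite linear transformation $\mathbf{B}\circ\mathbf{A}: V_N \to U_K$. We can also choose bases $B_V = \{|a_i\rangle\}_{i=1}^N$, $B_W = \{|b_i\rangle\}_{i=1}^M$, $B_U = \{|c_i\rangle\}_{i=1}^K$ for $V$, $W$, and $U$, respectively. Then $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{B}\circ\mathbf{A}$ will be represented by an $M \times N$, a $K \times M$, and a $K \times N$ matrix, respectively, the latter being the matrix product of the other two matrices. Matrices are determined entirely by their elements. For this reason a matrix $\mathsf{A}$ whose elements are $\alpha_{11}, \alpha_{12}, \ldots$ is sometimes denoted by $(\alpha_{ij})$. Similarly, the elements of this matrix are denoted by $(\mathsf{A})_{ij}$. So, on the one hand, we have $(\alpha_{ij}) = \mathsf{A}$, and on the other hand $(\mathsf{A})_{ij} = \alpha_{ij}$. In the context of this notation, therefore, we can write

$$(\mathsf{A}+\mathsf{B})_{ij} = (\mathsf{A})_{ij} + (\mathsf{B})_{ij} \;\Rightarrow\; (\alpha_{ij}+\beta_{ij}) = (\alpha_{ij}) + (\beta_{ij}),$$

$$(\gamma\mathsf{A})_{ij} = \gamma(\mathsf{A})_{ij} \;\Rightarrow\; \gamma(\alpha_{ij}) = (\gamma\alpha_{ij}),$$

$$(\mathsf{0})_{ij} = 0, \qquad (\mathsf{1})_{ij} = \delta_{ij}.$$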

A matrix as a representation of a linear operator is well-defined only in reference to a specific basis. A collection of rows and columns of numbers by itself has no operational meaning. When we manipulate matrices and attach meaning to them, we make an unannounced assumption regarding the basis: We have the standard basis of $\mathbb{C}^n$ (or $\mathbb{R}^n$) in mind. The following example should clarify this subtlety.



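3.1.3. Example. Let us find the matrix representation of the linear operator $\mathbf{A} \in \mathcal{L}(\mathbb{R}^3)$, given by

$$\mathbf{A}\begin{pmatrix}x\\ y\\ z\end{pmatrix} = \begin{pmatrix}x - y + 2z\\ 3x - z\\ 2y + z\end{pmatrix} \tag{3.6}$$

in the basis

$$B = \left\{|a_1\rangle = \begin{pmatrix}1\\1\\0\end{pmatrix},\; |a_2\rangle = \begin{pmatrix}1\\0\\1\end{pmatrix},\; |a_3\rangle = \begin{pmatrix}0\\1\\1\end{pmatrix}\right\}.$$

There is a tendency to associate the matrix

$$\begin{pmatrix}1 & -1 & 2\\ 3 & 0 & -1\\ 0 & 2 & 1\end{pmatrix}$$

with the operator $\mathbf{A}$. The following discussion will show that this is false. To obtain the first column of the matrix representing $\mathbf{A}$, we consider

$$\mathbf{A}|a_1\rangle = \mathbf{A}\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}0\\3\\2\end{pmatrix} = \tfrac{1}{2}\begin{pmatrix}1\\1\\0\end{pmatrix} - \tfrac{1}{2}\begin{pmatrix}1\\0\\1\end{pmatrix} + \tfrac{5}{2}\begin{pmatrix}0\\1\\1\end{pmatrix} = \tfrac{1}{2}|a_1\rangle - \tfrac{1}{2}|a_2\rangle + \tfrac{5}{2}|a_3\rangle.$$

So, by Box 3.1.1, the first column of the matrix is

$$\begin{pmatrix}\tfrac{1}{2}\\ -\tfrac{1}{2}\\ \tfrac{5}{2}\end{pmatrix}.$$

The other two columns are obtained from

$$\mathbf{A}|a_2\rangle = \begin{pmatrix}3\\2\\1\end{pmatrix} = 2|a_1\rangle + |a_2\rangle + 0\,|a_3\rangle, \qquad \mathbf{A}|a_3\rangle = \begin{pmatrix}1\\-1\\3\end{pmatrix} = -\tfrac{3}{2}|a_1\rangle + \tfrac{5}{2}|a_2\rangle + \tfrac{1}{2}|a_3\rangle,$$

giving the second and the third columns, respectively. The whole matrix is then

$$\mathsf{A} = \begin{pmatrix}\tfrac{1}{2} & 2 & -\tfrac{3}{2}\\ -\tfrac{1}{2} & 1 & \tfrac{5}{2}\\ \tfrac{5}{2} & 0 & \tfrac{1}{2}\end{pmatrix}.$$

As long as all vectors are represented by columns whose entries are expansion coefficients of the vectors in $B$, $\mathbf{A}$ and $\mathsf{A}$ are indistinguishable. However, the action of $\mathsf{A}$ on the column vector $\begin{pmatrix}x\\y\\z\end{pmatrix}$ will not yield the RHS of Equation (3.6)! Although this is not usually emphasized, the column vector on the LHS of Equation (3.6) is really the vector

$$x\begin{pmatrix}1\\0\\0\end{pmatrix} + y\begin{pmatrix}0\\1\\0\end{pmatrix} + z\begin{pmatrix}0\\0\\1\end{pmatrix},$$

which is an expansion in terms of the standard basis of $\mathbb{R}^3$ rather than in terms of $B$. We can expand $\mathbf{A}\begin{pmatrix}x\\y\\z\end{pmatrix}$ in terms of $B$, yielding

$$\mathbf{A}\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}x - y + 2z\\ 3x - z\\ 2y + z\end{pmatrix} = \left(2x - \tfrac{3}{2}y\right)|a_1\rangle + \left(-x + \tfrac{1}{2}y + 2z\right)|a_2\rangle + \left(x + \tfrac{3}{2}y - z\right)|a_3\rangle.$$

This says that in the basis $B$ this vector has the representation

$$\left(\mathbf{A}\begin{pmatrix}x\\y\\z\end{pmatrix}\right)_B = \begin{pmatrix}2x - \tfrac{3}{2}y\\ -x + \tfrac{1}{2}y + 2z\\ x + \tfrac{3}{2}y - z\end{pmatrix}. \tag{3.7}$$

Similarly, $\begin{pmatrix}x\\y\\z\end{pmatrix}$ is represented by

$$\begin{pmatrix}x\\y\\z\end{pmatrix}_B = \begin{pmatrix}\tfrac{1}{2}x + \tfrac{1}{2}y - \tfrac{1}{2}z\\ \tfrac{1}{2}x - \tfrac{1}{2}y + \tfrac{1}{2}z\\ -\tfrac{1}{2}x + \tfrac{1}{2}y + \tfrac{1}{2}z\end{pmatrix}. \tag{3.8}$$

Applying $\mathsf{A}$ to the RHS of (3.8) yields the RHS of (3.7), as it should.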
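The recipe of Box 3.1.1 can be checked mechanically. The following is a minimal sketch in exact rational arithmetic, assuming the basis vectors (1,1,0), (1,0,1), (0,1,1), which is the choice consistent with the first column and with representations (3.7) and (3.8) quoted in Example 3.1.3. The helper names `solve`, `op`, and `mat_vec` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction as F

def solve(cols, v):
    """Components c_k with sum_k c_k * cols[k] = v (Gauss-Jordan, exact)."""
    n = len(v)
    # Augmented matrix: the basis vectors are its columns, v is the last column.
    m = [[F(cols[k][r]) for k in range(n)] + [F(v[r])] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)  # pivot row
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]
        for r in range(n):
            if r != c:
                factor = m[r][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return [m[r][n] for r in range(n)]

def op(v):
    """The operator of Equation (3.6), acting on standard-basis components."""
    x, y, z = v
    return (x - y + 2 * z, 3 * x - z, 2 * y + z)

def mat_vec(m, c):
    return [sum(m[i][j] * c[j] for j in range(len(c))) for i in range(len(m))]

basis = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]           # assumed basis B
columns = [solve(basis, op(a)) for a in basis]       # i-th column = components of A|a_i>
matrix = [[columns[i][j] for i in range(3)] for j in range(3)]

for row in matrix:
    print([str(x) for x in row])
```

Applying `mat_vec(matrix, solve(basis, v))` to any `v` reproduces `solve(basis, op(v))`, which is the statement that (3.7) is the matrix acting on (3.8).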


Given any $M \times N$ matrix $\mathsf{A}$, an operator $\mathbf{T}_{\mathsf{A}} \in \mathcal{L}(V_N, W_M)$ can be associated with $\mathsf{A}$, and one can construct the kernel and the range of $\mathbf{T}_{\mathsf{A}}$. The rank of $\mathbf{T}_{\mathsf{A}}$ is called the rank of $\mathsf{A}$. Since the rank of an operator is basis independent, this definition makes sense.

Now suppose that we choose a basis for the kernel of $\mathbf{T}_{\mathsf{A}}$ and extend it to a basis of $V$. Let $V_1$ denote the span of the remaining basis vectors. Similarly, we choose a basis for $\mathbf{T}_{\mathsf{A}}(V)$ and extend it to a basis for $W$. In these two bases, the $M \times N$ matrix representing $\mathbf{T}_{\mathsf{A}}$ will have all zeros except for an $r \times r$ submatrix, where $r$ is the rank of $\mathbf{T}_{\mathsf{A}}$. The reader may verify that this submatrix has a nonzero determinant. In fact, the submatrix represents the isomorphism between $V_1$ and $\mathbf{T}_{\mathsf{A}}(V)$, and, by its very construction, is the largest such matrix. Since the determinant of an operator is basis-independent, we have the following proposition:

3.1.4. Proposition. The rank of a matrix is the dimension of the largest (square) submatrix whose determinant is not zero.
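Proposition 3.1.4 can be turned directly into a brute-force procedure. The sketch below is an illustration only: it searches square submatrices from the largest size downward and is exponentially slow, so it is not how rank is computed in practice. The function names `det` and `rank` are hypothetical, and the test matrices are made up for the example.

```python
from itertools import combinations

def det(a):
    """Determinant by expansion along the first row (fine for small matrices)."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j in range(len(a)):
        sub = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(sub)
    return total

def rank(a):
    """Size of the largest square submatrix with nonzero determinant."""
    rows, cols = len(a), len(a[0])
    for r in range(min(rows, cols), 0, -1):
        for ri in combinations(range(rows), r):
            for ci in combinations(range(cols), r):
                if det([[a[i][j] for j in ci] for i in ri]) != 0:
                    return r
    return 0

# Second row is twice the first, so the rank drops to 2.
print(rank([[1, 2, 3], [2, 4, 6], [1, 0, 1]]))
```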




3.2 Operations on Matrices 

There are two basic operations that one can perform on a matrix to obtain a new one; these are transposition and complex conjugation. The transpose of an $M \times N$ matrix $\mathsf{A}$ is an $N \times M$ matrix $\mathsf{A}^t$ obtained by interchanging the rows and columns of $\mathsf{A}$:

$$(\mathsf{A}^t)_{ij} = (\mathsf{A})_{ji}, \quad\text{or}\quad (\alpha_{ij})^t = (\alpha_{ji}). \tag{3.9}$$

The following theorem, whose proof follows immediately from the definition 
of transpose, summarizes the important properties of the operation of transposition. 

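3.2.1. Theorem. Let $\mathsf{A}$ and $\mathsf{B}$ be two (square) matrices. Then

(a) $(\mathsf{A} + \mathsf{B})^t = \mathsf{A}^t + \mathsf{B}^t$, (b) $(\mathsf{A}\mathsf{B})^t = \mathsf{B}^t\mathsf{A}^t$, (c) $(\mathsf{A}^t)^t = \mathsf{A}$.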

Of special interest is a matrix that is identical to its transpose. Such matrices occur frequently in physics and are called symmetric matrices. Similarly, antisymmetric matrices are those satisfying $\mathsf{A}^t = -\mathsf{A}$. Any matrix $\mathsf{A}$ can be written as $\mathsf{A} = \tfrac{1}{2}(\mathsf{A} + \mathsf{A}^t) + \tfrac{1}{2}(\mathsf{A} - \mathsf{A}^t)$, where the first term is symmetric and the second is antisymmetric.
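The decomposition into symmetric and antisymmetric parts is easy to verify numerically. The following sketch uses a made-up 3 x 3 matrix; `transpose` and `sym_antisym` are hypothetical helper names.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def sym_antisym(a):
    """Return (S, K) with S = (A + A^t)/2 symmetric and K = (A - A^t)/2 antisymmetric."""
    at = transpose(a)
    n = len(a)
    s = [[(a[i][j] + at[i][j]) / 2 for j in range(n)] for i in range(n)]
    k = [[(a[i][j] - at[i][j]) / 2 for j in range(n)] for i in range(n)]
    return s, k

A = [[1, 2, 0], [4, 3, -2], [6, 8, 5]]   # arbitrary example matrix
S, K = sym_antisym(A)
print(S)
print(K)
```

Note that the diagonal of `K` vanishes, as it must for any antisymmetric matrix.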

The elements of a symmetric matrix $\mathsf{A}$ satisfy the relation $\alpha_{ji} = (\mathsf{A}^t)_{ij} = (\mathsf{A})_{ij} = \alpha_{ij}$; i.e., the matrix is symmetric under reflection through the main diagonal. On the other hand, for an antisymmetric matrix we have $\alpha_{ji} = -\alpha_{ij}$. In particular, the diagonal elements of an antisymmetric matrix are all zero.

A (real) matrix satisfying $\mathsf{A}^t\mathsf{A} = \mathsf{A}\mathsf{A}^t = \mathsf{1}$ is called orthogonal.

Complex conjugation is an operation under which all elements of a matrix are complex conjugated. Denoting the complex conjugate of $\mathsf{A}$ by $\mathsf{A}^*$, we have $(\mathsf{A}^*)_{ij} = (\mathsf{A})^*_{ij}$, or $(\alpha_{ij})^* = (\alpha^*_{ij})$. A matrix is real if and only if $\mathsf{A}^* = \mathsf{A}$. Clearly, $(\mathsf{A}^*)^* = \mathsf{A}$.

Under the combined operation of complex conjugation and transposition, the rows and columns of a matrix are interchanged and all of its elements are complex conjugated. This combined operation is called the adjoint operation, or hermitian conjugation, and is denoted by $\dagger$, as with operators. Thus, we have

$$\mathsf{A}^\dagger = (\mathsf{A}^t)^* = (\mathsf{A}^*)^t, \qquad (\mathsf{A}^\dagger)_{ij} = (\mathsf{A})^*_{ji}, \quad\text{or}\quad (\alpha_{ij})^\dagger = (\alpha^*_{ji}). \tag{3.10}$$

Two types of matrices are important enough to warrant a separate definition. 

3.2.2. Definition. A hermitian matrix $\mathsf{H}$ satisfies $\mathsf{H}^\dagger = \mathsf{H}$, or, in terms of elements, $\eta^*_{ij} = \eta_{ji}$. A unitary matrix $\mathsf{U}$ satisfies $\mathsf{U}^\dagger\mathsf{U} = \mathsf{U}\mathsf{U}^\dagger = \mathsf{1}$, or, in terms of elements, $\sum_{k=1}^N (\mathsf{U})_{ik}(\mathsf{U})^*_{jk} = \sum_{k=1}^N (\mathsf{U})^*_{ki}(\mathsf{U})_{kj} = \delta_{ij}$.

Remarks: It follows immediately from this definition that

1. The diagonal elements of a hermitian matrix are real.

2. The $k$th column of a hermitian matrix is the complex conjugate of its $k$th row, and vice versa.

3. A real hermitian matrix is symmetric.

4. The rows of an $N \times N$ unitary matrix, when considered as vectors in $\mathbb{C}^N$, form an orthonormal set, as do the columns.

5. A real unitary matrix is orthogonal.

It is sometimes possible (and desirable) to transform a matrix into a form in which all of its off-diagonal elements are zero. Such a matrix is called a diagonal matrix. A diagonal matrix whose diagonal elements are $\{\lambda_k\}_{k=1}^N$ is denoted by $\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$.

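3.2.3. Example. In this example, we derive a useful identity for functions of a diagonal matrix. Let $\mathsf{D} = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ be a diagonal matrix, and $f(x)$ a function that has a Taylor series expansion $f(x) = \sum_{k=0}^\infty a_k x^k$. The same function of $\mathsf{D}$ can be written as

$$f(\mathsf{D}) = \sum_{k=0}^\infty a_k\mathsf{D}^k = \sum_{k=0}^\infty a_k\left[\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)\right]^k = \sum_{k=0}^\infty a_k\operatorname{diag}(\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k)$$

$$= \operatorname{diag}\left(\sum_{k=0}^\infty a_k\lambda_1^k, \sum_{k=0}^\infty a_k\lambda_2^k, \ldots, \sum_{k=0}^\infty a_k\lambda_n^k\right) = \operatorname{diag}\left(f(\lambda_1), f(\lambda_2), \ldots, f(\lambda_n)\right).$$

In words, the function of a diagonal matrix is equal to a diagonal matrix whose entries are the same function of the corresponding entries of the original matrix. In the above derivation, we used the following obvious properties of diagonal matrices:

$$a\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) = \operatorname{diag}(a\lambda_1, a\lambda_2, \ldots, a\lambda_n),$$

$$\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) + \operatorname{diag}(\omega_1, \omega_2, \ldots, \omega_n) = \operatorname{diag}(\lambda_1 + \omega_1, \ldots, \lambda_n + \omega_n),$$

$$\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)\cdot\operatorname{diag}(\omega_1, \omega_2, \ldots, \omega_n) = \operatorname{diag}(\lambda_1\omega_1, \ldots, \lambda_n\omega_n).$$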
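As a numerical illustration of the identity in Example 3.2.3, the sketch below takes $f(x) = e^x$, sums a truncated Taylor series of matrix powers, and compares the result with the diagonal matrix of exponentials. The truncation at 30 terms and the sample eigenvalues are arbitrary choices made for this example, and the helper names are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def exp_series(m, terms=30):
    """Truncated Taylor series sum_k m^k / k! for a square matrix m."""
    n = len(m)
    result = diag([1.0] * n)      # k = 0 term: the unit matrix
    power = diag([1.0] * n)
    for k in range(1, terms):
        power = matmul(power, m)  # m^k
        result = [[result[i][j] + power[i][j] / math.factorial(k)
                   for j in range(n)] for i in range(n)]
    return result

lams = [0.5, -1.0, 2.0]
lhs = exp_series(diag(lams))               # f(D) from the series
rhs = diag([math.exp(x) for x in lams])    # diag(f(lambda_1), ..., f(lambda_n))
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
print(max_diff)
```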

3.2.4. Example. (a) A prototypical symmetric matrix is that of the moment of inertia encountered in mechanics. The $ij$th element of this matrix is defined as $I_{ij} \equiv \iiint \rho(x_1, x_2, x_3)\,x_i x_j\,dV$, where $x_i$ is the $i$th Cartesian coordinate of a point in the distribution of mass described by the volume density $\rho(x_1, x_2, x_3)$. It is clear that $I_{ij} = I_{ji}$, or $\mathsf{I} = \mathsf{I}^t$. The moment of inertia matrix can be represented as

$$\mathsf{I} = \begin{pmatrix}I_{11} & I_{12} & I_{13}\\ I_{12} & I_{22} & I_{23}\\ I_{13} & I_{23} & I_{33}\end{pmatrix}.$$

It has six independent elements.

(b) An example of an antisymmetric matrix is the electromagnetic field tensor given by

$$\mathsf{F} = \begin{pmatrix}0 & -B_3 & B_2 & E_1\\ B_3 & 0 & -B_1 & E_2\\ -B_2 & B_1 & 0 & E_3\\ -E_1 & -E_2 & -E_3 & 0\end{pmatrix}.$$

(c) Examples of hermitian matrices are the $2 \times 2$ Pauli spin matrices:

$$\sigma_1 = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix}0 & -i\\ i & 0\end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}.$$

(d) The most frequently encountered orthogonal matrices are rotations. One such matrix represents the rotation of a 3-dimensional rigid body in terms of Euler angles and is used in mechanics. Attaching a coordinate system to the body, a general rotation can be decomposed into a rotation of angle $\varphi$ about the $z$-axis, followed by a rotation of angle $\theta$ about the new $x$-axis, followed by a rotation of angle $\psi$ about the new $z$-axis. We simply exhibit this matrix in terms of these angles and leave it to the reader to show that it is indeed orthogonal.

$$\begin{pmatrix}\cos\psi\cos\varphi - \sin\psi\cos\theta\sin\varphi & -\cos\psi\sin\varphi - \sin\psi\cos\theta\cos\varphi & \sin\psi\sin\theta\\ \sin\psi\cos\varphi + \cos\psi\cos\theta\sin\varphi & -\sin\psi\sin\varphi + \cos\psi\cos\theta\cos\varphi & -\cos\psi\sin\theta\\ \sin\theta\sin\varphi & \sin\theta\cos\varphi & \cos\theta\end{pmatrix}$$
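The orthogonality left to the reader in part (d) can at least be spot-checked numerically. The sketch below builds the Euler-angle matrix from the entries displayed above and tests $\mathsf{R}\mathsf{R}^t = \mathsf{1}$ at one arbitrary set of angles. This is a numerical check at sample values, not a proof, and the names `euler` and `is_orthogonal` are hypothetical.

```python
import math

def euler(phi, theta, psi):
    """Rotation matrix in terms of the Euler angles, entry by entry."""
    c, s = math.cos, math.sin
    return [
        [c(psi) * c(phi) - s(psi) * c(theta) * s(phi),
         -c(psi) * s(phi) - s(psi) * c(theta) * c(phi),
         s(psi) * s(theta)],
        [s(psi) * c(phi) + c(psi) * c(theta) * s(phi),
         -s(psi) * s(phi) + c(psi) * c(theta) * c(phi),
         -c(psi) * s(theta)],
        [s(theta) * s(phi), s(theta) * c(phi), c(theta)],
    ]

def is_orthogonal(m, tol=1e-12):
    """Check that the rows of m are orthonormal, i.e. m m^t = 1 within tol."""
    n = len(m)
    for i in range(n):
        for j in range(n):
            dot = sum(m[i][k] * m[j][k] for k in range(n))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

R = euler(0.3, 1.1, -0.7)   # arbitrary sample angles in radians
print(is_orthogonal(R))
```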


3.3 Orthonormal Bases 

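Matrix representation is facilitated by choosing an orthonormal basis $B = \{|e_i\rangle\}_{i=1}^N$. The matrix elements of an operator $\mathbf{A}$ can be found in such a basis by "multiplying" both sides of $\mathbf{A}|e_i\rangle = \sum_{k=1}^N \alpha_{ki}|e_k\rangle$ on the left by $\langle e_j|$:

$$\langle e_j|\mathbf{A}|e_i\rangle = \langle e_j|\left(\sum_{k=1}^N \alpha_{ki}|e_k\rangle\right) = \sum_{k=1}^N \alpha_{ki}\underbrace{\langle e_j|e_k\rangle}_{=\delta_{jk}} = \alpha_{ji},$$

or

$$(\mathsf{A})_{ij} \equiv \alpha_{ij} = \langle e_i|\mathbf{A}|e_j\rangle. \tag{3.11}$$

We can also show that in an orthonormal basis, the $i$th component $\xi_i$ of a vector is found by multiplying it by $\langle e_i|$. This expression for $\xi_i$ allows us to write the expansion of $|x\rangle$ as

$$|x\rangle = \sum_{j=1}^N \xi_j|e_j\rangle = \sum_{j=1}^N |e_j\rangle\langle e_j|x\rangle \;\Rightarrow\; \mathbf{1} = \sum_{j=1}^N |e_j\rangle\langle e_j|, \tag{3.12}$$

which is the same as in Proposition 2.5.3. Let us now investigate the representation of the special operators discussed in Chapter 2 and find the connection between those operators and the matrices encountered in the last section. We begin by calculating the matrix representing the hermitian conjugate of an operator $\mathbf{T}$. In an orthonormal basis, the elements of this matrix are given by Equation (3.11), $\tau_{ij} = \langle e_i|\mathbf{T}|e_j\rangle$. Taking the complex conjugate of this equation and using the definition of $\mathbf{T}^\dagger$ given in Equation (2.8), we obtain $\tau^*_{ij} = \langle e_i|\mathbf{T}|e_j\rangle^* = \langle e_j|\mathbf{T}^\dagger|e_i\rangle$, or $(\mathsf{T}^\dagger)_{ij} = \tau^*_{ji}$. This is precisely how the adjoint of a matrix was defined.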


Note how crucially this conclusion depends on the orthonormality of the basis vectors. If the basis were not orthonormal, we could not use Equation (3.11) on which the conclusion is based. Therefore,

3.3.1. Box. Only in an orthonormal basis is the adjoint of an operator represented by the adjoint of the matrix representing that operator.

In particular, a hermitian operator is represented by a hermitian matrix only if 
an orthonormal basis is used. The following example illustrates this point. 

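3.3.2. Example. Consider the matrix representation of the hermitian operator $\mathbf{H}$ in a general (not orthonormal) basis $B = \{|a_i\rangle\}_{i=1}^N$. The elements of the matrix corresponding to $\mathbf{H}$ are given by

$$\mathbf{H}|a_k\rangle = \sum_{j=1}^N \eta_{jk}|a_j\rangle, \quad\text{or}\quad \mathbf{H}|a_i\rangle = \sum_{j=1}^N \eta_{ji}|a_j\rangle. \tag{3.13}$$

Taking the product of the first equation with $\langle a_i|$ and complex-conjugating the result gives $\langle a_i|\mathbf{H}|a_k\rangle^* = \left(\sum_{j=1}^N \eta_{jk}\langle a_i|a_j\rangle\right)^* = \sum_{j=1}^N \eta^*_{jk}\langle a_j|a_i\rangle$. But by the definition of a hermitian operator,

$$\langle a_i|\mathbf{H}|a_k\rangle^* = \langle a_k|\mathbf{H}^\dagger|a_i\rangle = \langle a_k|\mathbf{H}|a_i\rangle.$$

So we have $\langle a_k|\mathbf{H}|a_i\rangle = \sum_{j=1}^N \eta^*_{jk}\langle a_j|a_i\rangle$.

On the other hand, multiplying the second equation in (3.13) by $\langle a_k|$ gives $\langle a_k|\mathbf{H}|a_i\rangle = \sum_{j=1}^N \eta_{ji}\langle a_k|a_j\rangle$. The only conclusion we can draw from this discussion is $\sum_{j=1}^N \eta^*_{jk}\langle a_j|a_i\rangle = \sum_{j=1}^N \eta_{ji}\langle a_k|a_j\rangle$. Because this equation does not say anything about each individual $\eta_{ij}$, we cannot conclude, in general, that $\eta^*_{ij} = \eta_{ji}$. However, if the $|a_i\rangle$'s are orthonormal, then $\langle a_j|a_i\rangle = \delta_{ji}$ and $\langle a_k|a_j\rangle = \delta_{kj}$, and we obtain $\sum_{j=1}^N \eta^*_{jk}\delta_{ji} = \sum_{j=1}^N \eta_{ji}\delta_{kj}$, or $\eta^*_{ik} = \eta_{ki}$, as expected of a hermitian matrix.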
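A concrete two-dimensional case makes the point of Example 3.3.2 tangible. In the sketch below, a hermitian operator on $\mathbb{C}^2$ is specified by its (hermitian) matrix in the standard basis; it is then represented, following Box 3.1.1, once in a non-orthonormal basis and once in an orthonormal one. The operator, both bases, and the helper names are hypothetical choices made for illustration.

```python
def apply(h, v):
    """Action of the operator on standard-basis components."""
    return (h[0][0] * v[0] + h[0][1] * v[1], h[1][0] * v[0] + h[1][1] * v[1])

def components(a1, a2, v):
    """Components (c1, c2) of v in the basis {a1, a2}, by Cramer's rule."""
    d = a1[0] * a2[1] - a1[1] * a2[0]
    c1 = (v[0] * a2[1] - v[1] * a2[0]) / d
    c2 = (a1[0] * v[1] - a1[1] * v[0]) / d
    return c1, c2

def represent(h, a1, a2):
    """Matrix of the operator in {a1, a2}: the i-th column holds the components of H|a_i>."""
    col1 = components(a1, a2, apply(h, a1))
    col2 = components(a1, a2, apply(h, a2))
    return [[col1[0], col2[0]], [col1[1], col2[1]]]

def is_hermitian(m, tol=1e-12):
    return all(abs(m[i][j] - complex(m[j][i]).conjugate()) < tol
               for i in range(2) for j in range(2))

H = [[2, 1j], [-1j, 3]]                       # hermitian in the standard basis
skew = represent(H, (1, 0), (1, 1))           # basis that is not orthonormal
ortho = represent(H, (1, 0), (0, 1j))         # orthonormal basis
print(is_hermitian(skew), is_hermitian(ortho))
```

In the skewed basis even the diagonal entries come out complex, so the representing matrix cannot be hermitian, while the orthonormal basis restores hermiticity.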

Similarly, we expect the matrices representing unitary operators to be unitary only if the basis is orthonormal. This is an immediate consequence of Equation (3.10), but we shall prove it in order to provide yet another example of how the completeness relation, Equation (3.12), is used. Since $\mathbf{U}\mathbf{U}^\dagger = \mathbf{1}$, we have $\langle e_i|\mathbf{U}\mathbf{U}^\dagger|e_j\rangle = \langle e_i|\mathbf{1}|e_j\rangle = \delta_{ij}$. We insert the completeness relation $\mathbf{1} = \sum_{k=1}^N |e_k\rangle\langle e_k|$ between $\mathbf{U}$ and $\mathbf{U}^\dagger$ on the LHS:

$$\langle e_i|\mathbf{U}\left(\sum_{k=1}^N |e_k\rangle\langle e_k|\right)\mathbf{U}^\dagger|e_j\rangle = \sum_{k=1}^N \langle e_i|\mathbf{U}|e_k\rangle\langle e_k|\mathbf{U}^\dagger|e_j\rangle = \delta_{ij}.$$

This equation gives the first half of the requirement for a unitary matrix given in Definition 3.2.2. By redoing the calculation for $\mathbf{U}^\dagger\mathbf{U}$, we could obtain the second half of that requirement.








3.4 Change of Basis and Similarity Transformation 

It is often advantageous to describe a physical problem in a particular basis because 
it takes a simpler form there, but the general form of the result may still be of 
importance. In such cases the problem is solved in one basis, and the result is 
transformed to other bases. Let us investigate this point in some detail. 

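Given a basis $B = \{|a_i\rangle\}_{i=1}^N$, we can write an arbitrary vector $|a\rangle$ with components $\{\alpha_i\}_{i=1}^N$ in $B$ as $|a\rangle = \sum_{i=1}^N \alpha_i|a_i\rangle$. Now suppose that we change the basis to $B' = \{|a'_j\rangle\}_{j=1}^N$. How are the components of $|a\rangle$ in $B'$ related to those in $B$? To answer this question, we write $|a_i\rangle$ in terms of $B'$ vectors, $|a_i\rangle = \sum_{j=1}^N \rho_{ji}|a'_j\rangle$, and substitute for $|a_i\rangle$ in this expansion of $|a\rangle$, obtaining $|a\rangle = \sum_{i=1}^N \alpha_i\sum_{j=1}^N \rho_{ji}|a'_j\rangle = \sum_{i,j}\alpha_i\rho_{ji}|a'_j\rangle$. If we denote the $j$th component of $|a\rangle$ in $B'$ by $\alpha'_j$, then this equation tells us that

$$\alpha'_j = \sum_{i=1}^N \rho_{ji}\alpha_i \qquad\text{for } j = 1, 2, \ldots, N. \tag{3.14}$$

If we use $\mathsf{a}'$, $\mathsf{R}$, and $\mathsf{a}$, respectively, to designate a column vector with elements $\alpha'_j$, an $N \times N$ matrix with elements $\rho_{ji}$, and a column vector with elements $\alpha_i$, then Equation (3.14) can be written in matrix form as

$$\begin{pmatrix}\alpha'_1\\ \alpha'_2\\ \vdots\\ \alpha'_N\end{pmatrix} = \begin{pmatrix}\rho_{11} & \rho_{12} & \cdots & \rho_{1N}\\ \rho_{21} & \rho_{22} & \cdots & \rho_{2N}\\ \vdots & \vdots & & \vdots\\ \rho_{N1} & \rho_{N2} & \cdots & \rho_{NN}\end{pmatrix}\begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_N\end{pmatrix}, \quad\text{or}\quad \mathsf{a}' = \mathsf{R}\mathsf{a}. \tag{3.15}$$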


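The matrix $\mathsf{R}$ is called the basis transformation matrix. It is invertible because it is a linear transformation that maps one basis onto another (see Theorem 2.1.6).

What happens to a matrix when we transform the basis? Consider the equation $|b\rangle = \mathbf{A}|a\rangle$, where $|a\rangle$ and $|b\rangle$ have components $\{\alpha_i\}_{i=1}^N$ and $\{\beta_i\}_{i=1}^N$, respectively, in $B$. This equation has a corresponding matrix equation $\mathsf{b} = \mathsf{A}\mathsf{a}$. Now, if we change the basis, the components of $|a\rangle$ and $|b\rangle$ will change to those of $\mathsf{a}'$ and $\mathsf{b}'$, respectively. We seek a matrix $\mathsf{A}'$ such that $\mathsf{b}' = \mathsf{A}'\mathsf{a}'$. This matrix will clearly be the transform of $\mathsf{A}$. Using Equation (3.15), we write $\mathsf{R}\mathsf{b} = \mathsf{A}'\mathsf{R}\mathsf{a}$, or $\mathsf{b} = \mathsf{R}^{-1}\mathsf{A}'\mathsf{R}\mathsf{a}$. Comparing this with $\mathsf{b} = \mathsf{A}\mathsf{a}$ and applying the fact that both equations hold for arbitrary $\mathsf{a}$ and $\mathsf{b}$, we conclude that

$$\mathsf{R}^{-1}\mathsf{A}'\mathsf{R} = \mathsf{A}, \quad\text{or}\quad \mathsf{A}' = \mathsf{R}\mathsf{A}\mathsf{R}^{-1}. \tag{3.16}$$

This is called a similarity transformation on $\mathsf{A}$, and $\mathsf{A}'$ is said to be similar to $\mathsf{A}$.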
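The similarity transformation can be exercised on a small case. The sketch below picks an arbitrary $2 \times 2$ matrix and an arbitrary invertible basis transformation matrix (both hypothetical), forms $\mathsf{R}\mathsf{A}\mathsf{R}^{-1}$ in exact arithmetic, and checks that the transformed matrix maps transformed components to transformed components.

```python
from fractions import Fraction as F

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(m):
    """Inverse of a 2 x 2 matrix with nonzero determinant."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[F(2), F(1)], [F(0), F(3)]]   # matrix of an operator in the basis B
R = [[F(1), F(1)], [F(1), F(2)]]   # assumed basis transformation matrix
A_prime = matmul(matmul(R, A), inv2(R))   # similarity transformation of A

a = [[F(1)], [F(-2)]]              # components of |a> in B
b = matmul(A, a)                   # components of |b> = A|a> in B
a_prime = matmul(R, a)             # components of |a> in B'
b_prime = matmul(R, b)             # components of |b> in B'

print(matmul(A_prime, a_prime) == b_prime)
```

The trace and determinant of `A_prime` agree with those of `A`, as expected of quantities that do not depend on the basis.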

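The transformation matrix $\mathsf{R}$ can easily be found for orthonormal bases $B = \{|e_i\rangle\}_{i=1}^N$ and $B' = \{|e'_i\rangle\}_{i=1}^N$. We have $|e_i\rangle = \sum_{k=1}^N \rho_{ki}|e'_k\rangle$. Multiplying this equation by $\langle e'_j|$, we obtain

$$\langle e'_j|e_i\rangle = \sum_{k=1}^N \rho_{ki}\langle e'_j|e'_k\rangle = \sum_{k=1}^N \rho_{ki}\delta_{jk} = \rho_{ji}. \tag{3.17}$$

3.4.1. Box. To find the $ij$th element of the matrix that changes the components of a vector in the orthonormal basis $B$ to those of the same vector in the orthonormal basis $B'$, take the $j$th ket in $B$ and multiply it by the $i$th bra in $B'$.

To find the $ij$th element of the matrix that changes $B'$ into $B$, we take the $j$th ket in $B'$ and multiply it by the $i$th bra in $B$: $\rho'_{ij} = \langle e_i|e'_j\rangle$. However, the matrix $\mathsf{R}'$ must be $\mathsf{R}^{-1}$, as can be seen from Equation (3.15). On the other hand, $(\rho'_{ij})^* = \langle e_i|e'_j\rangle^* = \langle e'_j|e_i\rangle = \rho_{ji}$, or

$$(\mathsf{R}^{-1})^*_{ij} = \rho_{ji}, \quad\text{or}\quad (\mathsf{R}^{-1})_{ij} = \rho^*_{ji} = (\mathsf{R}^\dagger)_{ij}. \tag{3.18}$$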

This shows that R is a unitary matrix and yields an important result. 

3.4.2. Theorem. The matrix that transforms one orthonormal basis into another is necessarily unitary.

From Equations (3.17) and (3.18) we have $(\mathsf{R}^\dagger)_{ij} = \langle e_i|e'_j\rangle$. Thus,

3.4.3. Box. To obtain the $j$th column of $\mathsf{R}^\dagger$, we take the $j$th vector in the new basis and successively "multiply" it by $\langle e_i|$ for $i = 1, 2, \ldots, N$.

In particular, if the original basis is the standard basis of $\mathbb{C}^N$ and $|e'_j\rangle$ is represented by a column vector in that basis, then the $j$th column of $\mathsf{R}^\dagger$ is simply the vector $|e'_j\rangle$.

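3.4.4. Example. In this example, we show that the similarity transform of a function of a matrix is the same function of the similarity transform of the matrix: $\mathsf{R}f(\mathsf{A})\mathsf{R}^{-1} = f(\mathsf{R}\mathsf{A}\mathsf{R}^{-1})$. The proof involves inserting $\mathsf{1} = \mathsf{R}^{-1}\mathsf{R}$ between factors of $\mathsf{A}$ in the Taylor series expansion of $f(\mathsf{A})$:

$$\mathsf{R}f(\mathsf{A})\mathsf{R}^{-1} = \mathsf{R}\left\{\sum_{k=0}^\infty a_k\mathsf{A}^k\right\}\mathsf{R}^{-1} = \sum_{k=0}^\infty a_k\mathsf{R}\mathsf{A}^k\mathsf{R}^{-1} = \sum_{k=0}^\infty a_k\mathsf{R}\underbrace{\mathsf{A}\mathsf{A}\cdots\mathsf{A}}_{k\text{ times}}\mathsf{R}^{-1}$$

$$= \sum_{k=0}^\infty a_k\underbrace{\mathsf{R}\mathsf{A}\mathsf{R}^{-1}\,\mathsf{R}\mathsf{A}\mathsf{R}^{-1}\cdots\mathsf{R}\mathsf{A}\mathsf{R}^{-1}}_{k\text{ times}} = \sum_{k=0}^\infty a_k\left(\mathsf{R}\mathsf{A}\mathsf{R}^{-1}\right)^k = f(\mathsf{R}\mathsf{A}\mathsf{R}^{-1}).$$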


3.5 The Determinant 

An important concept associated with linear operators is the determinant. Determinants are usually defined in terms of matrix representations of operators in a particular basis. This may give the impression that determinants are basis dependent. However, we shall show that the value of the determinant of an operator is the same in all bases. In fact, it is possible to define determinants of operators without resort to a specific representation of the operator in terms of matrices (see Section 25.3.1).

Let us first introduce a permutation symbol $\epsilon_{i_1 i_2 \ldots i_N}$, which will be used extensively in this chapter. It is defined by

$$\epsilon_{12\ldots N} = 1, \qquad \epsilon_{i_1 \ldots i_j \ldots i_k \ldots i_N} = -\epsilon_{i_1 \ldots i_k \ldots i_j \ldots i_N}.$$

In other words, $\epsilon_{i_1 i_2 \ldots i_N}$ is completely antisymmetric (or skew-symmetric) under interchange of any pair of its indices. We will use this permutation symbol to define determinants. An immediate consequence of the definition above is that $\epsilon_{i_1 i_2 \ldots i_N}$ will be zero if any two of its indices are equal. Also note that $\epsilon_{i_1 i_2 \ldots i_N}$ is $+1$ if $(i_1, i_2, \ldots, i_N)$ is an even permutation¹ (shuffling) of $(1, 2, \ldots, N)$, and $-1$ if it is an odd permutation of $(1, 2, \ldots, N)$.

3.5.1 Determinant of a Matrix 

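3.5.1. Definition. The determinant is a mapping, $\det : \mathcal{M}^{N\times N} \to \mathbb{C}$, given in terms of the elements $\alpha_{ij}$ of a matrix $\mathsf{A}$ by

$$\det\mathsf{A} = \sum_{i_1, \ldots, i_N = 1}^N \epsilon_{i_1 i_2 \ldots i_N}\,\alpha_{1 i_1}\alpha_{2 i_2}\cdots\alpha_{N i_N}.$$

Definition 3.5.1 gives $\det\mathsf{A}$ in terms of an expansion in rows, so the first entry is from the first row, the second from the second row, and so on. It is also possible to expand in terms of columns, as the following theorem shows.

3.5.2. Theorem. The determinant of a matrix $\mathsf{A}$ can be written as

$$\det\mathsf{A} = \sum_{i_1, \ldots, i_N = 1}^N \epsilon_{i_1 i_2 \ldots i_N}\,\alpha_{i_1 1}\alpha_{i_2 2}\cdots\alpha_{i_N N}.$$

Therefore, $\det\mathsf{A} = \det\mathsf{A}^t$.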
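The row and column expansions can be compared directly in code. In the sketch below, the sum over all index tuples is restricted to permutations, since the permutation symbol vanishes whenever two indices coincide, and the sign is obtained by counting inversions. Indices run from 0 rather than 1, and the function names and the sample matrix are hypothetical.

```python
from itertools import permutations

def parity(perm):
    """+1 for an even permutation of (0, ..., N-1), -1 for an odd one."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_rows(a):
    """Expansion in rows: factor number r is taken from row r (Definition 3.5.1)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = parity(perm)
        for r in range(n):
            term *= a[r][perm[r]]
        total += term
    return total

def det_cols(a):
    """Expansion in columns: factor number c is taken from column c (Theorem 3.5.2)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = parity(perm)
        for c in range(n):
            term *= a[perm[c]][c]
        total += term
    return total

A = [[1, 2, -1], [0, 1, -2], [2, 1, -1]]
print(det_rows(A), det_cols(A))
```

Both expansions have N! terms, so this is useful only for illustrating the definition on small matrices.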


¹ An even permutation means an even number of exchanges of pairs of elements. Thus, (2, 3, 1) is an even permutation of (1, 2, 3), while (2, 1, 3) is an odd permutation. It can be shown (see Chapter 23) that the parity (evenness or oddness) of a permutation is well-defined; i.e., that although there may be many routes of reaching a permutation from a given (fiducial) permutation via exchanges of pairs of elements, all such routes require either even numbers of exchanges or odd numbers of exchanges.


Proof. We shall go into some detail in the proof of only this theorem on determinants to illustrate the manipulations involved in working with the $\epsilon$ symbol. We shall not reproduce such details for the other theorems.

In the equation of Definition 3.5.1, $i_1, i_2, \ldots, i_N$ are all different and form a permutation of $(1, 2, \ldots, N)$. So one of the $\alpha$'s must have 1 as its second index. We assume that it is the $j_1$th term; that is, $i_{j_1} = 1$, and

$$\alpha_{1 i_1}\cdots\alpha_{j_1 i_{j_1}}\cdots\alpha_{N i_N} = \alpha_{1 i_1}\cdots\alpha_{j_1 1}\cdots\alpha_{N i_N}.$$

We move this term all the way to the left to get $\alpha_{j_1 1}\alpha_{1 i_1}\cdots\alpha_{N i_N}$. Now we look for the entry with 2 as the second index and assume that it occurs at the $j_2$th position; that is, $i_{j_2} = 2$. We move this to the left, next to $\alpha_{j_1 1}$, and write $\alpha_{j_1 1}\alpha_{j_2 2}\alpha_{1 i_1}\cdots\alpha_{N i_N}$. We continue in this fashion until we get $\alpha_{j_1 1}\alpha_{j_2 2}\cdots\alpha_{j_N N}$. Since $j_1, j_2, \ldots, j_N$ is really a reshuffling of $i_1, i_2, \ldots, i_N$, the summation indices can be changed to $j_1, j_2, \ldots, j_N$, and we can write²

$$\det\mathsf{A} = \sum_{j_1, \ldots, j_N = 1}^N \epsilon_{i_1 i_2 \ldots i_N}\,\alpha_{j_1 1}\cdots\alpha_{j_N N}.$$

If we can show that $\epsilon_{i_1 i_2 \ldots i_N} = \epsilon_{j_1 j_2 \ldots j_N}$, we are done. In the equation of the theorem, the sequence of integers $(i_1, i_2, \ldots, i_N)$ is obtained by some shuffling of $(1, 2, \ldots, N)$. What we have done just now is to reshuffle $(i_1, i_2, \ldots, i_N)$ in reverse order to get back to $(1, 2, \ldots, N)$. Thus, if the shuffling in the equation of the theorem is even (odd), the reshuffling will also be even (odd). Thus, $\epsilon_{i_1 i_2 \ldots i_N} = \epsilon_{j_1 j_2 \ldots j_N}$, and we obtain the first part of the theorem:

$$\det\mathsf{A} = \sum_{j_1, \ldots, j_N = 1}^N \epsilon_{j_1 j_2 \ldots j_N}\,\alpha_{j_1 1}\alpha_{j_2 2}\cdots\alpha_{j_N N}.$$

For the second part, we simply note that the rows of $\mathsf{A}^t$ are columns of $\mathsf{A}$ and vice versa. □

3.5.3. Theorem. Interchanging two rows (or two columns) of a matrix changes the sign of its determinant.

Proof. The proof is a simple exercise in permutation left for the reader. □

An immediate consequence of this theorem is the following corollary.

3.5.4. Corollary. The determinant of a matrix with two equal rows (or two equal columns) is zero. Therefore, one can add a multiple of a row (column) to another row (column) of a matrix without changing its determinant.

² The $\epsilon$ symbol in the sum is not independent of the $j$'s, although it appears without such indices. In reality, the $i$ indices are "functions" of the $j$ indices.

Since every term of the determinant in Definition 3.5.1 contains one and only one element from each row, we can write

$$\det\mathsf{A} = \alpha_{i1}A_{i1} + \alpha_{i2}A_{i2} + \cdots + \alpha_{iN}A_{iN} = \sum_{j=1}^N \alpha_{ij}A_{ij},$$

where $A_{ij}$ contains products of elements of the matrix $\mathsf{A}$ other than the element $\alpha_{ij}$. Since each element of a row or column occurs at most once in each term of the expansion, $A_{ij}$ cannot contain any element from the $i$th row or the $j$th column. The quantity $A_{ij}$ is called the cofactor of $\alpha_{ij}$, and the above expression is known as the (Laplace) expansion of $\det\mathsf{A}$ by its $i$th row. Clearly, there is a similar expansion by the $i$th column of the determinant, which is obtained by a similar argument using the equation of Theorem 3.5.2. We collect both results in the following equation.

$$\det\mathsf{A} = \sum_{j=1}^N \alpha_{ij}A_{ij} = \sum_{j=1}^N \alpha_{ji}A_{ji}. \tag{3.20}$$



Vandermonde, Alexandre-Théophile, also known as Alexis, Abnit, and Charles-Auguste Vandermonde (1735-1796) had a father, a physician who directed his sickly son toward a musical career. An acquaintanceship with Fontaine, however, so stimulated Vandermonde that in 1771 he was elected to the Académie des Sciences, to which he presented four mathematical papers (his total mathematical production) in 1771-1772. Later, Vandermonde
wrote several papers on harmony, and it was said at that time that musicians considered 
Vandermonde to be a mathematician and that mathematicians viewed him as a musician. 

Vandermonde’s membership in the Academy led to a paper on experiments with cold, 
made with Bezout and Lavoisier in 1776, and a paper on the manufacture of steel with 
Berthollet and Monge in 1786. Vandermonde became an ardent and active revolutionary, 
being such a close friend of Monge that he was termed “femme de Monge.” He was a 
member of the Commune of Paris and the club of the Jacobins. In 1782 he was director of the Conservatoire des Arts et Métiers and in 1792, chief of the Bureau de l'Habillement des Armées. He joined in the design of a course in political economy for the École Normale and in 1795 was named a member of the Institut National.

Vandermonde is best known for the theory of determinants. Lebesgue believed that 
the attribution of determinant to Vandermonde was due to a misreading of his notation. 
Nevertheless, Vandermonde’s fourth paper was the first to give a connected exposition of 
determinants, because he (1) defined a contemporary symbolism that was more complete, 
simple, and appropriate than that of Leibniz; (2) defined determinants as functions apart 
from the solution of linear equations presented by Cramer but also treated by Vandermonde;
and (3) gave a number of properties of these functions, such as the number and signs of the 
terms and the effect of interchanging two consecutive indices (rows or columns), which he 
used to show that a determinant is zero if two rows or columns are identical. 

Vandermonde’s real and unrecognized claim to fame was lodged in his first paper, in 
which he approached the general problem of the solvability of algebraic equations through 
a study of functions invariant under permutations of the roots of the equations. Cauchy assigned priority in this to Lagrange and Vandermonde. Vandermonde read his paper in November 1770, but he did not become a member of the Academy until 1771, and the paper was not published until 1774. Although Vandermonde's methods were close to those later developed by Abel and Galois for testing the solvability of equations, and although his treatment of the binomial equation $x^n - 1 = 0$ could easily have led to the anticipation of Gauss's results on constructible polygons, Vandermonde himself did not rigorously or completely establish his results, nor did he see the implications for geometry. Nevertheless, Kronecker dates the modern movement in algebra to Vandermonde's 1770 paper.

Unfortunately, Vandermonde's spurt of enthusiasm and creativity, which in two years
produced four insightful mathematical papers at least two of which were of substantial 
importance, was quickly diverted by the exciting politics of the time and perhaps by poor 
health. 


3.5.5. Proposition. If $i \neq k$, then $\sum_{j=1}^N \alpha_{ij}A_{kj} = 0 = \sum_{j=1}^N \alpha_{ji}A_{jk}$.

Proof. Consider the matrix $\mathsf{B}$ obtained from $\mathsf{A}$ by replacing row $k$ by row $i$ (row $i$ remains unchanged, of course). The matrix $\mathsf{B}$ has two equal rows, and its determinant is therefore zero. Now, if we expand $\det\mathsf{B}$ by its $k$th row according to Equation (3.20), we obtain $0 = \det\mathsf{B} = \sum_{j=1}^N \beta_{kj}B_{kj}$. But the elements of the $k$th row of $\mathsf{B}$ are the elements of the $i$th row of $\mathsf{A}$; that is, $\beta_{kj} = \alpha_{ij}$, and the cofactors of the $k$th row of $\mathsf{B}$ are the same as those of $\mathsf{A}$, that is, $B_{kj} = A_{kj}$. Thus, the first equation of the proposition is established. The second equation can be established using expansion by columns. □

A minor of order $N - 1$ of an $N \times N$ matrix $\mathsf{A}$ is the determinant of a matrix obtained by striking out one row and one column of $\mathsf{A}$. If we strike out the $i$th row and $j$th column of $\mathsf{A}$, then the minor is denoted by $M_{ij}$.

3.5.6. Theorem. $A_{ij} = (-1)^{i+j}M_{ij}$.

Proof. The proof involves separating $\alpha_{11}$ from the rest of the terms in the expansion of the determinant. The unique coefficient of $\alpha_{11}$ is $A_{11}$ by Equation (3.20). We can show that it is also $M_{11}$ by examining the $\epsilon$ expansion of the determinant and performing the first sum. This will establish the equality $A_{11} = M_{11}$. The general equality is obtained by performing enough interchanges of rows and columns of the matrix to bring $\alpha_{ij}$ into the first-row first-column position, each exchange introducing a negative sign, thus the $(-1)^{i+j}$ factor. The details are left as an exercise. □

The combination of Equation (3.20) and Theorem 3.5.6 gives the familiar 
routine of evaluating the determinant of a matrix. 

3.5.2 Determinants of Products of Matrices 

One extremely useful property of determinants is expressed in the following theorem.


3.5.7. Theorem. det(AB) = (det A)(det B) 

Proof. The proof consists in keeping track of index shuffling while rearranging the 
order of products of matrix elements. We shall leave the details as an exercise. □ 

3.5.8. Example. Let $\mathsf{O}$ and $\mathsf{U}$ denote, respectively, an orthogonal and a unitary $n \times n$ matrix; that is, $\mathsf{O}\mathsf{O}^t = \mathsf{O}^t\mathsf{O} = \mathsf{1}$, and $\mathsf{U}\mathsf{U}^\dagger = \mathsf{U}^\dagger\mathsf{U} = \mathsf{1}$. Taking the determinant of the first equation and using Theorems 3.5.2 and 3.5.7, we obtain $(\det\mathsf{O})(\det\mathsf{O}^t) = (\det\mathsf{O})^2 = \det\mathsf{1} = 1$. Therefore, for an orthogonal matrix, we get $\det\mathsf{O} = \pm 1$.

Orthogonal transformations preserve a real inner product. Among such transformations are the so-called inversions, which, in their simplest form, multiply a vector by $-1$. In three dimensions this corresponds to a reflection through the origin. The matrix associated with this operation is $-\mathsf{1}$:

$$\begin{pmatrix}x\\ y\\ z\end{pmatrix} \to \begin{pmatrix}-x\\ -y\\ -z\end{pmatrix} = \begin{pmatrix}-1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & -1\end{pmatrix}\begin{pmatrix}x\\ y\\ z\end{pmatrix},$$

which has a determinant of $-1$. This is a prototype of other, more complicated, orthogonal transformations whose determinants are $-1$.

The other orthogonal transformations, whose determinants are $+1$, are of special interest because they correspond to rotations in three dimensions. The set of orthogonal transformations in $n$ dimensions having the determinant $+1$ is denoted by $SO(n)$. These transformations are special because they have the mathematical structure of a (continuous) group, which finds application in many areas of advanced physics. We shall come back to the topic of group theory later in the book.

We can obtain a similar result for unitary transformations. We take the determinant of both sides of $\mathsf{U}^\dagger\mathsf{U} = \mathsf{1}$:

$$\det(\mathsf{U}^*)^t\det\mathsf{U} = \det\mathsf{U}^*\det\mathsf{U} = (\det\mathsf{U})^*(\det\mathsf{U}) = |\det\mathsf{U}|^2 = 1.$$

Thus, we can generally write $\det\mathsf{U} = e^{i\alpha}$, with $\alpha \in \mathbb{R}$. The set of those transformations with $\alpha = 0$ forms a group to which $\mathsf{1}$ belongs and that is denoted by $SU(n)$. This group has found applications in the attempts at unifying the fundamental interactions.

3.5.3 Inverse of a Matrix 

One of the most useful properties of the determinant is the simple criterion it 
gives for a matrix to be invertible. We are ready to investigate this criterion now. 
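We combine Equation (3.20) and the content of Proposition 3.5.5 into a single equation,

$$\sum_{j=1}^N \alpha_{ij}A_{kj} = (\det\mathsf{A})\delta_{ik} = \sum_{j=1}^N \alpha_{ji}A_{jk}, \tag{3.21}$$

and construct a matrix $\mathsf{C}_{\mathsf{A}}$, whose elements are the cofactors of the elements of the matrix $\mathsf{A}$:

$$\mathsf{C}_{\mathsf{A}} = \begin{pmatrix}A_{11} & A_{12} & \cdots & A_{1N}\\ A_{21} & A_{22} & \cdots & A_{2N}\\ \vdots & \vdots & & \vdots\\ A_{N1} & A_{N2} & \cdots & A_{NN}\end{pmatrix}. \tag{3.22}$$

Then Equation (3.21) can be written as

$$\sum_{j=1}^N \alpha_{ij}\left(\mathsf{C}^t_{\mathsf{A}}\right)_{jk} = (\det\mathsf{A})\delta_{ik} = \sum_{j=1}^N \left(\mathsf{C}^t_{\mathsf{A}}\right)_{kj}\alpha_{ji},$$

or, in matrix form, as

$$\mathsf{A}\mathsf{C}^t_{\mathsf{A}} = (\det\mathsf{A})\mathsf{1} = \mathsf{C}^t_{\mathsf{A}}\mathsf{A}. \tag{3.23}$$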

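3.5.9. Theorem. The inverse of a matrix (if it exists) is unique. The matrix $\mathsf{A}$ has an inverse if and only if $\det\mathsf{A} \neq 0$. Furthermore,

$$\mathsf{A}^{-1} = \frac{\mathsf{C}^t_{\mathsf{A}}}{\det\mathsf{A}}, \tag{3.24}$$

where $\mathsf{C}_{\mathsf{A}}$ is the matrix of the cofactors of $\mathsf{A}$.

Proof. Let $\mathsf{B}$ and $\mathsf{C}$ be inverses of $\mathsf{A}$. Then

$$\mathsf{B} = (\mathsf{C}\mathsf{A})\mathsf{B} = \mathsf{C}(\mathsf{A}\mathsf{B}) = \mathsf{C}.$$

For the second part, we note that if $\mathsf{A}$ has an inverse $\mathsf{B}$, then

$$\mathsf{A}\mathsf{B} = \mathsf{1} \;\Rightarrow\; \det\mathsf{A}\det\mathsf{B} = \det\mathsf{1} = 1,$$

whence $\det\mathsf{A} \neq 0$. Conversely, if $\det\mathsf{A} \neq 0$, then dividing both sides of Equation (3.23) by $\det\mathsf{A}$, we obtain the unique inverse (3.24) of $\mathsf{A}$. □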
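Equation (3.24), together with Theorem 3.5.6, translates into a short procedure: compute minors, attach the sign $(-1)^{i+j}$ to obtain cofactors, transpose, and divide by the determinant. The sketch below does this in exact arithmetic for a sample integer matrix; all function names and the sample matrix are hypothetical, and the recursive determinant is meant only for small matrices.

```python
from fractions import Fraction

def minor_matrix(a, i, j):
    """Matrix obtained by striking out row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Laplace expansion by the first row, Equation (3.20)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor_matrix(a, 0, j))
               for j in range(len(a)))

def cofactor_matrix(a):
    """Matrix of cofactors A_ij = (-1)^(i+j) M_ij."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor_matrix(a, i, j)) for j in range(n)]
            for i in range(n)]

def inverse(a):
    """Transposed cofactor matrix divided by the determinant, Equation (3.24)."""
    d = det(a)
    if d == 0:
        raise ValueError("determinant is zero: no inverse exists")
    c = cofactor_matrix(a)
    n = len(a)
    return [[Fraction(c[j][i], d) for j in range(n)] for i in range(n)]

A = [[1, 2, -1], [0, 1, -2], [2, 1, -1]]
A_inv = inverse(A)
for row in A_inv:
    print([str(x) for x in row])
```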

The inverse of a $2 \times 2$ matrix is easily found:

$$\begin{pmatrix}a & b\\ c & d\end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix}d & -b\\ -c & a\end{pmatrix} \tag{3.25}$$

if $ad - bc \neq 0$. There is a more practical way of calculating the inverse of matrices. In the following discussion of this method, we shall confine ourselves simply to stating a couple of definitions and the main theorem, with no attempt at providing any proofs. The practical utility of the method will be illustrated by a detailed analysis of examples.



3.5 THE DETERMINANT 


elementary row 3.5.10. Definition. An elementary row operation on a matrix is one of the fol- 
operation lowing: (a) interchange of two rows of the matrix, (b) multiplication of a row by a 
nonzero number, and (c) addition of a multiple of one row to another. 

Elementary column operations are defined analogously. 

triangular, or row-echelon form of a matrix

3.5.11. Definition. A matrix is in triangular, or row-echelon, form if it satisfies
the following three conditions:

1. Any row consisting of only zeros is below any row that contains at least one 
nonzero element. 

2. Going from left to right, the first nonzero entry of any row is to the left of the 
first nonzero entry of any lower row. 

3. The first nonzero entry of each row is 1. 

3.5.12. Theorem. For any invertible n × n matrix A,

• The n × 2n matrix (A|1) can be transformed into the n × 2n matrix (1|A^{-1})
by means of a finite number of elementary row operations.³

• If (A|1) is transformed into (1|B) by means of elementary row operations,
then B = A^{-1}.

A systematic way of transforming (A|1) into (1|A^{-1}) is first to bring A into
triangular form and then to eliminate, by elementary row operations, all nonzero
elements that lie above the leading 1 of each column.
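
The procedure can be sketched in a few lines of code. The sketch below is an illustration under stated assumptions rather than the method of the text verbatim: the function name invert_by_row_reduction is our own, exact fractions are used instead of floating-point numbers, and each column is cleared above and below its pivot in a single pass instead of first reaching triangular form and then eliminating upward. Only the three elementary row operations of Definition 3.5.10 are used.

```python
from fractions import Fraction


def invert_by_row_reduction(a):
    """Transform (A|1) into (1|A^{-1}) by elementary row operations."""
    n = len(a)
    # Build the n x 2n matrix (A|1) with exact rational entries.
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # (a) Interchange rows so that the pivot entry is nonzero.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("left half cannot be reduced to 1: not invertible")
        m[col], m[pivot] = m[pivot], m[col]
        # (b) Multiply the pivot row by a nonzero number to make the pivot 1.
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        # (c) Add multiples of the pivot row to the other rows to clear the column.
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    # The right half of the reduced matrix is the inverse.
    return [row[n:] for row in m]


if __name__ == "__main__":
    for row in invert_by_row_reduction([[1, 2, -1], [0, 1, -2], [2, 1, -1]]):
        print([str(x) for x in row])
```

When no nonzero pivot can be found in some column, the left half cannot be turned into the unit matrix, and the sketch reports that the matrix is not invertible.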

³ The matrix (A|1) denotes the n × 2n matrix obtained by juxtaposing the n × n unit matrix to the right of A. It can easily be
shown that if A, B, and C are n × n matrices, then A(B|C) = (AB|AC).

3.5.13. Example. Let us evaluate the inverse of

A = \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & -2 \\ 2 & 1 & -1 \end{pmatrix}.

We start with

M = \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 2 & 1 & -1 & 0 & 0 & 1 \end{array}\right)

and apply elementary row operations to M to bring the left half of it into triangular form. If
we denote the kth row by (k) and the three operations of Definition 3.5.10, respectively, by
(k) ↔ (j), α(k), and α(k) + (j), we get

M \xrightarrow{-2(1)+(3)} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & -3 & 1 & -2 & 0 & 1 \end{array}\right)
  \xrightarrow{3(2)+(3)} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & 0 & -5 & -2 & 3 & 1 \end{array}\right)
  \xrightarrow{-\frac{1}{5}(3)} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2/5 & -3/5 & -1/5 \end{array}\right).

The left half of M is in triangular form. However, we want all entries above any 1 in a
column to be zero as well, i.e., we want the left-hand matrix to be 1. We can do this by
appropriate use of type 3 elementary row operations:

  \xrightarrow{-2(2)+(1)} \left(\begin{array}{ccc|ccc} 1 & 0 & 3 & 1 & -2 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2/5 & -3/5 & -1/5 \end{array}\right)
  \xrightarrow{-3(3)+(1)} \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -1/5 & -1/5 & 3/5 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2/5 & -3/5 & -1/5 \end{array}\right)
  \xrightarrow{2(3)+(2)} \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -1/5 & -1/5 & 3/5 \\ 0 & 1 & 0 & 4/5 & -1/5 & -2/5 \\ 0 & 0 & 1 & 2/5 & -3/5 & -1/5 \end{array}\right).

The right half of the resulting matrix is A^{-1}. ■

3.5.14. Example. It is instructive to start with a matrix that is not invertible and show
that it is impossible to turn it into 1 by elementary row operations. Consider the matrix

B = \begin{pmatrix} 2 & -1 & 3 \\ 1 & -2 & 1 \\ -1 & 5 & 0 \end{pmatrix}.

Let us systematically bring it into triangular form:

\left(\begin{array}{ccc|ccc} 2 & -1 & 3 & 1 & 0 & 0 \\ 1 & -2 & 1 & 0 & 1 & 0 \\ -1 & 5 & 0 & 0 & 0 & 1 \end{array}\right)
  \xrightarrow{(1) \leftrightarrow (2)} \left(\begin{array}{ccc|ccc} 1 & -2 & 1 & 0 & 1 & 0 \\ 2 & -1 & 3 & 1 & 0 & 0 \\ -1 & 5 & 0 & 0 & 0 & 1 \end{array}\right)
  \xrightarrow{-2(1)+(2)} \left(\begin{array}{ccc|ccc} 1 & -2 & 1 & 0 & 1 & 0 \\ 0 & 3 & 1 & 1 & -2 & 0 \\ -1 & 5 & 0 & 0 & 0 & 1 \end{array}\right)
  \xrightarrow{(1)+(3)} \left(\begin{array}{ccc|ccc} 1 & -2 & 1 & 0 & 1 & 0 \\ 0 & 3 & 1 & 1 & -2 & 0 \\ 0 & 3 & 1 & 0 & 1 & 1 \end{array}\right)
  \xrightarrow{-(2)+(3)} \left(\begin{array}{ccc|ccc} 1 & -2 & 1 & 0 & 1 & 0 \\ 0 & 3 & 1 & 1 & -2 & 0 \\ 0 & 0 & 0 & -1 & 3 & 1 \end{array}\right)
  \xrightarrow{\frac{1}{3}(2)} \left(\begin{array}{ccc|ccc} 1 & -2 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1/3 & 1/3 & -2/3 & 0 \\ 0 & 0 & 0 & -1 & 3 & 1 \end{array}\right).

The matrix B is now in triangular form, but its third row contains all zeros. There is no
way we can bring this into the form of a unit matrix. We therefore conclude that B is not
invertible. This is, of course, obvious, since it can easily be verified that B has a vanishing
determinant. ■

We mentioned earlier that the determinant is a property of linear transformations,
although they are defined in terms of matrices that represent them. We can
now show this. First, note that taking the determinant of both sides of AA^{-1} = 1,
one obtains det(A^{-1}) = 1/det A. Now recall that the representations of an operator
in two different bases are related via a similarity transformation. Thus, if A is
represented by A in one basis and by A′ in another, then there exists an invertible
matrix R such that A′ = RAR^{-1}. Taking the determinant of both sides, we get

\det A' = \det R \, \det A \, \frac{1}{\det R} = \det A.

Thus, the determinant is an intrinsic property of the operator, independent of the
basis chosen in which to represent the operator.


3.6 The Trace 


Another intrinsic quantity associated with an operator that is usually defined in 
terms of matrices is given in the following definition. 

trace of a square matrix

3.6.1. Definition. Let A be an N × N matrix. The mapping tr : M^{N×N} → C (or
R) given by tr A = \sum_{i=1}^{N} a_{ii} is called the trace of A.

3.6.2. Theorem. The trace is a linear mapping. Furthermore,

tr A^t = tr A and tr(AB) = tr(BA).

Proof. The linearity of the trace and the first identity follow directly from the
definition. To prove the second identity of the theorem, we use the definitions of
the trace and the matrix product:

tr(AB) = \sum_{i=1}^{N} (AB)_{ii} = \sum_{i=1}^{N} \sum_{j=1}^{N} (A)_{ij} (B)_{ji} = \sum_{i=1}^{N} \sum_{j=1}^{N} (B)_{ji} (A)_{ij}
       = \sum_{j=1}^{N} \left( \sum_{i=1}^{N} (B)_{ji} (A)_{ij} \right) = \sum_{j=1}^{N} (BA)_{jj} = tr(BA).    □
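
The two identities of Theorem 3.6.2 are easy to check numerically. The following sketch uses two small integer matrices of our own choosing; the helper names matmul, trace, and transpose are illustrative only. Note that AB and BA themselves differ even though their traces agree.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(a):
    """Sum of the diagonal elements of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def transpose(a):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*a)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

if __name__ == "__main__":
    print("AB =", matmul(A, B), " BA =", matmul(B, A))
    print("tr(AB) =", trace(matmul(A, B)), " tr(BA) =", trace(matmul(B, A)))
    print("tr(A^t) =", trace(transpose(A)), " tr(A) =", trace(A))
```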


connection between 
trace and 
determinant 


3.6.3. Example. In this example, we show a very useful connection between the trace
and the determinant that holds when a matrix is only infinitesimally different from the unit
matrix. Let us calculate the determinant of 1 + εA to first order in ε. Using the definition of
determinant, we write

\det(1 + \varepsilon A) = \sum_{i_1, \ldots, i_n = 1}^{n} \epsilon_{i_1 \ldots i_n} (\delta_{1 i_1} + \varepsilon a_{1 i_1}) \cdots (\delta_{n i_n} + \varepsilon a_{n i_n})
    = \sum_{i_1, \ldots, i_n = 1}^{n} \epsilon_{i_1 \ldots i_n} \delta_{1 i_1} \cdots \delta_{n i_n} + \varepsilon \sum_{k=1}^{n} \sum_{i_1, \ldots, i_n = 1}^{n} \epsilon_{i_1 \ldots i_n} \delta_{1 i_1} \cdots \hat{\delta}_{k i_k} \cdots \delta_{n i_n} a_{k i_k}.

The first sum is just the product of all the Kronecker deltas. In the second sum, \hat{\delta}_{k i_k} means
that in the product of the deltas, δ_{k i_k} is absent. This term is obtained by multiplying the
second term of the kth parentheses by the first term of all the rest. Since we are interested
only in the first power of ε, we stop at this term. Now, the first sum is reduced to ϵ_{12…n} = 1
after all the Kronecker deltas are summed over. For the second sum, we get

\varepsilon \sum_{k=1}^{n} \sum_{i_1, \ldots, i_n = 1}^{n} \epsilon_{i_1 \ldots i_n} \delta_{1 i_1} \cdots \hat{\delta}_{k i_k} \cdots \delta_{n i_n} a_{k i_k} = \varepsilon \sum_{k=1}^{n} \sum_{i_k = 1}^{n} \epsilon_{12 \ldots i_k \ldots n} a_{k i_k}
    = \varepsilon \sum_{k=1}^{n} \epsilon_{12 \ldots k \ldots n} a_{kk} = \varepsilon \sum_{k=1}^{n} a_{kk} = \varepsilon \, tr A,    (3.26)

where the last line follows from the fact that the only nonzero value for ϵ_{12…i_k…n} is obtained
when i_k is equal to the missing index, i.e., k, in which case it will be 1. Thus det(1 + εA) =
1 + ε tr A. ■
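
A quick numerical experiment makes the first-order nature of this result visible. The sketch below is only an illustration: the matrix is an arbitrary choice of ours, the helper names det and first_order_gap are hypothetical, and exact fractions are used so that the leftover terms can be seen without rounding. The gap between det(1 + εA) and 1 + ε tr A should shrink like ε², that is, by a factor of about 100 each time ε is divided by 10.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def first_order_gap(a, eps):
    """Return det(1 + eps*A) - (1 + eps*tr A), the terms beyond first order."""
    n = len(a)
    one_plus = [[Fraction(int(i == j)) + eps * a[i][j] for j in range(n)]
                for i in range(n)]
    tr = sum(a[i][i] for i in range(n))
    return det(one_plus) - (1 + eps * tr)


if __name__ == "__main__":
    A = [[1, 2, -1], [0, 1, -2], [2, 1, -1]]
    for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
        print(eps, float(first_order_gap(A, eps)))
```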

Similar matrices have the same trace: If A′ = RAR^{-1}, then

tr A′ = tr(RAR^{-1}) = tr[R(AR^{-1})] = tr[(AR^{-1})R]
      = tr[A(R^{-1}R)] = tr(A1) = tr A.

The preceding discussion is summarized in the following proposition. 

3.6.4. Proposition. To every operator A ∈ L(V) are associated two intrinsic
numbers, det A and tr A, which are the determinant and trace of the matrix
representation of the operator in any basis of V.

It follows from this proposition that the result of Example 3.6.3 can be written
in terms of operators:

det(1 + εA) = 1 + ε tr A.    (3.27)

A particularly useful formula that can be derived from this equation is the derivative
at t = 0 of the determinant of an operator A(t) depending on a single variable t, with
the property that A(0) = 1. To first order in t, we can write A(t) = 1 + tȦ(0), where a dot
represents differentiating with respect to t. Substituting this in Equation (3.27)
and differentiating with respect to t, we obtain the important result

\frac{d}{dt} \det(A(t)) \Big|_{t=0} = tr \dot{A}(0).    (3.28)


3.6.5. Example. We have seen that the determinant of a product of matrices is the product
of the determinants. On the other hand, the trace of a sum of matrices is the sum of traces.
When dealing with numbers, products and sums are related via the logarithm and
exponential: αβ = exp{ln α + ln β}. A generalization of this relation exists for diagonalizable
matrices. Let A be such a matrix, i.e., let D = RAR^{-1} for some similarity transformation R
and some diagonal matrix D = diag(λ_1, λ_2, ..., λ_n). The determinant of a diagonal matrix
is simply the product of its elements:

det D = λ_1 λ_2 ⋯ λ_n.

Taking the natural log of both sides and using the result of Example 3.2.3, we have

ln(det D) = ln λ_1 + ln λ_2 + ⋯ + ln λ_n = tr(ln D),

which can also be written as det D = exp{tr(ln D)}.

In terms of A, this reads det(RAR^{-1}) = exp{tr(ln(RAR^{-1}))}. Now invoke the invariance
of determinant and trace under similarity transformation and the result of Example 3.4.4 to
obtain

det A = exp{tr(R(ln A)R^{-1})} = exp{tr(ln A)}.    (3.29)

This is an important equation, which is sometimes used to define the determinant of operators
in infinite-dimensional vector spaces. ■

Both the determinant and the trace are mappings from M^{N×N} to C. The
determinant is not a linear mapping, but the trace is; and this opens up the possibility
of defining an inner product in the vector space of N × N matrices in terms of the
trace:

3.6.6. Proposition. For any two matrices A, B ∈ M^{N×N}, the mapping
g : M^{N×N} × M^{N×N} → C defined by g(A, B) = tr(A†B) is a sesquilinear inner
product.

Proof. The proof follows directly from the linearity of trace and the definition of
hermitian conjugate. □


3.7 Problems 

3.1. Show that if |c⟩ = |a⟩ + |b⟩, then in any basis the components of |c⟩ are equal
to the sums of the corresponding components of |a⟩ and |b⟩. Also show that the
elements of the matrix representing the sum of two operators are the sums of the 
elements of the matrices representing those two operators. 

3.2. Show that the unit operator 1 is represented by the unit matrix in any basis. 
3.3. The linear operator A : R^3 → R^2 is given by




Construct the matrix representing A in the standard bases of R^3 and R^2.

3.4. The linear transformation T : R^3 → R^3 is defined as

T(x_1, x_2, x_3) = (x_1 + x_2 − x_3, 2x_1 − x_3, x_1 + 2x_2).

Find the matrix representation of T in 



(a) the standard basis of R^3,

(b) the basis consisting of |a_1⟩ = (1, 1, 0), |a_2⟩ = (1, 0, −1), and |a_3⟩ =
(0, 2, 3).

3.5. Show that the diagonal elements of an antisymmetric matrix are all zero. 

3.6. Show that the number of independent real parameters for an N × N

1. (real) symmetric matrix is N(N + 1)/2, 

2. (real) antisymmetric matrix is N(N — 1)/2, 

3. (real) orthogonal matrix is N(N — 1)/2, 

4. (complex) unitary matrix is N^2,

5. (complex) hermitian matrix is N^2.

3.7. Show that an arbitrary orthogonal 2 × 2 matrix can be written in one of the
following two forms:

\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \quad or \quad \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.

The first is a pure rotation (its determinant is +1), and the second has determinant
−1. The form of the choices is dictated by the assumption that the first entry of
the matrix reduces to 1 when θ = 0.

3.8. Derive the formulas

cos(θ_1 + θ_2) = cos θ_1 cos θ_2 − sin θ_1 sin θ_2,
sin(θ_1 + θ_2) = sin θ_1 cos θ_2 + cos θ_1 sin θ_2

by noting that the rotation of the angle θ_1 + θ_2 in the xy-plane is the product of
two rotations. (See Problem 3.7.)

3.9. Prove that if a matrix M satisfies MM† = 0, then M = 0. Note that in general,
M^2 = 0 does not imply that M is zero. Find a nonzero 2 × 2 matrix whose square
is zero.

3.10. Construct the matrix representations of

D : P^c_4[t] → P^c_4[t] and T : P^c_3[t] → P^c_4[t],

the derivative and multiplication-by-t operators. Choose {1, t, t^2, t^3} as your basis
of P^c_3[t] and {1, t, t^2, t^3, t^4} as your basis of P^c_4[t]. Use the matrix of D so obtained
to find the first, second, third, fourth, and fifth derivatives of a general polynomial
of degree 4.



3.11. Find the transformation matrix R that relates the (orthonormal) standard 
basis of C 3 to the orthonormal basis obtained from the following vectors via the 
Gram-Schmidt process: 

l«3> = 




Verify that R is unitary, as expected from Theorem 3.4.2. 


3.12. If the matrix representation of an endomorphism T of C 2 with respect to the 
standard basis is (} }), what is its matrix representation with respect to the basis 

{(!)，（—、)}? 


3.13. If the matrix representation of an endomorphism T of C^3 with respect to the
standard basis is

\begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & -1 \\ -1 & -1 & 0 \end{pmatrix},


what is the representation of T with respect to the basis 

{(],)-(- ； o.(tm 

3.14. Using Definition 3.5.1, calculate the determinant of a general 3x3 matrix 
and obtain the familiar expansion of such a determinant in terms of the first row 
of the matrix. 


3.15. Prove Corollary 3.5.4. 

3.16. Show that det(αA) = α^N det A for an N × N matrix A and a complex number
α.


3.17. Show that det 1 = 1 for any unit matrix. 

3.18. Find a specific pair of matrices A and B such that det(A + B) ≠ det A + det B.
Therefore, the determinant is not a linear mapping. Hint: Any pair of matrices will
most likely work. In fact, the challenge is to find a pair such that det(A + B) =
det A + det B.

3.19. Demonstrate Proposition 3.5.5 using an arbitrary 3 × 3 matrix and evaluating
the sum explicitly. 

3.20. Find the inverse of the matrix

A = \begin{pmatrix} 3 & -1 & 2 \\ 1 & 0 & -3 \\ -2 & 1 & -1 \end{pmatrix}.



3.24. Find the matrix that transforms the standard basis of C 3 to the vectors 


忐 ，㈤ 

lii 
V6 / 


忐 ’ 1叱〉 

1 

^ \/6 f 


Show that this matrix is 


3.25. Consider the three operators L_1, L_2, and L_3 satisfying [L_1, L_2] = iL_3,
[L_3, L_1] = iL_2, [L_2, L_3] = iL_1. Show that the trace of each of these operators
is necessarily zero.

3.26. Show that in the expansion of the determinant given in Definition 3.5.1, no
two elements of the same row or the same column can appear in each term of the
sum.


3.27. Find inverses for the following matrices using both methods discussed in 
this chapter. 


， 1 1-1、 
2 1 2 
、 _ 1 一 2 一 2 / 


a 2 -i\ 
01 - 2 , c 
a 1 - 1 / 


1 /V 2 

0 

(1 - 0/(2V2) 

0 

1 /V 2 

(1 - 0/(2^) 

1/V2 

0 

-(l-0/(2v^) 

0 

l/y/2 

-(1 - 0/(2^) 


[ 1 
1 一％ 


3.21. Show explicitly that det(AB) = det A det B for 2 × 2 matrices.

3.22. Given three N × N matrices A, B, and C such that AB = C with C invertible,
show that both A and B must be invertible. Thus, any two operators A and B on
a finite-dimensional vector space satisfying AB = 1 are invertible and each is the
inverse of the other. Note: This is not true for infinite-dimensional vector spaces.

3.23. Show directly that the similarity transformation induced by R does not 
change the determinant or the trace of A where 


2 1 
I i 

3 

11 


ti 


10 2 

//l\ 


3.28. Let A be an operator on V. Show that if det A = 0, then there exists a nonzero
vector |x⟩ ∈ V such that A|x⟩ = 0.

3.30. Let {a_i}_{i=1}^{N} be the set consisting of the N rows of an N × N matrix A and
assume that the a_i are orthogonal to each other. Show that

|det A| = ||a_1|| ||a_2|| ⋯ ||a_N||.

Hint: Consider What would the result be if A were a unitary matrix? 

3.31. Prove that a set of n homogeneous linear equations in n unknowns has a
nontrivial solution if and only if the determinant of the matrix of coefficients is 
zero. 

3.32. Use determinants to show that an antisymmetric matrix whose dimension is 
odd cannot have an inverse. 

3.33. Show that tr(|a⟩⟨b|) = ⟨b|a⟩. Hint: Evaluate the trace in an orthonormal
basis. 

3.34. Show that if two invertible N × N matrices A and B anticommute (that is,
AB + BA = 0), then (a) N must be even, and (b) tr A = tr B = 0.

3.35. Show that for a spatial rotation R_n̂(θ) of an angle θ about an arbitrary axis
n̂, tr R_n̂(θ) = 1 + 2 cos θ.

3.36. Express the sum of the squares of elements of a matrix as a trace. Show that 
this sum is invariant under an orthogonal transformation of the matrix. 

3.37. Let S and A be a symmetric and an antisymmetric matrix, respectively, and
let M be a general matrix. Show that

(a) tr M = tr M^t,

(b) tr(SA) = 0; in particular, tr A = 0,

(c) SA is antisymmetric if and only if [S, A] = 0,

(d) MSM^t is symmetric and MAM^t is antisymmetric,

(e) MHM† is hermitian if H is.


3.29. For which values of a are the following matrices invertible? Find the inverses 
whenever possible. 





3.38. Find the trace of each of the following linear operators: 

(a) T : R 3 -> R 3 given by 

T(x, y,z) = {x -\-y - z,2x + - 2z, x - y). 

(b) T : R^3 → R^3 given by

T(x, y, z) = (y − z, x + 2y + z, z − y).

(c) T : C^4 → C^4 given by

T(x, y, z, w) = (x + iy − z + iw, 2ix + 3y − 2iz − w, x − iy, z + iw).

3.39. Use Equation (3.29) to derive Equation (3.27).

3.40. Suppose that there are two operators A and B such that [A, B] = c1, where
c is a constant. Show that the vector space in which such operators are defined 
cannot be finite-dimensional. Conclude that the position and momentum operators 
of quantum mechanics can be defined only in infinite dimensions. 


Additional Reading 

1. Axler, S. Linear Algebra Done Right ， Springer-Verlag, 1996. 

2. Birkhoff, G. and MacLane, S. Modern Algebra, 4th ed., Macmillan, 1977. 
Discusses matrices from the standpoint of row and column operations. 


3. Greub, W. Linear Algebra, 4th ed., Springer-Verlag ， 1975. 


4

Spectral Decomposition


The last chapter discussed matrix representation of operators. It was pointed out
there that such a representation is basis-dependent. In some bases, the operator may
"look" quite complicated, while in others it may take a simple form. In a "special"
basis, the operator may look the simplest: It may be a diagonal matrix. This
chapter investigates conditions under which a basis exists in which the operator is
represented by a diagonal matrix.

4.1 Direct Sums 


Sometimes it is possible, and convenient, to break up a vector space into special
(disjoint) subspaces. For instance, it is convenient to decompose the motion of a
projectile into its horizontal and vertical components. Similarly, the study of the
motion of a particle in R^3 under the influence of a central force field is facilitated
by decomposing the motion into its projections onto the direction of angular
momentum and onto a plane perpendicular to the angular momentum. This corresponds
to decomposing a vector in space into a vector, say, in the xy-plane and a
vector along the z-axis. We can generalize this to any vector space, but first some
notation: Let U and W be subspaces of a vector space V. Denote by U + W the
collection of all vectors in V that can be written as a sum of two vectors, one in U
and one in W. It is easy to show that U + W is a subspace of V.

Sum of two subspaces defined

4.1.1. Example. Let U be the xy-plane and W the yz-plane. These are both subspaces
of R^3, and so is U + W. In fact, U + W = R^3, because given any vector (x, y, z) in R^3,
we can write it as

(x, y, z) = \underbrace{(x, \tfrac{1}{2}y, 0)}_{\in U} + \underbrace{(0, \tfrac{1}{2}y, z)}_{\in W}.

This decomposition is not unique: We could also write (x, y, z) = (x, ⅓y, 0) + (0, ⅔y, z),
and a host of other relations. ■

direct sum defined

4.1.2. Definition. Let U and W be subspaces of a vector space V such that V =
U + W and the only vector common to both U and W is the zero vector. Then we
say that V is the direct sum of U and W and write

V = U ⊕ W.

uniqueness of direct sum

4.1.3. Proposition. Let U and W be subspaces of V. Then V = U ⊕ W if and only
if any vector in V can be written uniquely as a vector in U plus a vector in W.

Proof. Assume V = U ⊕ W, and let |v⟩ ∈ V be written as a sum of a vector in U
and a vector in W in two different ways:

|v⟩ = |u⟩ + |w⟩ = |u′⟩ + |w′⟩ ⟺ |u⟩ − |u′⟩ = |w′⟩ − |w⟩.

The LHS is in U. Since it is equal to the RHS, which is in W, it must be in
W as well. Therefore, the LHS must equal zero, as must the RHS. Thus, |u⟩ =
|u′⟩, |w′⟩ = |w⟩, and there is only one way that |v⟩ can be written as a sum of a
vector in U and a vector in W.

Conversely, if |a⟩ ∈ U and also |a⟩ ∈ W, then one can write

|a⟩ = \underbrace{|a⟩}_{\text{in } U} + \underbrace{|0⟩}_{\text{in } W} \quad and \quad |a⟩ = \underbrace{|0⟩}_{\text{in } U} + \underbrace{|a⟩}_{\text{in } W}.

Uniqueness of the decomposition of |a⟩ implies that |a⟩ = |0⟩. Therefore, the
only vector common to both U and W is the zero vector. This implies that V =
U ⊕ W. □

dimensions in a direct sum

4.1.4. Proposition. If V = U ⊕ W, then dim V = dim U + dim W.

Proof. Let {|u_i⟩}_{i=1}^{m} be a basis for U and {|w_i⟩}_{i=1}^{k} a basis for W. Then it is easily
verified that {|u_1⟩, |u_2⟩, ..., |u_m⟩, |w_1⟩, |w_2⟩, ..., |w_k⟩} is a basis for V. The
details are left as an exercise. □

We can generalize the notion of the direct sum to more than two subspaces. For
example, we can write R^3 = X ⊕ Y ⊕ Z, where X, Y, and Z are the one-dimensional
subspaces corresponding to the three axes. Now assume that

V = U_1 ⊕ U_2 ⊕ ⋯ ⊕ U_r,    (4.1)

i.e., V is the direct sum of r of its subspaces that have no common vectors among
themselves except the zero vector and have the property that any vector in V can be
written (uniquely) as a sum of vectors one from each subspace. Define the linear
operator P_j by P_j|u⟩ = |u_j⟩, where |u⟩ = \sum_{j=1}^{r} |u_j⟩, |u_j⟩ ∈ U_j. Then it is readily
verified that P_j^2 = P_j and P_j P_k = 0 for j ≠ k. Thus, the P_j's are (not necessarily
hermitian) projection operators. Furthermore, for an arbitrary vector |u⟩, we have

|u⟩ = \sum_{j=1}^{r} |u_j⟩ = \sum_{j=1}^{r} P_j |u⟩ = \left( \sum_{j=1}^{r} P_j \right) |u⟩ \quad \forall\, |u⟩ ∈ V.

Since this is true for all vectors, we have the identity

\sum_{j=1}^{r} P_j = 1.    (4.2)
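
A two-dimensional illustration may help fix these ideas. In the sketch below, which is our own construction and not part of the text, R^2 is written as the direct sum of the line U spanned by (1, 0) and the line W spanned by (1, 1); the helper names decompose and projection_matrices are hypothetical. The resulting projection operators satisfy P^2 = P, P_U P_W = 0, and P_U + P_W = 1, yet their matrices are not symmetric, because U and W are not orthogonal.

```python
from fractions import Fraction


def decompose(v, u, w):
    """Split v in R^2 uniquely as a*u + b*w (u and w linearly independent)."""
    d = u[0] * w[1] - u[1] * w[0]
    if d == 0:
        raise ValueError("u and w do not span R^2")
    # Cramer's rule for the coefficients a and b.
    a = Fraction(v[0] * w[1] - v[1] * w[0], d)
    b = Fraction(u[0] * v[1] - u[1] * v[0], d)
    return (a * u[0], a * u[1]), (b * w[0], b * w[1])


def projection_matrices(u, w):
    """Matrices of P_U and P_W in the standard basis of R^2."""
    cols_u, cols_w = [], []
    for e in ((1, 0), (0, 1)):
        part_u, part_w = decompose(e, u, w)
        cols_u.append(part_u)  # image of the basis vector e under P_U
        cols_w.append(part_w)  # image of the basis vector e under P_W
    p_u = [[cols_u[j][i] for j in range(2)] for i in range(2)]
    p_w = [[cols_w[j][i] for j in range(2)] for i in range(2)]
    return p_u, p_w


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


if __name__ == "__main__":
    P_U, P_W = projection_matrices((1, 0), (1, 1))
    print("P_U =", [[int(x) for x in row] for row in P_U])
    print("P_W =", [[int(x) for x in row] for row in P_W])
```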

orthogonal complement of a subspace

4.1.5. Definition. Let V be an inner product space. Let M be any subspace of V.
Denote by M^⊥ the set of all vectors in V orthogonal to all the vectors in M. M^⊥
(pronounced "em perp") is called the orthogonal complement of M.

4.1.6. Proposition. M^⊥ is a subspace of V.

Proof. In fact, if |a⟩, |b⟩ ∈ M^⊥, then for any vector |c⟩ ∈ M, we have

⟨c| (α|a⟩ + β|b⟩) = α⟨c|a⟩ + β⟨c|b⟩ = 0.

So, α|a⟩ + β|b⟩ ∈ M^⊥ for arbitrary α, β ∈ C and |a⟩, |b⟩ ∈ M^⊥. □

If V of Equation (4.1) is an inner product space, and the subspaces are mutually
orthogonal, then for arbitrary |u⟩, |v⟩ ∈ V,

⟨u|P_j|v⟩ = ⟨u|v_j⟩ = ⟨u_j|v_j⟩ = ⟨v_j|u_j⟩* = ⟨v|u_j⟩* = ⟨v|P_j|u⟩*,

which shows that P_j is hermitian. In Chapter 2, we assumed the projection operators
to be hermitian. Now we see that only in an inner product space (and only if
the subspaces of a direct sum are orthogonal) do we recover the hermiticity of
projection operators.

4.1.7. Example. Consider an orthonormal basis B_M = {|e_i⟩}_{i=1}^{m} for M, and extend it
to a basis B = {|e_i⟩}_{i=1}^{N} for V. Now construct a (hermitian) projection operator P =
\sum_{i=1}^{m} |e_i⟩⟨e_i|. This is the operator that projects an arbitrary vector in V onto the subspace
M. It is straightforward to show that 1 − P is the projection operator that projects onto M^⊥
(see Problem 4.2).

An arbitrary vector |a⟩ ∈ V can be written as

|a⟩ = (P + 1 − P)|a⟩ = \underbrace{P|a⟩}_{\text{in } M} + \underbrace{(1 − P)|a⟩}_{\text{in } M^⊥}.

Furthermore, the only vector that can be in both M and M^⊥ is the zero vector, because it is
the only vector orthogonal to itself. ■

From this example and the remarks immediately preceding it we may conclude
the following:

4.1.8. Proposition. If V is an inner product space, then V = M ⊕ M^⊥ for any
subspace M. Furthermore, the projection operators corresponding to M and M^⊥
are hermitian.

4.2 Invariant Subspaces 

This section explores the possibility of obtaining subspaces by means of the action
of a linear operator on vectors of an N-dimensional vector space V. Let |a⟩ be any
vector in V, and A a linear operator on V. The vectors

|a⟩, A|a⟩, A^2|a⟩, ..., A^N|a⟩

are linearly dependent (there are N + 1 of them!). Let M = Span{A^k|a⟩}_{k=0}^{N}. It
follows that m = dim M ≤ dim V, and M has the property that for any vector
|x⟩ ∈ M the vector A|x⟩ also belongs to M (show this!). In other words, no vector
in M "leaves" the subspace when acted on by A.

invariant subspace; reduction of an operator

4.2.1. Definition. A subspace M is an invariant subspace of the operator A if A
transforms vectors of M into vectors of M. This is written succinctly as A(M) ⊂ M.
We say that M reduces A if both M and M^⊥ are invariant subspaces of A.

Starting with a basis of M, we can extend it to a basis B = {|a_i⟩}_{i=1}^{N} of V
whose first m vectors span M. The matrix representation of A in such a basis is
given by the relation A|a_i⟩ = \sum_{j=1}^{N} α_{ji}|a_j⟩, i = 1, 2, ..., N. If i ≤ m, then
α_{ji} = 0 for j > m, because A|a_i⟩ belongs to M when i ≤ m and therefore can
be written as a linear combination of only {|a_1⟩, |a_2⟩, ..., |a_m⟩}. Thus, the matrix
representation of A in B will have the form

A = \begin{pmatrix} A_{11} & A_{12} \\ 0_{21} & A_{22} \end{pmatrix},

matrix representation of an operator in a subspace

where A_11 is an m × m matrix, A_12 an m × (N − m) matrix, 0_21 the (N − m) × m
zero matrix, and A_22 an (N − m) × (N − m) matrix. We say that A_11 represents
the operator A in the m-dimensional subspace M.

It may also happen that the subspace spanned by the remaining basis vectors
in B, namely |a_{m+1}⟩, |a_{m+2}⟩, ..., |a_N⟩, is also an invariant subspace of A. Then
A_12 will be zero, and A will take a block diagonal form:¹

block diagonal matrix defined

A = \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}.

reducible and irreducible matrices

A matrix representation of an operator that can be brought into this form by a
suitable choice of basis is said to be reducible; otherwise, it is called irreducible.

¹ From now on, we shall denote all zero matrices by the same symbol regardless of their dimensionality.

A reducible matrix A is denoted in two different ways:²

A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \iff A = A_1 ⊕ A_2.    (4.3)

For example, when M reduces A and one chooses a basis the first m vectors
of which are in M and the remaining ones in M^⊥, then A is reducible. We have
seen on a number of occasions the significance of the hermitian conjugate of an 
operator (e.g., in relation to hermitian and unitary operators). The importance of 
this operator will be borne out further when we study the spectral theorem later in 
this chapter. Let us now investigate some properties of the adjoint of an operator 
in the context of invariant subspaces. 

condition for invariance

4.2.2. Lemma. A subspace M of an inner product space V is invariant under the
linear operator A if and only if M^⊥ is invariant under A†.

Proof. The proof is left as a problem. □

An immediate consequence of the above lemma and the two identities (A†)† =
A and (M^⊥)^⊥ = M is contained in the following theorem.

4.2.3. Theorem. A subspace of V reduces A if and only if it is invariant under
both A and A†.

4.2.4. Lemma. Let M be a subspace of V and P the hermitian projection operator
onto M. Then M is invariant under the linear operator A if and only if AP = PAP.

Proof. Suppose M is invariant. Then for any |x⟩ in V, we have

P|x⟩ ∈ M ⟹ AP|x⟩ ∈ M ⟹ PAP|x⟩ = AP|x⟩.

Since the last equality holds for arbitrary |x⟩, we have AP = PAP.

Conversely, suppose AP = PAP. For any |y⟩ ∈ M, we have

P|y⟩ = |y⟩ ⟹ AP|y⟩ = A|y⟩ = P(AP|y⟩) ∈ M,

where the last equality uses AP = PAP.

Therefore, M is invariant under A. □

4.2.5. Theorem. Let M be a subspace of V, P the hermitian projection operator
of V onto M, and A a linear operator on V. Then M reduces A if and only if A and
P commute.

Proof. Suppose M reduces A. Then by Theorem 4.2.3, M is invariant under both
A and A†. Lemma 4.2.4 then implies

AP = PAP and A†P = PA†P.    (4.4)

Taking the adjoint of the second equation yields (A†P)† = (PA†P)†, or PA = PAP.
This equation together with the first equation of (4.4) yields PA = AP.

Conversely, suppose that PA = AP. Then P^2 A = PAP, whence PA = PAP.
Taking adjoints gives A†P = PA†P, because P is hermitian. By Lemma 4.2.4,
M is invariant under A†. Similarly, from PA = AP, we get PAP = AP^2, whence
PAP = AP. Once again by Lemma 4.2.4, M is invariant under A. By Theorem
4.2.3, M reduces A. □

² It is common to use a single subscript for submatrices of a block diagonal matrix, just as it is common to use a single subscript
for entries of a diagonal matrix.

The main goal of the remaining part of this chapter is to prove that certain 
operators, e.g. hermitian operators, are diagonalizable, that is, that we can always 
find an (orthonormal) basis in which they are represented by a diagonal matrix. 


4.3 Eigenvalues and Eigenvectors 

Let us begin by considering eigenvalues and eigenvectors, which are generalizations
of familiar concepts in two and three dimensions. Consider the operation of
rotation about the z-axis by an angle θ denoted by R_z(θ). Such a rotation takes any
vector (x, y) in the xy-plane to a new vector (x cos θ − y sin θ, x sin θ + y cos θ).
Thus, unless (x, y) = (0, 0) or θ is an integer multiple of 2π, the vector will
change. Is there a nonzero vector that is so special (eigen, in German) that it does
not change when acted on by R_z(θ)? As long as we confine ourselves to two
dimensions, the answer is no. But if we lift ourselves up from the two-dimensional
xy-plane, we encounter many such vectors, all of which lie along the z-axis.

The foregoing example can be generalized to any rotation (normally specified 
by Euler angles). In fact, the methods developed in this section can be used to 
show that a general rotation, given by Euler angles, always has an unchanged 
vector lying along the axis around which the rotation takes place. This concept is 
further generalized in the following definition. 

eigenvalue and eigenvector

4.3.1. Definition. A scalar λ is an eigenvalue and a nonzero vector |a⟩ is an
eigenvector of the linear transformation A ∈ L(V) if

A|a⟩ = λ|a⟩.    (4.5)

4.3.2. Proposition. Add the zero vector to the set of all eigenvectors of A belonging
to the same eigenvalue λ, and denote the span of the resulting set by M_λ. Then
M_λ is a subspace of V, and every (nonzero) vector in M_λ is an eigenvector of A
with eigenvalue λ.

Proof. The proof follows immediately from the above definition and the definition
of a subspace. □

eigenspace

spectrum

characteristic polynomial and characteristic roots of an operator

4.3.3. Definition. The subspace M_λ is referred to as the eigenspace of A
corresponding to the eigenvalue λ. Its dimension is called the geometric multiplicity
of λ. An eigenvalue is called simple if its geometric multiplicity is 1. The set of
eigenvalues of A is called the spectrum of A.

By their very construction, eigenspaces corresponding to different eigenvalues
have no vectors in common except the zero vector. This can be demonstrated by
noting that if |v⟩ ∈ M_λ ∩ M_μ for λ ≠ μ, then

0 = (A − λ1)|v⟩ = A|v⟩ − λ|v⟩ = μ|v⟩ − λ|v⟩ = (μ − λ)|v⟩ ⟹ |v⟩ = 0,

since μ − λ ≠ 0.

Let us rewrite Equation (4.5) as (A − λ1)|a⟩ = 0. This equation says that |a⟩
is an eigenvector of A if and only if |a⟩ belongs to the kernel of A − λ1. If the
latter is invertible, then its kernel will consist of only the zero vector, which is
not acceptable as a solution of Equation (4.5). Thus, if we are to obtain nontrivial
solutions, A − λ1 must have no inverse. This is true if and only if

det(A − λ1) = 0.    (4.6)

The determinant in Equation (4.6) is a polynomial in λ, called the characteristic
polynomial of A. The roots of this polynomial are called characteristic roots
and are simply the eigenvalues of A. Now, any polynomial of degree greater than
or equal to 1 has at least one (complex) root, which yields the following theorem.

4.3.4. Theorem. Every operator on a finite-dimensional vector space over C has
at least one eigenvalue and therefore at least one eigenvector.

Let λ_1, λ_2, ..., λ_p be the distinct roots of the characteristic polynomial of A,
and let λ_j occur m_j times. Then³

\det(A − λ1) = (λ_1 − λ)^{m_1} \cdots (λ_p − λ)^{m_p} = \prod_{j=1}^{p} (λ_j − λ)^{m_j}.    (4.7)

For λ = 0, this gives

\det A = λ_1^{m_1} λ_2^{m_2} \cdots λ_p^{m_p} = \prod_{j=1}^{p} λ_j^{m_j}.    (4.8)

Equation (4.8) states that the determinant of an operator is the product of all
its eigenvalues. In particular, if one of the eigenvalues is zero, then the operator is
not invertible.

³ m_j is called the algebraic multiplicity of λ_j.



4.3.5. Example. Let us find the eigenvalues of a projection operator P. If $|a\rangle$ is an
eigenvector, then $P|a\rangle = \lambda|a\rangle$. Applying P on both sides again, we obtain

$$P^2|a\rangle = \lambda P|a\rangle = \lambda(\lambda|a\rangle) = \lambda^2|a\rangle.$$

But $P^2 = P$; thus, $P|a\rangle = \lambda^2|a\rangle$. It follows that $\lambda^2|a\rangle = \lambda|a\rangle$, or $(\lambda^2 - \lambda)|a\rangle = 0$. Since
$|a\rangle \neq 0$, we must have $\lambda(\lambda - 1) = 0$, or $\lambda = 0, 1$. Thus, the only eigenvalues of a projection
operator are 0 and 1. The presence of zero as an eigenvalue of P is an indication that P is
not invertible. ■

4.3.6. Example. To be able to see the difference between algebraic and geometric
multiplicities, consider the matrix $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, whose characteristic polynomial is $(1 - \lambda)^2$. Thus,
the matrix has only one eigenvalue, $\lambda = 1$, with algebraic multiplicity $m_1 = 2$. However,
the most general vector $|a\rangle$ satisfying $(A - \mathbf{1})|a\rangle = 0$ is easily shown to be of the form $\begin{pmatrix} \alpha \\ 0 \end{pmatrix}$.
This shows that $\mathcal{M}_{\lambda=1}$ is one-dimensional, i.e., the geometric multiplicity of $\lambda$ is 1. ■

As mentioned at the beginning of this chapter, it is useful to represent an 
operator by as simple a matrix as possible. The simplest matrix is a diagonal 
matrix. This motivates the following definition: 

4.3.7. Definition. A linear operator A on a vector space V is said to be
diagonalizable if there is a basis for V all of whose vectors are eigenvectors of A.

4.3.8. Theorem. Let A be a diagonalizable operator on a vector space V with
distinct eigenvalues $\{\lambda_j\}_{j=1}^{r}$. There are (not necessarily hermitian) projection
operators $P_j$ on V such that

$$(1)\ \mathbf{1} = \sum_{j=1}^{r} P_j, \qquad (2)\ P_i P_j = 0 \ \text{for } i \neq j, \qquad (3)\ A = \sum_{j=1}^{r} \lambda_j P_j.$$

Proof. Let $\mathcal{M}_j$ denote the eigenspace corresponding to the eigenvalue $\lambda_j$. Since
the eigenvectors span V and the only common vector of two different eigenspaces
is the zero vector (see comments after Definition 4.3.3), we have

$$V = \mathcal{M}_1 \oplus \mathcal{M}_2 \oplus \cdots \oplus \mathcal{M}_r.$$

This immediately gives (1) and (2) if we use Equations (4.1) and (4.2) where $P_j$
is the projection operator onto $\mathcal{M}_j$.

To prove (3), let $|v\rangle$ be an arbitrary vector in V. Then $|v\rangle$ can be written uniquely
as a sum of vectors each coming from one eigenspace, $|v\rangle = \sum_{j=1}^{r} |v_j\rangle$ with $|v_j\rangle = P_j|v\rangle \in \mathcal{M}_j$. Therefore,

$$A|v\rangle = \sum_{j=1}^{r} A|v_j\rangle = \sum_{j=1}^{r} \lambda_j |v_j\rangle = \Bigl(\sum_{j=1}^{r} \lambda_j P_j\Bigr)|v\rangle.$$

Since this equality holds for all vectors $|v\rangle$, (3) follows. □




4.4 Spectral Decomposition 

This section derives one of the most powerful theorems in the theory of linear 
operators, the spectral decomposition theorem. We shall derive the theorem for 
operators that generalize hermitian and unitary operators. 

4.4.1. Definition. A normal operator is an operator on an inner product space
that commutes with its adjoint.

An important consequence of this definition is that

$$\|Ax\| = \|A^\dagger x\| \quad \text{if and only if } A \text{ is normal.} \tag{4.9}$$

4.4.2. Proposition. Let A be a normal operator on V. Then $|x\rangle$ is an eigenvector
of A with eigenvalue $\lambda$ if and only if $|x\rangle$ is an eigenvector of $A^\dagger$ with eigenvalue $\lambda^*$.

Proof. By Equation (4.9), the fact that $(A - \lambda\mathbf{1})^\dagger = A^\dagger - \lambda^*\mathbf{1}$, and the fact that $A - \lambda\mathbf{1}$
is normal (reader, verify), we have $\|(A - \lambda\mathbf{1})x\| = 0$ if and only if $\|(A^\dagger - \lambda^*\mathbf{1})x\| = 0$.
Since it is only the zero vector that has the zero norm, we get

$$(A - \lambda\mathbf{1})|x\rangle = 0 \quad \text{if and only if} \quad (A^\dagger - \lambda^*\mathbf{1})|x\rangle = 0.$$

This proves the proposition. □

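We obtain a useful consequence of this proposition by applying it to a hermitian
operator H and a unitary operator⁴ U. In the first case, since $H^\dagger = H$, we get

$$H|x\rangle = \lambda|x\rangle = H^\dagger|x\rangle = \lambda^*|x\rangle \;\Rightarrow\; (\lambda - \lambda^*)|x\rangle = 0 \;\Rightarrow\; \lambda = \lambda^*.$$

Therefore, $\lambda$ is real. In the second case, we write

$$|x\rangle = \mathbf{1}|x\rangle = UU^\dagger|x\rangle = U(\lambda^*|x\rangle) = \lambda^* U|x\rangle = \lambda\lambda^*|x\rangle \;\Rightarrow\; \lambda\lambda^* = 1.$$

Therefore, $\lambda$ is unimodular (has absolute value equal to 1). We summarize the
foregoing discussion: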

4.4.3. Corollary. The eigenvalues of a hermitian operator are real. The eigenvalues
of a unitary operator have unit absolute value.

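4.4.4. Example. Let us find the eigenvalues and eigenvectors of the hermitian matrix
$H = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$. We have

$$\det(H - \lambda\mathbf{1}) = \det\begin{pmatrix} -\lambda & -i \\ i & -\lambda \end{pmatrix} = \lambda^2 - 1.$$

Thus, the eigenvalues, $\lambda_1 = 1$ and $\lambda_2 = -1$, are real, as expected.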


⁴ Obviously, both are normal operators.







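To find the eigenvectors, we write

$$0 = (H - \lambda_1\mathbf{1})|a_1\rangle = (H - \mathbf{1})|a_1\rangle = \begin{pmatrix} -1 & -i \\ i & -1 \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} -\alpha_1 - i\alpha_2 \\ i\alpha_1 - \alpha_2 \end{pmatrix},$$

or $\alpha_2 = i\alpha_1$, which gives $|a_1\rangle = \begin{pmatrix} \alpha_1 \\ i\alpha_1 \end{pmatrix} = \alpha_1\begin{pmatrix} 1 \\ i \end{pmatrix}$, where $\alpha_1$ is an arbitrary complex number.

Also,

$$0 = (H - \lambda_2\mathbf{1})|a_2\rangle = (H + \mathbf{1})|a_2\rangle = \begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \beta_1 - i\beta_2 \\ i\beta_1 + \beta_2 \end{pmatrix},$$

or $\beta_2 = -i\beta_1$, which gives $|a_2\rangle = \begin{pmatrix} \beta_1 \\ -i\beta_1 \end{pmatrix} = \beta_1\begin{pmatrix} 1 \\ -i \end{pmatrix}$, where $\beta_1$ is an arbitrary complex
number.

It is desirable, in most situations, to orthonormalize the eigenvectors. In the present case,
they are already orthogonal. This is a property shared by all eigenvectors of a hermitian (in
fact, normal) operator stated in the next theorem. We therefore need to merely normalize
the eigenvectors:

$$1 = \langle a_1|a_1\rangle = \alpha_1^*\begin{pmatrix} 1 & -i \end{pmatrix}\alpha_1\begin{pmatrix} 1 \\ i \end{pmatrix} = 2|\alpha_1|^2,$$

or $|\alpha_1| = 1/\sqrt{2}$ and $\alpha_1 = e^{i\varphi}/\sqrt{2}$ for some $\varphi \in \mathbb{R}$. A similar result is obtained for $\beta_1$. The
choice $\varphi = 0$ yields

$$|e_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix} \quad\text{and}\quad |e_2\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}. \qquad ■$$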


The following theorem proves for all normal operators the orthogonality
property of their eigenvectors illustrated in the example above for a simple hermitian
operator.

4.4.5. Theorem. An eigenspace of a normal operator reduces that operator.
Moreover, eigenspaces of a normal operator are mutually orthogonal.

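Proof. The first part of the theorem is a trivial consequence of Proposition 4.4.2
and Theorem 4.2.3. To prove the second part, let $|u\rangle \in \mathcal{M}_\lambda$ and $|v\rangle \in \mathcal{M}_\mu$ with $\lambda \neq \mu$.
Then, using Theorem 4.2.3 once more, we obtain

$$\lambda\langle v|u\rangle = \langle v|\lambda u\rangle = \langle v|Au\rangle = \langle A^\dagger v|u\rangle = \langle \mu^* v|u\rangle = \mu\langle v|u\rangle.$$

It follows that $(\lambda - \mu)\langle v|u\rangle = 0$ and since $\lambda \neq \mu$, $\langle v|u\rangle = 0$. □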

4.4.6. Theorem. (Spectral Decomposition Theorem) Let A be a normal operator
on a finite-dimensional complex inner product space V. Let $\lambda_1, \lambda_2, \dots, \lambda_r$ be its
distinct eigenvalues. Then there exist nonzero (hermitian) projection operators
$P_1, P_2, \dots, P_r$ such that

1. $P_i P_j = 0 \quad \forall\, i \neq j$;

2. $\sum_{i=1}^{r} P_i = \mathbf{1}$;

3. $\sum_{i=1}^{r} \lambda_i P_i = A$.

Proof. Let $P_i$ be the operator that projects onto the eigenspace $\mathcal{M}_i$ corresponding
to eigenvalue $\lambda_i$. By comments after Proposition 4.1.6, these projection operators
are hermitian. Because of Theorem 4.4.5, the only vector common to any two
distinct eigenspaces is the zero vector. So, it makes sense to talk about the direct
sum of these eigenspaces. Let $\mathcal{M} = \mathcal{M}_1 \oplus \mathcal{M}_2 \oplus \cdots \oplus \mathcal{M}_r$ and $P = \sum_{i=1}^{r} P_i$, where
P is the orthogonal projection operator onto $\mathcal{M}$. Since A commutes with every $P_i$
(Theorem 4.2.5), it commutes with P. Hence, by Theorem 4.2.5, $\mathcal{M}$ reduces A, i.e.,
$\mathcal{M}^\perp$ is also invariant under A. Now regard the restriction of A to $\mathcal{M}^\perp$ as an operator
in its own right on the finite-dimensional vector space $\mathcal{M}^\perp$. Theorem 4.3.4 now
forces A to have at least one eigenvector in $\mathcal{M}^\perp$. But this is impossible because all
eigenvectors of A have been accounted for in its eigenspaces. The only resolution
is for $\mathcal{M}^\perp$ to be zero. This gives

$$V = \mathcal{M}_1 \oplus \mathcal{M}_2 \oplus \cdots \oplus \mathcal{M}_r \quad\text{and}\quad \mathbf{1} = \sum_{i=1}^{r} P_i.$$

The second equation follows from the first and Equations (4.1) and (4.2). The
remaining part of the theorem follows from arguments similar to those used in the
proof of Theorem 4.3.8. □

We can now establish the connection between the diagonalizability of a normal
operator and the spectral theorem. In each subspace $\mathcal{M}_i$, we choose an orthonormal
basis. The union of all these bases is clearly a basis for the whole space V. Let
us label these basis vectors $|e_j^s\rangle$, where the subscript indicates the subspace and
the superscript indicates the particular vector in that subspace. Clearly,
$\langle e_j^s|e_{j'}^{s'}\rangle = \delta_{ss'}\delta_{jj'}$ and $P_j = \sum_{s=1}^{m_j} |e_j^s\rangle\langle e_j^s|$. Noting that $P_k|e_{j'}^{s'}\rangle = \delta_{kj'}|e_{j'}^{s'}\rangle$, we can obtain
the matrix elements of A in such a basis:

$$\langle e_j^s|A|e_{j'}^{s'}\rangle = \sum_{i=1}^{r} \lambda_i \langle e_j^s|P_i|e_{j'}^{s'}\rangle = \sum_{i=1}^{r} \lambda_i \delta_{ij'} \langle e_j^s|e_{j'}^{s'}\rangle = \lambda_{j'} \langle e_j^s|e_{j'}^{s'}\rangle.$$

Only the diagonal elements are nonzero. We note that for each subscript j we have
$m_j$ orthonormal vectors $|e_j^s\rangle$, where $m_j$ is the dimension of $\mathcal{M}_j$. Thus, $\lambda_j$ occurs
$m_j$ times as a diagonal element. Therefore, in such an orthonormal basis, A will
be represented by

$$\operatorname{diag}(\underbrace{\lambda_1, \dots, \lambda_1}_{m_1 \text{ times}}, \underbrace{\lambda_2, \dots, \lambda_2}_{m_2 \text{ times}}, \dots, \underbrace{\lambda_r, \dots, \lambda_r}_{m_r \text{ times}}).$$

Let us summarize the foregoing discussion:

4.4.7. Corollary. If $A \in \mathcal{L}(V)$ is normal, then V has an orthonormal basis
consisting of eigenvectors of A. Therefore, a normal operator on a complex inner product
space is diagonalizable.






Using this corollary, the reader may show the following: 


4.4.8. Corollary. A hermitian operator is positive if and only if all its eigenvalues
are positive.

4.4.9. Example. Computation of largest and smallest eigenvalues.
There is an elegant technique that yields the largest and the smallest (in absolute value)
eigenvalues of a normal operator A in a straightforward way if the eigenspaces of these
eigenvalues are one dimensional. For convenience, assume that the eigenvalues are labeled
in order of decreasing absolute values:

$$|\lambda_1| > |\lambda_2| > \cdots > |\lambda_r| \neq 0.$$

Let $\{|a_k\rangle\}_{k=1}^{N}$ be a basis of V consisting of eigenvectors of A, with $A|a_k\rangle = \lambda_k|a_k\rangle$ and $|a_1\rangle$ belonging to the largest eigenvalue, and let $|x\rangle = \sum_{k=1}^{N} \xi_k|a_k\rangle$ be an
arbitrary vector in V. Then

$$A^m|x\rangle = \sum_{k=1}^{N} \xi_k A^m|a_k\rangle = \sum_{k=1}^{N} \xi_k \lambda_k^m|a_k\rangle = \lambda_1^m\Bigl[\xi_1|a_1\rangle + \sum_{k=2}^{N} \xi_k\Bigl(\frac{\lambda_k}{\lambda_1}\Bigr)^m|a_k\rangle\Bigr].$$

In the limit $m \to \infty$, the summation in the brackets vanishes. Therefore,

$$A^m|x\rangle \approx \lambda_1^m \xi_1|a_1\rangle \quad\text{and}\quad \langle y|A^m|x\rangle \approx \lambda_1^m \xi_1\langle y|a_1\rangle$$

for any $|y\rangle \in V$. Taking the ratio of this equation and the corresponding one for $m + 1$, we
obtain

$$\lim_{m \to \infty} \frac{\langle y|A^{m+1}|x\rangle}{\langle y|A^m|x\rangle} = \lambda_1.$$

Note how crucially this relation depends on the fact that $\lambda_1$ is nondegenerate, i.e., that $\mathcal{M}_1$
is one-dimensional. By taking larger and larger values for m, we can obtain a better and
better approximation to the largest eigenvalue.

Assuming that zero is not the smallest eigenvalue $\lambda_r$ (and therefore not an eigenvalue)
of A, we can find the smallest eigenvalue by replacing A with $A^{-1}$ and $\lambda_1$ with $1/\lambda_r$. The
details are left as an exercise for the reader. ■

Any given hermitian matrix H can be thought of as the representation of a
hermitian operator in the standard orthonormal basis. We can find a unitary matrix
U that can transform the standard basis to the orthonormal basis consisting of $|e_j^s\rangle$,
the eigenvectors of the hermitian operator. The representation of the hermitian
operator in the new basis is $UHU^\dagger$, as discussed in Section 3.3. However, the above
argument showed that the new matrix is diagonal. We therefore have the following
result.

4.4.10. Corollary. A hermitian matrix can always be brought to diagonal form by 
means of a unitary transformation matrix. 

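4.4.11. Example. Let us consider the diagonalization of the hermitian matrix

$$H = \begin{pmatrix} 0 & 0 & -1+i & -1-i \\ 0 & 0 & -1+i & 1+i \\ -1-i & -1-i & 0 & 0 \\ -1+i & 1-i & 0 & 0 \end{pmatrix}.$$

The characteristic polynomial is $\det(H - \lambda\mathbf{1}) = (\lambda + 2)^2(\lambda - 2)^2$. Thus, $\lambda_1 = -2$ with
multiplicity $m_1 = 2$, and $\lambda_2 = 2$ with multiplicity $m_2 = 2$. To find the eigenvectors, we
first look at the matrix equation $(H + 2\cdot\mathbf{1})|a\rangle = 0$, or

$$\begin{pmatrix} 2 & 0 & -1+i & -1-i \\ 0 & 2 & -1+i & 1+i \\ -1-i & -1-i & 2 & 0 \\ -1+i & 1-i & 0 & 2 \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{pmatrix} = 0.$$

This is a system of linear equations whose "solution" is

$$\alpha_3 = \tfrac{1}{2}(1 + i)(\alpha_1 + \alpha_2), \qquad \alpha_4 = \tfrac{1}{2}(1 - i)(\alpha_1 - \alpha_2).$$

We have two arbitrary parameters, so we expect two linearly independent solutions. For the
two choices $\alpha_1 = 2$, $\alpha_2 = 0$ and $\alpha_1 = 0$, $\alpha_2 = 2$, we obtain, respectively,

$$|a_1\rangle = \begin{pmatrix} 2 \\ 0 \\ 1+i \\ 1-i \end{pmatrix} \quad\text{and}\quad |a_2\rangle = \begin{pmatrix} 0 \\ 2 \\ 1+i \\ -1+i \end{pmatrix},$$

which happen to be orthogonal. We simply normalize them to obtain

$$|e_1\rangle = \frac{1}{2\sqrt{2}}\begin{pmatrix} 2 \\ 0 \\ 1+i \\ 1-i \end{pmatrix} \quad\text{and}\quad |e_2\rangle = \frac{1}{2\sqrt{2}}\begin{pmatrix} 0 \\ 2 \\ 1+i \\ -1+i \end{pmatrix}.$$

Similarly, the second eigenvalue equation, $(H - 2\cdot\mathbf{1})|a\rangle = 0$, gives rise to the conditions
$\alpha_3 = -\tfrac{1}{2}(1 + i)(\alpha_1 + \alpha_2)$ and $\alpha_4 = -\tfrac{1}{2}(1 - i)(\alpha_1 - \alpha_2)$, which produce the orthonormal
vectors

$$|e_3\rangle = \frac{1}{2\sqrt{2}}\begin{pmatrix} 2 \\ 0 \\ -1-i \\ -1+i \end{pmatrix} \quad\text{and}\quad |e_4\rangle = \frac{1}{2\sqrt{2}}\begin{pmatrix} 0 \\ 2 \\ -1-i \\ 1-i \end{pmatrix}.$$

The unitary matrix that diagonalizes H can be constructed from these column vectors
using the remarks before Example 3.4.4, which imply that if we simply put the vectors $|e_i\rangle$
together as columns, the resulting matrix is $U^\dagger$:

$$U^\dagger = \frac{1}{2\sqrt{2}}\begin{pmatrix} 2 & 0 & 2 & 0 \\ 0 & 2 & 0 & 2 \\ 1+i & 1+i & -1-i & -1-i \\ 1-i & -1+i & -1+i & 1-i \end{pmatrix},$$

and the unitary matrix will be

$$U = (U^\dagger)^\dagger = \frac{1}{2\sqrt{2}}\begin{pmatrix} 2 & 0 & 1-i & 1+i \\ 0 & 2 & 1-i & -1-i \\ 2 & 0 & -1+i & -1-i \\ 0 & 2 & -1+i & 1+i \end{pmatrix}.$$

We can easily check that U diagonalizes H, i.e., that $UHU^\dagger$ is diagonal. ■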
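The final check of Example 4.4.11 can be carried out by machine. The following Python sketch assumes the 4 x 4 matrix H whose eigenvalues are -2 and 2 (each doubly degenerate) with the eigenvectors quoted in the example; the helper names `matmul` and `dagger` are illustrative and not part of the text.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Hermitian conjugate (conjugate transpose)."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

H = [[0, 0, -1 + 1j, -1 - 1j],
     [0, 0, -1 + 1j, 1 + 1j],
     [-1 - 1j, -1 - 1j, 0, 0],
     [-1 + 1j, 1 - 1j, 0, 0]]

s = 1 / (2 * 2 ** 0.5)
# The columns of U_dagger are the normalized eigenvectors e1, e2, e3, e4.
U_dagger = [[2 * s, 0, 2 * s, 0],
            [0, 2 * s, 0, 2 * s],
            [(1 + 1j) * s, (1 + 1j) * s, (-1 - 1j) * s, (-1 - 1j) * s],
            [(1 - 1j) * s, (-1 + 1j) * s, (-1 + 1j) * s, (1 - 1j) * s]]
U = dagger(U_dagger)
D = matmul(matmul(U, H), U_dagger)   # expected: diag(-2, -2, 2, 2)
```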

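4.4.12. Example. In some physical applications the ability to diagonalize matrices can
be very useful. As a simple but illustrative example, let us consider the motion of a charged
particle in a constant magnetic field pointing in the z direction. The equation of motion for
such a particle is

$$m\frac{d\mathbf{v}}{dt} = q\,\mathbf{v} \times \mathbf{B} = q\det\begin{pmatrix} \hat{\mathbf{e}}_x & \hat{\mathbf{e}}_y & \hat{\mathbf{e}}_z \\ v_x & v_y & v_z \\ 0 & 0 & B \end{pmatrix},$$

which in component form becomes

$$\frac{dv_x}{dt} = \frac{qB}{m}v_y, \qquad \frac{dv_y}{dt} = -\frac{qB}{m}v_x, \qquad \frac{dv_z}{dt} = 0.$$

Ignoring the uniform motion in the z direction, we need to solve the first two coupled
equations, which in matrix form become

$$\frac{d}{dt}\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{qB}{m}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} v_x \\ v_y \end{pmatrix} = -i\omega\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}\begin{pmatrix} v_x \\ v_y \end{pmatrix}, \tag{4.10}$$

where we have introduced a factor of i to render the matrix hermitian, and defined $\omega = qB/m$.
If the 2 x 2 matrix were diagonal, we would get two uncoupled equations, which
we could solve easily. Diagonalizing the matrix involves finding a matrix R such that

$$D = R\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}R^{-1} = \begin{pmatrix} \mu_1 & 0 \\ 0 & \mu_2 \end{pmatrix}.$$

If we could do such a diagonalization, we would multiply (4.10) by R to get⁵

$$\frac{d}{dt}R\begin{pmatrix} v_x \\ v_y \end{pmatrix} = -i\omega R\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}R^{-1}R\begin{pmatrix} v_x \\ v_y \end{pmatrix},$$

which can be written as

$$\frac{d}{dt}\begin{pmatrix} v'_x \\ v'_y \end{pmatrix} = -i\omega\begin{pmatrix} \mu_1 & 0 \\ 0 & \mu_2 \end{pmatrix}\begin{pmatrix} v'_x \\ v'_y \end{pmatrix} = \begin{pmatrix} -i\omega\mu_1 v'_x \\ -i\omega\mu_2 v'_y \end{pmatrix}, \quad\text{where}\quad \begin{pmatrix} v'_x \\ v'_y \end{pmatrix} \equiv R\begin{pmatrix} v_x \\ v_y \end{pmatrix}.$$

⁵ The fact that R is independent of t is crucial in this step. This fact, in turn, is a consequence of the independence from t of
the original 2 x 2 matrix.

We then would have a pair of uncoupled equations

$$\frac{dv'_x}{dt} = -i\omega\mu_1 v'_x, \qquad \frac{dv'_y}{dt} = -i\omega\mu_2 v'_y$$

that have $v'_x = v'_{0x}e^{-i\omega\mu_1 t}$ and $v'_y = v'_{0y}e^{-i\omega\mu_2 t}$ as a solution set, in which $v'_{0x}$ and $v'_{0y}$
are integration constants.

To find R, we need the normalized eigenvectors of $\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}$. But these are obtained in
precisely the same fashion as in Example 4.4.4. There is, however, an arbitrariness in the
solutions due to the choice in numbering the eigenvalues. If we choose the normalized
eigenvectors

$$|e_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} i \\ 1 \end{pmatrix}, \qquad |e_2\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} -i \\ 1 \end{pmatrix},$$

then from comments at the end of Section 3.3, we get

$$R^{-1} = R^\dagger = \frac{1}{\sqrt{2}}\begin{pmatrix} i & -i \\ 1 & 1 \end{pmatrix} \;\Rightarrow\; R = (R^\dagger)^\dagger = \frac{1}{\sqrt{2}}\begin{pmatrix} -i & 1 \\ i & 1 \end{pmatrix}.$$

With this choice of R, we have

$$R\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}R^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

so that $\mu_1 = 1 = -\mu_2$. Having found $R^\dagger$, we can write

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = R^\dagger\begin{pmatrix} v'_x \\ v'_y \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} i & -i \\ 1 & 1 \end{pmatrix}\begin{pmatrix} v'_{0x}e^{-i\omega t} \\ v'_{0y}e^{i\omega t} \end{pmatrix}. \tag{4.11}$$

If the x and y components of velocity at $t = 0$ are $v_{0x}$ and $v_{0y}$, respectively, then

$$\begin{pmatrix} v_{0x} \\ v_{0y} \end{pmatrix} = R^\dagger\begin{pmatrix} v'_{0x} \\ v'_{0y} \end{pmatrix}, \quad\text{or}\quad \begin{pmatrix} v'_{0x} \\ v'_{0y} \end{pmatrix} = R\begin{pmatrix} v_{0x} \\ v_{0y} \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} -iv_{0x} + v_{0y} \\ iv_{0x} + v_{0y} \end{pmatrix}.$$

Substituting in (4.11), we obtain

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{2}\begin{pmatrix} i & -i \\ 1 & 1 \end{pmatrix}\begin{pmatrix} (-iv_{0x} + v_{0y})e^{-i\omega t} \\ (iv_{0x} + v_{0y})e^{i\omega t} \end{pmatrix} = \begin{pmatrix} v_{0x}\cos\omega t + v_{0y}\sin\omega t \\ -v_{0x}\sin\omega t + v_{0y}\cos\omega t \end{pmatrix}.$$

This gives the velocity as a function of time. Antidifferentiating once with respect to time
yields the position vector. ■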
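The steps of Example 4.4.12 (transform to the eigenbasis, evolve each component by a phase, transform back) can be compared with the closed-form answer numerically. The Python sketch below is an illustration only; the function names and the sample values for the initial velocity, the frequency, and the time are assumptions.

```python
import cmath
import math

def velocity_via_eigenbasis(v0x, v0y, omega, t):
    """Transform to the primed components, evolve each by a phase, transform back."""
    s = 1 / math.sqrt(2)
    # (v0x', v0y') = R (v0x, v0y) with R = s * [[-i, 1], [i, 1]]
    v0xp = s * (-1j * v0x + v0y)
    v0yp = s * (1j * v0x + v0y)
    vxp = v0xp * cmath.exp(-1j * omega * t)   # eigenvalue mu_1 = +1
    vyp = v0yp * cmath.exp(1j * omega * t)    # eigenvalue mu_2 = -1
    # (vx, vy) = R^dagger (vx', vy') with R^dagger = s * [[i, -i], [1, 1]]
    return s * (1j * vxp - 1j * vyp), s * (vxp + vyp)

def velocity_closed_form(v0x, v0y, omega, t):
    """The rotation formula obtained at the end of the example."""
    c, sn = math.cos(omega * t), math.sin(omega * t)
    return v0x * c + v0y * sn, -v0x * sn + v0y * c

vx, vy = velocity_via_eigenbasis(1.0, 2.0, 3.0, 0.7)
wx, wy = velocity_closed_form(1.0, 2.0, 3.0, 0.7)
```

Although the intermediate quantities are complex, the imaginary parts cancel in the final velocity, as they must for a real physical quantity.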

In many situations of physical interest, it is desirable to know whether two 
operators are simultaneously diagonalizable. For instance, if there exists a basis 
of a Hilbert space of a quantum-mechanical system consisting of simultaneous 
eigenvectors of two operators, then one can measure those two operators at the 
same time. In particular, they are not restricted by an uncertainty relation. 

4.4.13. Definition. Two operators are said to be simultaneously diagonalizable if 
they can be written in terms of the same set of projection operators, as in Theorem 
4.4.6. 




This definition is consistent with the matrix representation of the two operators,
because if we take the orthonormal basis $B = \{|e_j^s\rangle\}$ discussed right after Theorem
4.4.6, we obtain diagonal matrices for both operators. What are the conditions under
which two operators can be simultaneously diagonalized? Clearly, a necessary
condition is that the two operators commute. This is an immediate consequence of
the orthogonality of the projection operators, which trivially implies $P_iP_j = P_jP_i$
for all i and j. It is also apparent in the matrix representation of the operators: Any
two diagonal matrices commute. What about sufficiency? Is the commutativity
of the two operators sufficient for them to be simultaneously diagonalizable? To
answer this question, we need the following lemma:

4.4.14. Lemma. An operator T commutes with a normal operator A if and only if 
T commutes with all the projection operators of A. 

Proof. The "if" part is trivial. To prove the "only if" part, suppose $AT = TA$,
and let $|x\rangle$ be any vector in one of the eigenspaces of A, say $\mathcal{M}_j$. Then we have
$A(T|x\rangle) = T(A|x\rangle) = T(\lambda_j|x\rangle) = \lambda_j(T|x\rangle)$; i.e., $T|x\rangle$ is in $\mathcal{M}_j$, or $\mathcal{M}_j$ is invariant
under T. Since $\mathcal{M}_j$ is arbitrary, T leaves all eigenspaces invariant. In particular, it
leaves $\mathcal{M}_j^\perp$, the orthogonal complement of $\mathcal{M}_j$ (the direct sum of all the remaining
eigenspaces), invariant. By Theorems 4.2.3 and 4.2.5, $TP_j = P_jT$; and this holds
for all j. □

4.4.15. Theorem. A necessary and sufficient condition for two normal operators 
A and B to be simultaneously diagonalizable is [A, B] = 0. 

Proof. As claimed above, the "necessity" is trivial. To prove the "sufficiency," let
$A = \sum_{j=1}^{r} \lambda_jP_j$ and $B = \sum_{k=1}^{s} \mu_kQ_k$, where $\{\lambda_j\}$ and $\{P_j\}$ are eigenvalues and
projections of A, and $\{\mu_k\}$ and $\{Q_k\}$ are those of B. Assume $[A, B] = 0$. Then
by Lemma 4.4.14, $AQ_k = Q_kA$. Since $Q_k$ commutes with A, it must commute
with the latter's projection operators: $P_jQ_k = Q_kP_j$. To complete the proof, define
$R_{jk} \equiv P_jQ_k$, and note that

$$R_{jk}^\dagger = (P_jQ_k)^\dagger = Q_k^\dagger P_j^\dagger = Q_kP_j = P_jQ_k = R_{jk},$$

$$(R_{jk})^2 = (P_jQ_k)^2 = P_jQ_kP_jQ_k = P_jP_jQ_kQ_k = P_jQ_k = R_{jk}.$$

Therefore, $R_{jk}$ are hermitian projection operators. Furthermore,

$$\sum_{j=1}^{r} R_{jk} = \Bigl(\sum_{j=1}^{r} P_j\Bigr)Q_k = Q_k.$$

Similarly, $\sum_{k=1}^{s} R_{jk} = \sum_{k=1}^{s} P_jQ_k = P_j\sum_{k=1}^{s} Q_k = P_j$. We can now write A
and B as

$$A = \sum_{j=1}^{r} \lambda_jP_j = \sum_{j=1}^{r}\sum_{k=1}^{s} \lambda_jR_{jk}, \qquad B = \sum_{k=1}^{s} \mu_kQ_k = \sum_{k=1}^{s}\sum_{j=1}^{r} \mu_kR_{jk}.$$

By definition, they are simultaneously diagonalizable. □

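4.4.16. Example. Let us find the spectral decomposition of the Pauli spin matrix

$$\sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}.$$

The eigenvalues and eigenvectors have been found in Example 4.4.4. These are

$$\lambda_1 = 1,\ |e_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix} \quad\text{and}\quad \lambda_2 = -1,\ |e_2\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}.$$

The subspaces $\mathcal{M}_{\lambda_j}$ are one-dimensional; therefore,

$$P_1 = |e_1\rangle\langle e_1| = \frac{1}{2}\begin{pmatrix} 1 \\ i \end{pmatrix}\begin{pmatrix} 1 & -i \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix},$$

$$P_2 = |e_2\rangle\langle e_2| = \frac{1}{2}\begin{pmatrix} 1 \\ -i \end{pmatrix}\begin{pmatrix} 1 & i \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}.$$

We can check that $P_1 + P_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and

$$\lambda_1P_1 + \lambda_2P_2 = \frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} = \sigma_2. \qquad ■$$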
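The two checks at the end of Example 4.4.16 are easy to reproduce. The Python sketch below builds the projectors as outer products of the normalized eigenvectors (1, i)/√2 and (1, -i)/√2; the helper name `outer` is an assumption made for the illustration.

```python
def outer(ket):
    """The projector |e><e| for a column vector given as a list."""
    return [[a * b.conjugate() for b in ket] for a in ket]

s = 2 ** -0.5
e1 = [s, 1j * s]     # eigenvector for eigenvalue +1
e2 = [s, -1j * s]    # eigenvector for eigenvalue -1
P1, P2 = outer(e1), outer(e2)

# Completeness (P1 + P2 = 1) and the spectral sum (+1)*P1 + (-1)*P2.
identity = [[P1[i][j] + P2[i][j] for j in range(2)] for i in range(2)]
sigma2 = [[P1[i][j] - P2[i][j] for j in range(2)] for i in range(2)]
```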



What restrictions are to be imposed on the most general operator T to make
it diagonalizable? We saw in Chapter 2 that T can be written in terms of its
so-called Cartesian components as $T = H + iH'$, where both H and H' are hermitian
and can therefore be decomposed according to Theorem 4.4.6. Can we conclude
that T is also decomposable? No, because the projection operators used in the
decomposition of H may not be the same as those used for H'. However, if H and
H' are simultaneously diagonalizable such that

$$H = \sum_{k=1}^{r} \lambda_kP_k \quad\text{and}\quad H' = \sum_{k=1}^{r} \lambda'_kP_k, \tag{4.12}$$

then $T = \sum_{k=1}^{r} (\lambda_k + i\lambda'_k)P_k$. It follows that T has a spectral decomposition,
and therefore is diagonalizable. Theorem 4.4.15 now implies that H and H' must
commute. Since $H = \frac{1}{2}(T + T^\dagger)$ and $H' = \frac{1}{2i}(T - T^\dagger)$, we have $[H, H'] = 0$ if and
only if $[T, T^\dagger] = 0$; i.e., T is normal. We thus get back to the condition with which
we started the whole discussion of spectral decomposition in this section.


4.5 Functions of Operators

Functions of transformations were discussed in Chapter 2. With the power of 
spectral decomposition at our disposal, we can draw many important conclusions 
about them. 




First, we note that if $T = \sum_{i=1}^{r} \lambda_iP_i$, then, because of orthogonality of the $P_i$'s,

$$T^2 = \sum_{i=1}^{r} \lambda_i^2P_i, \quad \dots, \quad T^n = \sum_{i=1}^{r} \lambda_i^nP_i.$$

Thus, any polynomial p in T has a spectral decomposition given by $p(T) = \sum_{i=1}^{r} p(\lambda_i)P_i$.
Generalizing this to functions expandable in power series gives

$$f(T) = \sum_{i=1}^{r} f(\lambda_i)P_i. \tag{4.13}$$

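4.5.1. Example. Let us investigate the spectral decomposition of the following unitary
(actually orthogonal) matrix:

$$U = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

We find the eigenvalues

$$\det\begin{pmatrix} \cos\theta - \lambda & -\sin\theta \\ \sin\theta & \cos\theta - \lambda \end{pmatrix} = \lambda^2 - 2\cos\theta\,\lambda + 1 = 0,$$

yielding $\lambda_1 = e^{-i\theta}$ and $\lambda_2 = e^{i\theta}$. For $\lambda_1$ we have (reader, provide the missing steps)

$$\begin{pmatrix} \cos\theta - e^{-i\theta} & -\sin\theta \\ \sin\theta & \cos\theta - e^{-i\theta} \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = 0 \;\Rightarrow\; \alpha_2 = i\alpha_1 \;\Rightarrow\; |e_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix},$$

and for $\lambda_2$,

$$\begin{pmatrix} \cos\theta - e^{i\theta} & -\sin\theta \\ \sin\theta & \cos\theta - e^{i\theta} \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = 0 \;\Rightarrow\; \alpha_2 = -i\alpha_1 \;\Rightarrow\; |e_2\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}.$$

We note that the $\mathcal{M}_{\lambda_j}$ are one-dimensional and spanned by $|e_j\rangle$. Thus,

$$P_1 = |e_1\rangle\langle e_1| = \frac{1}{2}\begin{pmatrix} 1 \\ i \end{pmatrix}\begin{pmatrix} 1 & -i \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix},$$

$$P_2 = |e_2\rangle\langle e_2| = \frac{1}{2}\begin{pmatrix} 1 \\ -i \end{pmatrix}\begin{pmatrix} 1 & i \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}.$$

Clearly, $P_1 + P_2 = \mathbf{1}$, and

$$e^{-i\theta}P_1 + e^{i\theta}P_2 = \frac{1}{2}\begin{pmatrix} e^{-i\theta} & -ie^{-i\theta} \\ ie^{-i\theta} & e^{-i\theta} \end{pmatrix} + \frac{1}{2}\begin{pmatrix} e^{i\theta} & ie^{i\theta} \\ -ie^{i\theta} & e^{i\theta} \end{pmatrix} = U.$$

If we take the natural log of this equation and use Equation (4.13), we obtain

$$\ln U = \ln(e^{-i\theta})P_1 + \ln(e^{i\theta})P_2 = -i\theta P_1 + i\theta P_2 = i(-\theta P_1 + \theta P_2) \equiv iH, \tag{4.14}$$

where $H \equiv -\theta P_1 + \theta P_2$ is a hermitian operator because $\theta$ is real and $P_1$ and $P_2$ are hermitian.
Inverting Equation (4.14) gives $U = e^{iH}$, where

$$H = \theta(-P_1 + P_2) = \theta\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}. \qquad ■$$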
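Equation (4.13) lends itself to a direct numerical check of the result $U = e^{iH}$ in Example 4.5.1. The Python sketch below is an illustration under stated assumptions: the helper `function_of_operator` and the sample angle 0.3 are not from the text, and the projectors are entered by hand from the example.

```python
import cmath
import math

def function_of_operator(f, eigenvalues, projectors):
    """f(T) = sum_i f(lambda_i) P_i for 2x2 projectors, as in Equation (4.13)."""
    return [[sum(f(lam) * P[i][j] for lam, P in zip(eigenvalues, projectors))
             for j in range(2)] for i in range(2)]

theta = 0.3
P1 = [[0.5, -0.5j], [0.5j, 0.5]]
P2 = [[0.5, 0.5j], [-0.5j, 0.5]]
# H = -theta*P1 + theta*P2 has eigenvalues -theta and +theta.
U = function_of_operator(lambda h: cmath.exp(1j * h), [-theta, theta], [P1, P2])
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
```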




The example above shows that the unitary 2x2 matrix U can be written as 
an exponential of an anti-hermitian operator. This is a general result. In fact, we 
have the following theorem, whose proof is left as an exercise for the reader (see 
Problem 4.22). 

4.5.2. Theorem. A unitary operator U on a finite-dimensional complex inner product
space can be written $U = e^{iH}$ where H is hermitian. Furthermore, a unitary
matrix can be brought to diagonal form by a unitary transformation matrix.

The last statement follows from Corollary 4.4.10 and the fact that

$$f(RHR^{-1}) = Rf(H)R^{-1}$$

for any function f that can be expanded in a Taylor series.

A useful function of an operator is its square root. A natural way to define the
square root of an operator A is $\sqrt{A} = \sum_{i=1}^{r} (\pm\sqrt{\lambda_i})P_i$. This clearly gives many
candidates for the root, because each term in the sum can have either the plus sign
or the minus sign.

4.5.3. Definition. The positive square root of a positive operator $A = \sum_{i=1}^{r} \lambda_iP_i$
is $\sqrt{A} = \sum_{i=1}^{r} \sqrt{\lambda_i}\,P_i$.

The uniqueness of the spectral decomposition implies that the positive square
root of a positive operator is unique.

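4.5.4. Example. Let us evaluate $\sqrt{A}$ where

$$A = \begin{pmatrix} 5 & 3i \\ -3i & 5 \end{pmatrix}.$$

First, we have to spectrally decompose A. Its characteristic equation is

$$\lambda^2 - 10\lambda + 16 = 0,$$

with roots $\lambda_1 = 8$ and $\lambda_2 = 2$. Since both eigenvalues are positive and A is hermitian, we
conclude that A is indeed positive (Corollary 4.4.8). We can also easily find its normalized
eigenvectors:

$$|e_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} i \\ 1 \end{pmatrix} \quad\text{and}\quad |e_2\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} -i \\ 1 \end{pmatrix}.$$

Thus,

$$P_1 = |e_1\rangle\langle e_1| = \frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}, \qquad P_2 = |e_2\rangle\langle e_2| = \frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix},$$

and

$$\sqrt{A} = \sqrt{\lambda_1}\,P_1 + \sqrt{\lambda_2}\,P_2 = \sqrt{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix} + \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 3 & i \\ -i & 3 \end{pmatrix}.$$

We can easily check that $(\sqrt{A})^2 = A$. ■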
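The check that the square root squares back to A can be done numerically. The Python sketch below assumes that A is the hermitian matrix with rows (5, 3i) and (-3i, 5), which has the characteristic equation and the eigenvalues 8 and 2 quoted in the example; the projectors are entered by hand and the variable names are illustrative.

```python
import math

P1 = [[0.5, 0.5j], [-0.5j, 0.5]]    # projector for the eigenvalue 8
P2 = [[0.5, -0.5j], [0.5j, 0.5]]    # projector for the eigenvalue 2

# Positive square root: sqrt(8)*P1 + sqrt(2)*P2 (Definition 4.5.3).
root = [[math.sqrt(8) * P1[i][j] + math.sqrt(2) * P2[i][j] for j in range(2)]
        for i in range(2)]
# Squaring the root should reproduce the assumed matrix A.
square = [[sum(root[i][k] * root[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
```

Choosing a minus sign in front of either square root in the sum would give another matrix whose square is also A, which is the multivaluedness mentioned before Definition 4.5.3.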





Intuitively, higher and higher powers of T, when acting on a few vectors of
the space, eventually exhaust all vectors, and further increase in power will be
a repetition of lower powers. This intuitive idea can be made more precise by
looking at the projection operators. We have already seen that $T^n = \sum_{j=1}^{r} \lambda_j^nP_j$,
$n = 1, 2, \dots$. For various n's one can "solve" for $P_j$ in terms of powers of T. Since
there are only a finite number of $P_j$'s, only a finite number of powers of T will
suffice. In fact, we can explicitly construct the polynomial in T for $P_j$. If there is such
a polynomial, by Equation (4.13) it must satisfy $P_j = p_j(T) = \sum_{k=1}^{r} p_j(\lambda_k)P_k$,
where $p_j$ is some polynomial to be determined. By orthogonality of the projection
operators, $p_j(\lambda_k)$ must be zero unless $k = j$, in which case it must be 1. In other
words, $p_j(\lambda_k) = \delta_{kj}$. Such a polynomial can be explicitly constructed:

$$p_j(x) = \prod_{k \neq j} \frac{x - \lambda_k}{\lambda_j - \lambda_k}.$$

Therefore,

$$P_j = p_j(T) = \prod_{k \neq j} \frac{T - \lambda_k\mathbf{1}}{\lambda_j - \lambda_k}, \tag{4.15}$$

and we have the following result.


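4.5.5. Proposition. Every function of a normal operator on a finite-dimensional
vector space can be expressed as a polynomial. In fact, from Equations (4.13) and
(4.15),

$$f(A) = \sum_{j=1}^{r} f(\lambda_j)\prod_{k \neq j} \frac{A - \lambda_k\mathbf{1}}{\lambda_j - \lambda_k}. \tag{4.16}$$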


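4.5.6. Example. Let us write $\sqrt{A}$ of the last example as a polynomial in A. We have

$$p_1(A) = \prod_{k \neq 1} \frac{A - \lambda_k\mathbf{1}}{\lambda_1 - \lambda_k} = \frac{A - \lambda_2\mathbf{1}}{\lambda_1 - \lambda_2} = \frac{1}{6}(A - 2\cdot\mathbf{1}),$$

$$p_2(A) = \prod_{k \neq 2} \frac{A - \lambda_k\mathbf{1}}{\lambda_2 - \lambda_k} = \frac{A - \lambda_1\mathbf{1}}{\lambda_2 - \lambda_1} = -\frac{1}{6}(A - 8\cdot\mathbf{1}).$$

Substituting in Equation (4.16), we obtain

$$\sqrt{A} = \sqrt{\lambda_1}\,p_1(A) + \sqrt{\lambda_2}\,p_2(A) = \frac{\sqrt{8}}{6}(A - 2\cdot\mathbf{1}) - \frac{\sqrt{2}}{6}(A - 8\cdot\mathbf{1}) = \frac{\sqrt{2}}{6}A + \frac{2\sqrt{2}}{3}\mathbf{1}.$$

The RHS is clearly a (first-degree) polynomial in A, and it is easy to verify that it is the
matrix of $\sqrt{A}$ obtained in the previous example. ■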
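Equation (4.15) translates almost literally into code. The Python sketch below builds each projector as a product of the factors $(T - \lambda_k\mathbf{1})/(\lambda_j - \lambda_k)$; the function name `projector_polynomials` is illustrative, and the test matrix is assumed to be the hermitian matrix with rows (5, 3i) and (-3i, 5), whose eigenvalues are 8 and 2.

```python
def projector_polynomials(T, eigenvalues):
    """P_j = prod_{k != j} (T - lambda_k 1)/(lambda_j - lambda_k), Equation (4.15)."""
    n = len(T)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    projectors = []
    for j, lam_j in enumerate(eigenvalues):
        P = identity
        for k, lam_k in enumerate(eigenvalues):
            if k == j:
                continue
            factor = [[(T[a][b] - lam_k * identity[a][b]) / (lam_j - lam_k)
                       for b in range(n)] for a in range(n)]
            P = [[sum(P[a][c] * factor[c][b] for c in range(n))
                  for b in range(n)] for a in range(n)]
        projectors.append(P)
    return projectors

A = [[5, 3j], [-3j, 5]]
P1, P2 = projector_polynomials(A, [8, 2])
# Equation (4.16) with f = square root.
sqrt_A = [[8 ** 0.5 * P1[i][j] + 2 ** 0.5 * P2[i][j] for j in range(2)]
          for i in range(2)]
```

The list of eigenvalues passed in must contain each distinct eigenvalue exactly once; a repeated entry would make a denominator vanish.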





4.6 Polar Decomposition 

We have seen many similarities between operators and complex numbers. For
instance, hermitian operators behave very much like the real numbers: they have
real eigenvalues; their squares are positive; every operator can be written as $H + iH'$,
where both H and H' are hermitian; and so forth. Also, unitary operators can be
written as $\exp(iH)$, where H is hermitian. So unitary operators are the analogue of
complex numbers of unit magnitude such as $e^{i\theta}$. A general complex number can
be written as $re^{i\theta}$. Can we write an arbitrary operator in an analogous way? The
following theorem provides the answer.

4.6.1. Theorem. (polar decomposition theorem) An operator A on a finite-dimensional
complex inner product space can be written $A = UR$ where R is a (unique)
positive operator and U a unitary operator. If A is invertible, then U is also unique.

Proof. We will prove the theorem for the case where the operator is invertible.
The proof of the general case can be found in books on linear algebra (such as
[Halm 58]).

The reader may show that the operator $A^\dagger A$ is positive. Therefore, it has a
unique positive square root R. We let $V = RA^{-1}$, or $VA = R$. Then

$$VV^\dagger = RA^{-1}(RA^{-1})^\dagger = RA^{-1}(A^{-1})^\dagger R^\dagger = R(A^\dagger A)^{-1}R^\dagger = R(R^2)^{-1}R^\dagger = R(R^\dagger R)^{-1}R^\dagger = RR^{-1}(R^\dagger)^{-1}R^\dagger = \mathbf{1}$$

and

$$V^\dagger V = (RA^{-1})^\dagger RA^{-1} = (A^{-1})^\dagger R^\dagger RA^{-1} = (A^\dagger)^{-1}R^2A^{-1} = (A^\dagger)^{-1}A^\dagger AA^{-1} = \mathbf{1},$$

and V is indeed a unitary operator. Now choose $U = V^\dagger$ to get the desired
decomposition.

To prove uniqueness we note that $UR = U'R'$ implies that $R = U^\dagger U'R'$ and
$R^2 = R^\dagger R = (U^\dagger U'R')^\dagger(U^\dagger U'R') = R'^\dagger U'^\dagger UU^\dagger U'R' = R'^\dagger R' = R'^2$. Since the
positive transformation $R^2$ (or $R'^2$) has only one positive square root, it follows
that $R = R'$.

If A is invertible, then so is $R = U^\dagger A$. Therefore,

$$UR = U'R \;\Rightarrow\; URR^{-1} = U'RR^{-1} \;\Rightarrow\; U = U',$$

and U is also unique. □

It is interesting to note that the positive definiteness of R and the nonuniqueness
of U are the analogue of the positivity of r and the nonuniqueness of $e^{i\theta}$ in the
polar representation of complex numbers:

$$z = re^{i\theta} = re^{i(\theta + 2n\pi)} \quad \forall\, n \in \mathbb{Z}.$$

In practice, R is found by spectrally decomposing $A^\dagger A$ and taking its positive
square root.⁶ Once R is found, U can be calculated from the definition $A = UR$.

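4.6.2. Example. Let us find the polar decomposition of

$$A = \begin{pmatrix} -2i & \sqrt{7} \\ 0 & 3 \end{pmatrix}.$$

We have

$$R^2 = A^\dagger A = \begin{pmatrix} 2i & 0 \\ \sqrt{7} & 3 \end{pmatrix}\begin{pmatrix} -2i & \sqrt{7} \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 4 & 2i\sqrt{7} \\ -2i\sqrt{7} & 16 \end{pmatrix}.$$

The eigenvalues and eigenvectors of $R^2$ are routinely found to be

$$\lambda_1 = 18, \quad \lambda_2 = 2, \quad |e_1\rangle = \frac{1}{2\sqrt{2}}\begin{pmatrix} i \\ \sqrt{7} \end{pmatrix}, \quad |e_2\rangle = \frac{1}{2\sqrt{2}}\begin{pmatrix} \sqrt{7} \\ i \end{pmatrix}.$$

The projection matrices are

$$P_1 = |e_1\rangle\langle e_1| = \frac{1}{8}\begin{pmatrix} 1 & i\sqrt{7} \\ -i\sqrt{7} & 7 \end{pmatrix}, \qquad P_2 = |e_2\rangle\langle e_2| = \frac{1}{8}\begin{pmatrix} 7 & -i\sqrt{7} \\ i\sqrt{7} & 1 \end{pmatrix}.$$

Thus,

$$R = \sqrt{\lambda_1}\,P_1 + \sqrt{\lambda_2}\,P_2 = \frac{1}{4}\begin{pmatrix} 5\sqrt{2} & i\sqrt{14} \\ -i\sqrt{14} & 11\sqrt{2} \end{pmatrix}.$$

To find U, we note that det A is nonzero. Hence, A is invertible, which implies that R is also
invertible. The inverse of R is

$$R^{-1} = \frac{1}{24}\begin{pmatrix} 11\sqrt{2} & -i\sqrt{14} \\ i\sqrt{14} & 5\sqrt{2} \end{pmatrix}.$$

The unitary matrix is simply

$$U = AR^{-1} = \frac{1}{24}\begin{pmatrix} -15i\sqrt{2} & 3\sqrt{14} \\ 3i\sqrt{14} & 15\sqrt{2} \end{pmatrix}.$$

It is left for the reader to verify that U is indeed unitary. ■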
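The verification left to the reader in Example 4.6.2 can also be done numerically. The Python sketch below assumes that A is the matrix with rows (-2i, √7) and (0, 3), which is consistent with the eigenvalues 18 and 2 of $A^\dagger A$ and with the inverse of R quoted in the example; the helper names are illustrative.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(X):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

r7 = math.sqrt(7)
A = [[-2j, r7], [0, 3]]
# Spectral data of A^dagger A: eigenvalues 18 and 2 with these projectors.
P1 = [[1 / 8, 1j * r7 / 8], [-1j * r7 / 8, 7 / 8]]
P2 = [[7 / 8, -1j * r7 / 8], [1j * r7 / 8, 1 / 8]]
R = [[math.sqrt(18) * P1[i][j] + math.sqrt(2) * P2[i][j] for j in range(2)]
     for i in range(2)]
det_R = R[0][0] * R[1][1] - R[0][1] * R[1][0]
R_inv = [[R[1][1] / det_R, -R[0][1] / det_R],
         [-R[1][0] / det_R, R[0][0] / det_R]]
U = matmul(A, R_inv)            # from A = U R
UU = matmul(U, dagger(U))       # should be the identity if U is unitary
```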


4.7 Real Vector Spaces 

The treatment so far in this chapter has focused on complex inner product spaces.
The complex number system is far more complete than the real numbers. For
example, in preparation for the proof of the spectral decomposition theorem, we
used the existence of n roots of a polynomial of degree n over the complex field
(this is the fundamental theorem of algebra). A polynomial over the reals, on the
other hand, does not necessarily have all its roots in the real number system.

⁶ It is important to pay attention to the order of the two operators: One decomposes $A^\dagger A$, not $AA^\dagger$.

It may therefore seem that vector spaces over the reals will not satisfy the useful
theorems and results developed for complex spaces. However, through a process
called complexification of a real vector space, in which an imaginary part is added
to such a space, it is possible to prove (see, for example, [Halm 58]) practically all
the results obtained for complex vector spaces. Only the results are given here.

4.7.1. Theorem. A real symmetric operator has a spectral decomposition as stated 
in Theorem 4.4.6. 

This theorem is especially useful in applications of classical physics, which 
deal mostly with real vector spaces. A typical situation involves a vector that 
is related to another vector by a symmetric matrix. It is then convenient to find 
a coordinate system in which the two vectors are related in a simple manner. 
This involves diagonalizing the symmetric matrix by a rotation (a real orthogonal 
matrix). Theorem 4.7.1 reassures us that such a diagonalization is possible. 

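4.7.2. Example. For a system of N point particles constituting a rigid body, the total
angular momentum $\mathbf{L} = \sum_{i=1}^{N} m_i(\mathbf{r}_i \times \mathbf{v}_i)$ is related to the angular frequency via

$$\mathbf{L} = \sum_{i=1}^{N} m_i[\mathbf{r}_i \times (\boldsymbol{\omega} \times \mathbf{r}_i)] = \sum_{i=1}^{N} m_i[\boldsymbol{\omega}(\mathbf{r}_i \cdot \mathbf{r}_i) - \mathbf{r}_i(\mathbf{r}_i \cdot \boldsymbol{\omega})],$$

or

$$\begin{pmatrix} L_x \\ L_y \\ L_z \end{pmatrix} = \begin{pmatrix} I_{xx} & I_{xy} & I_{xz} \\ I_{yx} & I_{yy} & I_{yz} \\ I_{zx} & I_{zy} & I_{zz} \end{pmatrix}\begin{pmatrix} \omega_x \\ \omega_y \\ \omega_z \end{pmatrix},$$

where

$$I_{xx} = \sum_{i=1}^{N} m_i(r_i^2 - x_i^2), \quad I_{yy} = \sum_{i=1}^{N} m_i(r_i^2 - y_i^2), \quad I_{zz} = \sum_{i=1}^{N} m_i(r_i^2 - z_i^2),$$

$$I_{xy} = -\sum_{i=1}^{N} m_ix_iy_i, \quad I_{xz} = -\sum_{i=1}^{N} m_ix_iz_i, \quad I_{yz} = -\sum_{i=1}^{N} m_iy_iz_i,$$

with $I_{xy} = I_{yx}$, $I_{xz} = I_{zx}$, and $I_{yz} = I_{zy}$.

The 3 x 3 matrix is denoted by I and is called the moment of inertia matrix. It is
symmetric, and Theorem 4.7.1 permits its diagonalization by an orthogonal transformation
(the counterpart of a unitary transformation in a real vector space). But an orthogonal
transformation in three dimensions is merely a rotation of coordinates.⁷ Thus, Theorem
4.7.1 says that it is always possible to choose coordinate systems in which the moment of
inertia matrix is diagonal. In such a coordinate system we have $L_x = I_{xx}\omega_x$, $L_y = I_{yy}\omega_y$,
and $L_z = I_{zz}\omega_z$, simplifying the equations considerably.

Similarly, the kinetic energy of the rigid rotating body,

$$T = \sum_{i=1}^{N} \tfrac{1}{2}m_iv_i^2 = \sum_{i=1}^{N} \tfrac{1}{2}m_i\mathbf{v}_i \cdot (\boldsymbol{\omega} \times \mathbf{r}_i) = \sum_{i=1}^{N} \tfrac{1}{2}m_i\boldsymbol{\omega} \cdot (\mathbf{r}_i \times \mathbf{v}_i) = \tfrac{1}{2}\boldsymbol{\omega} \cdot \mathbf{L} = \tfrac{1}{2}\boldsymbol{\omega}^t I \boldsymbol{\omega},$$

which in general has off-diagonal terms involving $I_{xy}$ and so forth, reduces to a simple
form: $T = \tfrac{1}{2}I_{xx}\omega_x^2 + \tfrac{1}{2}I_{yy}\omega_y^2 + \tfrac{1}{2}I_{zz}\omega_z^2$. ■

⁷ This is not entirely true! There are orthogonal transformations that are composed of a rotation followed by a reflection about
the origin. See Example 3.5.8.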
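The relation between the angular momentum and the moment of inertia matrix in Example 4.7.2 can be checked on a small set of point masses. In the Python sketch below the masses, positions, and angular velocity are arbitrary sample values, and the helper names `cross` and `inertia_matrix` are assumptions made for the illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def inertia_matrix(masses, positions):
    """I_ab = sum_i m_i (r_i^2 delta_ab - x_a x_b) for point masses."""
    I = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = sum(c * c for c in r)
        for a in range(3):
            for b in range(3):
                I[a][b] += m * ((r2 if a == b else 0.0) - r[a] * r[b])
    return I

masses = [1.0, 2.0, 3.0]
positions = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [1.0, -1.0, 2.0]]
omega = [0.5, -1.0, 2.0]

I = inertia_matrix(masses, positions)
L_from_matrix = [sum(I[a][b] * omega[b] for b in range(3)) for a in range(3)]

# Direct evaluation of L = sum_i m_i r_i x v_i with v_i = omega x r_i.
L_direct = [0.0, 0.0, 0.0]
for m, r in zip(masses, positions):
    v = cross(omega, r)
    for a, c in enumerate(cross(r, v)):
        L_direct[a] += m * c
```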

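4.7.3. Example. Another application of Theorem 4.7.1 is in the study of conic sections.
The most general form of the equation of a conic section is

$$a_1x^2 + a_2y^2 + a_3xy + a_4x + a_5y + a_6 = 0,$$

where $a_1, \dots, a_6$ are constants. If the coordinate axes coincide with the principal axes of
the conic section, the xy term will be absent, and the equation of the conic section takes
the familiar form. On geometrical grounds we have to be able to rotate xy-coordinates to
coincide with the principal axes. We shall do this using the ideas discussed in this chapter.

First, we note that the general equation for a conic section can be written in matrix form
as

$$\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} a_1 & a_3/2 \\ a_3/2 & a_2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_4 & a_5 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + a_6 = 0.$$

The 2 x 2 matrix is symmetric and can therefore be diagonalized by means of an orthogonal
matrix R. Then $R^tR = \mathbf{1}$, and we can write

$$\begin{pmatrix} x & y \end{pmatrix}R^tR\begin{pmatrix} a_1 & a_3/2 \\ a_3/2 & a_2 \end{pmatrix}R^tR\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_4 & a_5 \end{pmatrix}R^tR\begin{pmatrix} x \\ y \end{pmatrix} + a_6 = 0.$$

Let

$$R\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x' \\ y' \end{pmatrix}, \qquad R\begin{pmatrix} a_1 & a_3/2 \\ a_3/2 & a_2 \end{pmatrix}R^t = \begin{pmatrix} a'_1 & 0 \\ 0 & a'_2 \end{pmatrix}, \qquad R\begin{pmatrix} a_4 \\ a_5 \end{pmatrix} = \begin{pmatrix} a'_4 \\ a'_5 \end{pmatrix}.$$

Then we get

$$\begin{pmatrix} x' & y' \end{pmatrix}\begin{pmatrix} a'_1 & 0 \\ 0 & a'_2 \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix} + \begin{pmatrix} a'_4 & a'_5 \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix} + a_6 = 0;$$

or

$$a'_1x'^2 + a'_2y'^2 + a'_4x' + a'_5y' + a_6 = 0.$$

The cross term has disappeared. The orthogonal matrix R is simply a rotation. In fact,
it rotates the original coordinate system to coincide with the principal axes of the conic
section. ■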
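For a 2 x 2 symmetric matrix the rotation R of Example 4.7.3 can be written down explicitly. The Python sketch below uses the standard rotation angle satisfying tan 2φ = a₃/(a₁ - a₂); this formula and the function name `remove_cross_term` are not taken from the text, and the sample conic x² + y² + xy is an arbitrary choice.

```python
import math

def remove_cross_term(a1, a2, a3):
    """Return R M R^t for M = [[a1, a3/2], [a3/2, a2]], with R a rotation
    chosen so that the off-diagonal (cross-term) entries vanish."""
    phi = 0.5 * math.atan2(a3, a1 - a2)
    c, s = math.cos(phi), math.sin(phi)
    R = [[c, s], [-s, c]]                  # (x', y') = R (x, y)
    M = [[a1, a3 / 2], [a3 / 2, a2]]
    RM = [[sum(R[i][k] * M[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    # Multiply by R^t on the right: (R^t)[k][j] = R[j][k].
    return [[sum(RM[i][k] * R[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

D = remove_cross_term(1.0, 1.0, 1.0)   # quadratic part of x^2 + y^2 + x y
```

For this sample the rotation is by 45 degrees, and the diagonal entries of D are the eigenvalues 3/2 and 1/2 of the symmetric matrix.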

4.7.4. Example. In this example we investigate conditions under which a multivariable 
function has a maximum or a minimum. 

A point a = (a\, a% . a n ) € is a maximum (minimum) of a function 

/(xi ， x 2 ,... 9 x n ) = f(r) 
if 





4.7 REAL VECTOR SPACES 133 


and for small Xi — a/, the difference /(r) — /(a) is negative (positive). To relate this 
difference to the topics of this section, write the Taylor expansion of the function around a 
keeping terms up to the second order: 

n 1 n / 3^f \ 

(g) ㈡ + 3 - 句 )…㈣ + … ， 

^一1 ’，《/ 

or, constructing a column vector out of Si s xi — ai and a symmetric matrix Dij out of the 
second derivatives, we can write 

/(r) = / ⑻ + \ ^SjSjDjj + … /(r) — /⑻ = 
ij 

because the first derivatives vanish. For $\mathbf{a}$ to be a minimum point of $f$, the RHS of the last
equation must be positive for arbitrary $\delta$. This means that D must be a positive matrix.$^8$
Thus, all its eigenvalues must be positive (Corollary 4.4.8). Similarly, we can show that for
$\mathbf{a}$ to be a maximum point of $f$, $-D$ must be positive definite. This means that D must have
negative eigenvalues.

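When we specialize the foregoing discussion to two dimensions, we obtain results that
are familiar from calculus. For the function $f(x, y)$ to have a minimum, the eigenvalues of
the matrix

$$\begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}$$

must be positive. The characteristic polynomial

$$\lambda^2 - (f_{xx} + f_{yy})\lambda + f_{xx} f_{yy} - f_{xy}^2 = 0$$

yields two eigenvalues:

$$\lambda_1 = \frac{f_{xx} + f_{yy} + \sqrt{(f_{xx} - f_{yy})^2 + 4 f_{xy}^2}}{2}, \qquad \lambda_2 = \frac{f_{xx} + f_{yy} - \sqrt{(f_{xx} - f_{yy})^2 + 4 f_{xy}^2}}{2}.$$

These eigenvalues will be both positive if

$$f_{xx} + f_{yy} > \sqrt{(f_{xx} - f_{yy})^2 + 4 f_{xy}^2},$$

and both negative if

$$f_{xx} + f_{yy} < -\sqrt{(f_{xx} - f_{yy})^2 + 4 f_{xy}^2}.$$

Squaring these inequalities gives $(f_{xx} + f_{yy})^2 > (f_{xx} - f_{yy})^2 + 4 f_{xy}^2$, and simplifying yields

$$f_{xx} f_{yy} > f_{xy}^2,$$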


$^8$ Note that D is already symmetric, the real analogue of hermitian.




which shows that $f_{xx}$ and $f_{yy}$ must have the same sign. If they are both positive (negative),
we have a minimum (maximum). This is the familiar condition for the attainment of extrema
by a function of two variables. ■
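The two-dimensional test can be sketched in a few lines of code. This is an illustration under stated assumptions rather than anything from the text: the function name `classify_critical_point` is hypothetical, the "saddle" and "undetermined" labels are added here for the cases the example does not discuss, and the second derivatives are supplied by the caller.

```python
import math

def classify_critical_point(fxx, fyy, fxy):
    """Classify a critical point of f(x, y) from its second derivatives.

    Uses the eigenvalues of the symmetric matrix [[fxx, fxy], [fxy, fyy]]:
    both positive -> minimum, both negative -> maximum.
    """
    mean = 0.5 * (fxx + fyy)
    radius = 0.5 * math.sqrt((fxx - fyy) ** 2 + 4.0 * fxy ** 2)
    lam1, lam2 = mean + radius, mean - radius  # lam1 >= lam2
    if lam2 > 0:
        kind = "minimum"
    elif lam1 < 0:
        kind = "maximum"
    elif lam1 > 0 and lam2 < 0:
        kind = "saddle"        # label added for this sketch
    else:
        kind = "undetermined"  # a zero eigenvalue: second order is not enough
    return kind, lam1, lam2

# f = x^2 + x*y + y^2 at the origin: fxx = fyy = 2, fxy = 1.
print(classify_critical_point(2.0, 2.0, 1.0))
# f = x^2 - y^2 at the origin: fxx = 2, fyy = -2, fxy = 0.
print(classify_critical_point(2.0, -2.0, 0.0))
```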

Although the establishment of spectral decomposition for symmetric operators
is fairly straightforward, the case of orthogonal operators (the counterpart of
unitary operators in a real vector space) is more complicated. In fact, we have
already seen in Example 4.5.1 that the eigenvalues of an orthogonal transformation
in two dimensions are, in general, complex. This is in contrast to symmetric
transformations.

Think of the orthogonal operator O as a unitary operator.$^9$ Since the absolute
value of the eigenvalues of a unitary operator is 1, the only real possibilities are $\pm 1$.
To find the other eigenvalues we note that as a unitary operator, O can be written
as $e^{A}$, where A is anti-hermitian (see Problem 4.22). Since hermitian conjugation
and transposition coincide for real vector spaces, we conclude that $A = -A^t$, and
A is antisymmetric. It is also real, because O is.

Let us now consider the eigenvalues of A. If $\lambda$ is an eigenvalue of A corresponding
to the eigenvector $|a\rangle$, then $\langle a|A|a\rangle = \lambda \langle a|a\rangle$. Taking the complex
conjugate of both sides gives $\langle a|A^\dagger|a\rangle = \lambda^* \langle a|a\rangle$; but $A^\dagger = A^t = -A$, because
A is real and antisymmetric. We therefore have $\langle a|A|a\rangle = -\lambda^* \langle a|a\rangle$, which
gives $\lambda^* = -\lambda$. It follows that if we restrict $\lambda$ to be real, then it can only be zero;
otherwise, it must be purely imaginary. Furthermore, the reader may verify that if
$\lambda$ is an eigenvalue of A, so is $-\lambda$. Therefore, the diagonal form of A looks like this:

$$A_{\text{diag}} = \operatorname{diag}(0, 0, \dots, 0, i\theta_1, -i\theta_1, i\theta_2, -i\theta_2, \dots, i\theta_k, -i\theta_k),$$

which gives O the following diagonal form:

$$O_{\text{diag}} = e^{A_{\text{diag}}} = \operatorname{diag}(e^{0}, e^{0}, \dots, e^{0}, e^{i\theta_1}, e^{-i\theta_1}, e^{i\theta_2}, e^{-i\theta_2}, \dots, e^{i\theta_k}, e^{-i\theta_k})$$

with $\theta_1, \theta_2, \dots, \theta_k$ all real. It is clear that if O has $-1$ as an eigenvalue, then some
of the $\theta$'s must equal $\pm\pi$. Separating the $\pi$'s from the rest of the $\theta$'s and putting all
of the above arguments together, we get

$$O_{\text{diag}} = \operatorname{diag}(\underbrace{1, 1, \dots, 1}_{N_+}, \underbrace{-1, -1, \dots, -1}_{N_-}, e^{i\theta_1}, e^{-i\theta_1}, e^{i\theta_2}, e^{-i\theta_2}, \dots, e^{i\theta_m}, e^{-i\theta_m}),$$

where $N_+ + N_- + 2m = \dim O$.
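The statement that an orthogonal operator is the exponential of a real antisymmetric one can be illustrated in two dimensions. The sketch below is an assumption-laden illustration, not the book's method: it sums the power series of $e^{A}$ for the hypothetical choice $A = \begin{pmatrix} 0 & -\theta \\ \theta & 0 \end{pmatrix}$ (whose eigenvalues are $\pm i\theta$) and compares the result with the rotation matrix through $\theta$. The helper names and the value of `theta` are arbitrary.

```python
import math

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_2x2(a, terms=30):
    """Approximate e^A by the first `terms` terms of its power series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        # term now holds A^n / n!
        term = [[x / n for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

theta = 0.7                               # arbitrary sample angle
A = [[0.0, -theta], [theta, 0.0]]         # real and antisymmetric
O = exp_2x2(A)
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
print(O)
print(rotation)
```

The two printed matrices should agree to rounding error, and the columns of `O` should be orthonormal.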

Getting insight from Example 4.5.1, we can argue, admittedly in a nonrigorous
way, that corresponding to each pair $e^{\pm i\theta_j}$ is a $2 \times 2$ matrix of the form

$$R_2(\theta_j) \equiv \begin{pmatrix} \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \end{pmatrix}. \tag{4.17}$$

$^9$ This can always be done by formally identifying transposition with hermitian conjugation, an identification that holds when
the underlying field of numbers is real.



We therefore have the following theorem (refer to [Halm 58] for a rigorous treatment).

4.7.5. Theorem. A real orthogonal operator on a real inner product space V
cannot, in general, be completely diagonalized. The closest it can get to a diagonal
form is

$$O_{\text{diag}} = \operatorname{diag}(\underbrace{1, 1, \dots, 1}_{N_+}, \underbrace{-1, -1, \dots, -1}_{N_-}, R_2(\theta_1), R_2(\theta_2), \dots, R_2(\theta_m)),$$

where $N_+ + N_- + 2m = \dim V$ and $R_2(\theta_j)$ is as given in (4.17). Furthermore,
the matrix that transforms an orthogonal matrix into the form above is itself an
orthogonal matrix.

The last statement follows from Theorem 4.5.2 and the fact that an orthogonal
matrix is the real analogue of a unitary matrix.

4.7.6. Example. An interesting application of Theorem 4.7.5 occurs in classical mechanics,
where it is shown that the motion of a rigid body consists of a translation and a rotation.
The rotation is represented by a $3 \times 3$ orthogonal matrix. Theorem 4.7.5 states that by an
appropriate choice of coordinate systems (i.e., by applying the same orthogonal transformation
that diagonalizes the rotation matrix of the rigid body), one can "diagonalize" the
$3 \times 3$ orthogonal matrix. The "diagonal" form is

$$\begin{pmatrix} \pm 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}.$$

Excluding the reflections (corresponding to $-1$'s) and the trivial identity rotation, we conclude
that any rotation of a rigid body can be written as

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix},$$

which is a rotation through the angle $\theta$ about the (new) x-axis. ■

Combining the rotation of the example above with the translations, we obtain
the following theorem.

4.7.7. Theorem. (Euler) The general motion of a rigid body consists of the translation
of one point of that body and a rotation about a single axis through that
point.

Finally, we quote the polar decomposition for real inner product spaces.

4.7.8. Theorem. Any operator A on a real inner product space can be written as
A = OR, where R is a (unique) symmetric positive operator and O is orthogonal.


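4.7.9. Example. Let us decompose the following matrix into its polar form:

$$A = \begin{pmatrix} 2 & 0 \\ 3 & -2 \end{pmatrix}.$$

The procedure is the same as in the complex case. We have

$$R^2 = A^t A = \begin{pmatrix} 2 & 3 \\ 0 & -2 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 3 & -2 \end{pmatrix} = \begin{pmatrix} 13 & -6 \\ -6 & 4 \end{pmatrix}$$

with eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 16$ and normalized eigenvectors

$$|e_1\rangle = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad \text{and} \quad |e_2\rangle = \frac{1}{\sqrt{5}} \begin{pmatrix} 2 \\ -1 \end{pmatrix}.$$

The projection operators are

$$P_1 = |e_1\rangle \langle e_1| = \frac{1}{5} \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}, \qquad P_2 = |e_2\rangle \langle e_2| = \frac{1}{5} \begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix}.$$

Thus, we have

$$R = \sqrt{\lambda_1}\, P_1 + \sqrt{\lambda_2}\, P_2 = \frac{1}{5} \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} + \frac{4}{5} \begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix} = \frac{1}{5} \begin{pmatrix} 17 & -6 \\ -6 & 8 \end{pmatrix}.$$

We note that A is invertible. Thus, R is also invertible, and

$$R^{-1} = \frac{1}{20} \begin{pmatrix} 8 & 6 \\ 6 & 17 \end{pmatrix}.$$

This gives $O = A R^{-1}$, or

$$O = \frac{1}{20} \begin{pmatrix} 2 & 0 \\ 3 & -2 \end{pmatrix} \begin{pmatrix} 8 & 6 \\ 6 & 17 \end{pmatrix} = \frac{1}{5} \begin{pmatrix} 4 & 3 \\ 3 & -4 \end{pmatrix}.$$

It is readily verified that O is indeed orthogonal. ■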
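The steps of a real polar decomposition can be sketched for $2 \times 2$ matrices using only the spectral decomposition of $A^t A$. The sketch below makes several assumptions that are not in the text: the helper names are hypothetical, the input must be invertible with $A^t A$ having two distinct eigenvalues, and the sample matrix $\begin{pmatrix} 2 & 0 \\ 3 & -2 \end{pmatrix}$ is chosen because its $A^t A$ has eigenvalues 1 and 16.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def polar_2x2(a):
    """Return (O, R) with A = O R, R symmetric positive, O orthogonal.

    Assumes A is invertible and A^t A has two distinct eigenvalues.
    """
    s = matmul(transpose(a), a)                      # R^2 = A^t A
    tr = s[0][0] + s[1][1]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)
    lam1, lam2 = (tr - disc) / 2.0, (tr + disc) / 2.0
    ident = [[1.0, 0.0], [0.0, 1.0]]
    # Projectors onto the two eigenspaces of the symmetric matrix s.
    p1 = [[(s[i][j] - lam2 * ident[i][j]) / (lam1 - lam2) for j in range(2)]
          for i in range(2)]
    p2 = [[(s[i][j] - lam1 * ident[i][j]) / (lam2 - lam1) for j in range(2)]
          for i in range(2)]
    r = [[math.sqrt(lam1) * p1[i][j] + math.sqrt(lam2) * p2[i][j]
          for j in range(2)] for i in range(2)]
    det_r = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    r_inv = [[r[1][1] / det_r, -r[0][1] / det_r],
             [-r[1][0] / det_r, r[0][0] / det_r]]
    return matmul(a, r_inv), r

O, R = polar_2x2([[2.0, 0.0], [3.0, -2.0]])
print(O)
print(R)
```

Multiplying the two returned matrices should reproduce the input, and `O` times its transpose should be the identity.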


Our excursion through operator algebra and matrix theory has revealed to us
the diversity of diagonalizable operators. Could it be perhaps that all operators are
diagonalizable? In other words, given any operator, can we find a basis in which
the matrix representing that operator is diagonal? The answer is, in general, no!
(See Problem 4.27.) Discussion of this topic entails a treatment of the
Hamilton-Cayley theorem and the Jordan canonical form of a matrix, in which the so-called
generalized eigenvectors are introduced. A generalized eigenvector belongs to the
kernel of $(A - \lambda 1)^m$ for some positive integer m. Then $\lambda$ is called a generalized
eigenvalue. We shall not pursue this matter here. The interested reader can find
such a discussion in books on linear algebra and matrix theory. We shall, however,
see the application of this notion to special operators on infinite-dimensional vector
spaces in Chapter 16. One result is worth mentioning at this point.


4.7.10. Proposition. If the roots of the characteristic polynomial of a matrix are
all simple, then the matrix can be brought to diagonal form by a similarity (not
necessarily a unitary) transformation.




analytic definition of 
the determinant of a 
matrix 




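4.7.11. Example. As a final example of the application of the results of this section, let
us evaluate the n-fold integral

$$I_n = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \cdots \int_{-\infty}^{\infty} dx_n \, e^{-\sum_{i,j=1}^{n} m_{ij} x_i x_j}, \tag{4.18}$$

where the $m_{ij}$ are elements of a real, symmetric, positive definite matrix, say M. Because
it is symmetric, M can be diagonalized by an orthogonal matrix R so that $R M R^t = D$ is a
diagonal matrix whose diagonal entries are the eigenvalues, $\lambda_1, \lambda_2, \dots, \lambda_n$, of M, whose
positive definiteness ensures that none of these eigenvalues is zero or negative.

The exponent in (4.18) can be written as

$$\sum_{i,j=1}^{n} m_{ij} x_i x_j = x^t M x = x^t R^t R M R^t R x = x'^t D x' = \lambda_1 x_1'^2 + \cdots + \lambda_n x_n'^2,$$

where

$$x' = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix} = R x = R \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},$$

or, in component form, $x_i' = \sum_{j=1}^{n} r_{ij} x_j$ for $i = 1, 2, \dots, n$. Similarly, since $x = R^t x'$, it
follows that $x_i = \sum_{j=1}^{n} r_{ji} x_j'$ for $i = 1, 2, \dots, n$.

The "volume element" $dx_1 \cdots dx_n$ is related to the primed volume element as follows:

$$dx_1 \cdots dx_n = \left| \frac{\partial(x_1, x_2, \dots, x_n)}{\partial(x_1', x_2', \dots, x_n')} \right| dx_1' \cdots dx_n' \equiv |\det J| \, dx_1' \cdots dx_n',$$

where J is the Jacobian matrix whose $ij$th element is $\partial x_i / \partial x_j'$. But

$$\frac{\partial x_i}{\partial x_j'} = r_{ji} \quad \Longrightarrow \quad J = R^t \quad \Longrightarrow \quad |\det J| = |\det R^t| = 1.$$

Therefore, in terms of $x'$, the integral $I_n$ becomes

$$I_n = \int_{-\infty}^{\infty} dx_1' \int_{-\infty}^{\infty} dx_2' \cdots \int_{-\infty}^{\infty} dx_n' \, e^{-\lambda_1 x_1'^2 - \lambda_2 x_2'^2 - \cdots - \lambda_n x_n'^2}$$

$$= \left( \int_{-\infty}^{\infty} dx_1' \, e^{-\lambda_1 x_1'^2} \right) \left( \int_{-\infty}^{\infty} dx_2' \, e^{-\lambda_2 x_2'^2} \right) \cdots \left( \int_{-\infty}^{\infty} dx_n' \, e^{-\lambda_n x_n'^2} \right)$$

$$= \sqrt{\frac{\pi}{\lambda_1}} \sqrt{\frac{\pi}{\lambda_2}} \cdots \sqrt{\frac{\pi}{\lambda_n}} = \frac{\pi^{n/2}}{\sqrt{\lambda_1 \lambda_2 \cdots \lambda_n}} = \pi^{n/2} (\det M)^{-1/2},$$

because the determinant of a matrix is the product of its eigenvalues. This result can be
written as

$$\int_{-\infty}^{\infty} d^n x \, e^{-x^t M x} = \pi^{n/2} (\det M)^{-1/2} \quad \Longrightarrow \quad \det M = \frac{\pi^n}{\left( \int_{-\infty}^{\infty} d^n x \, e^{-x^t M x} \right)^2},$$

which gives an analytic definition of the determinant. ■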
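The closed form $\pi^{n/2} (\det M)^{-1/2}$ can be checked numerically for $n = 2$. The following is only a rough sketch under stated assumptions: the sample matrix `M`, the truncation of the infinite range to a square of half-width 8, and the midpoint-rule grid are all arbitrary choices made here for illustration.

```python
import math

def gaussian_integral_2d(m, half_width=8.0, n=400):
    """Midpoint-rule estimate of the integral of exp(-x^t M x) over the plane.

    The infinite range is truncated to [-half_width, half_width]^2, which is
    harmless when M is positive definite and the integrand decays quickly.
    """
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        for j in range(n):
            y = -half_width + (j + 0.5) * h
            q = m[0][0] * x * x + (m[0][1] + m[1][0]) * x * y + m[1][1] * y * y
            total += math.exp(-q)
    return total * h * h

M = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical symmetric positive definite matrix
numeric = gaussian_integral_2d(M)
det_M = M[0][0] * M[1][1] - M[0][1] * M[1][0]
exact = math.pi / math.sqrt(det_M)
print(numeric, exact)
# Turning the relation around gives the determinant from the integral.
print(math.pi ** 2 / numeric ** 2, det_M)
```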






4.8 Problems 

4.1. Let $U_1$ and $U_2$ be subspaces of V. Show that

(a) $\dim(U_1 + U_2) = \dim U_1 + \dim U_2 - \dim(U_1 \cap U_2)$. Hint: Extend a basis of
$U_1 \cap U_2$ to both $U_1$ and $U_2$.

(b) If $U_1 + U_2 = V$ and $\dim U_1 + \dim U_2 = \dim V$, then $V = U_1 \oplus U_2$.

(c) If $\dim U_1 + \dim U_2 > \dim V$, then $U_1 \cap U_2 \neq \{0\}$.

4.2. Let P be the (hermitian) projection operator onto a subspace M. Show that
$1 - P$ projects onto $M^\perp$. Hint: You need to show that $\langle m|P|a\rangle = \langle m|a\rangle$ for
arbitrary $|a\rangle$ and $|m\rangle \in M$; therefore, consider $\langle m|P|a\rangle^*$ and use the hermiticity
of P.

4.3. Show that a subspace M of an inner product space V is invariant under the
linear operator A if and only if $M^\perp$ is invariant under $A^\dagger$.

4.4. Show that the intersection of two invariant subspaces of an operator is also
an invariant subspace.

4.5. Let $\pi$ be a permutation of the integers $\{1, 2, \dots, n\}$. Find the spectrum of $A_\pi$,
if for $|x\rangle = (\alpha_1, \alpha_2, \dots, \alpha_n) \in \mathbb{C}^n$, we define

$$A_\pi |x\rangle = (\alpha_{\pi(1)}, \alpha_{\pi(2)}, \dots, \alpha_{\pi(n)}).$$

4.6. Show that

(a) the coefficient of $\lambda^N$ in the characteristic polynomial is $(-1)^N$, where $N = \dim V$, and

(b) the constant in the characteristic polynomial of an operator is its determinant.

translation operator

4.7. Operators A and B satisfy the commutation relation $[A, B] = 1$. Let $|b\rangle$ be an
eigenvector of B with eigenvalue $\lambda$. Show that $e^{-\tau A}|b\rangle$ is also an eigenvector of
B, but with eigenvalue $\lambda + \tau$. This is why $e^{-\tau A}$ is called the translation operator
for B. Hint: First find $[B, e^{-\tau A}]$.

4.8. Find the eigenvalues of an involutive operator, that is, an operator A with the
property $A^2 = 1$.

4.9. Assume that A and A' are similar matrices. Show that they have the same
eigenvalues.

4.10. In each of the following cases, determine the counterclockwise rotation of
the $xy$-axes that brings the conic section into the standard form and determine the
conic section.

(a) $11x^2 + 3y^2 + 6xy - 12 = 0$ (b) $5x^2 - 3y^2 + 6xy + 6 = 0$

(c) $2x^2 - y^2 - 4xy - 3 = 0$ (d) $6x^2 + 3y^2 - 4xy - 7 = 0$

(e) $2x^2 + 5y^2 - 4xy - 36 = 0$





4.11. Show that if A is invertible, then the eigenvectors of $A^{-1}$ are the same as
those of A and the eigenvalues of $A^{-1}$ are the reciprocals of those of A.

4.12. Find all eigenvalues and eigenvectors of the following matrices: 



4.13. Show that a $2 \times 2$ rotation matrix does not have a real eigenvalue (and,
therefore, eigenvector) when the rotation angle is not an integer multiple of $\pi$.
What is the physical interpretation of this?


4.14. Three equal point masses are located at (a, a, 0), (a, 0, a), and (0, a, a). 
Find the moment of inertia matrix as well as its eigenvalues and the corresponding 
eigenvectors. 

4.15. Consider $(\alpha_1, \alpha_2, \dots, \alpha_n) \in \mathbb{C}^n$ and define $E_{ij}$ as the operator that
interchanges $\alpha_i$ and $\alpha_j$. Find the eigenvalues of this operator.

4.16. Find the eigenvalues and eigenvectors of the operator $-i\,d/dx$ acting in the
vector space of differentiable functions $C^1(-\infty, \infty)$.

4.17. Show that a hermitian operator is positive if and only if its eigenvalues are
positive.


4.18. What are the spectral decompositions of $A^\dagger$, $A^{-1}$, and $A A^\dagger$ for an invertible
normal operator A?


4.19. Consider the matrix 


A = 




(a) Find the eigenvalues and the orthonormal eigenvectors of A.

(b) Calculate the projection operators (matrices) $P_1$ and $P_2$ and verify that $\sum_i P_i = 1$
and $\sum_i \lambda_i P_i = A$.

(c) Find the matrices $\sqrt{A}$, $\sin(\pi A/6)$, and $\cos(\pi A/6)$.

(d) Is A invertible? If so, find the eigenvalues and eigenvectors of $A^{-1}$.




4.20. Consider the matrix

$$A = \begin{pmatrix} 4 & i & 1 \\ -i & 4 & -i \\ 1 & i & 4 \end{pmatrix}.$$


(a) Find the eigenvalues of A. Hint: Try $\lambda = 3$ in the characteristic polynomial of
A.

(b) For each $\lambda$, find a basis for $M_\lambda$, the eigenspace associated with the eigenvalue
$\lambda$.

(c) Use the Gram-Schmidt process to orthonormalize the above basis vectors.

(d) Calculate the projection operators (matrices) $P_i$ for each subspace and verify
that $\sum_i P_i = 1$ and $\sum_i \lambda_i P_i = A$.

(e) Find the matrices $\sin(\pi A/2)$, and $\cos(\pi A/2)$.

(f) Is A invertible? If so, find the eigenvalues and eigenvectors of $A^{-1}$.

4.21. Show that if two hermitian matrices have the same set of eigenvalues, then 
they are unitarily related. 


4.22. Prove that corresponding to every unitary operator U acting on a
finite-dimensional vector space, there is a hermitian operator H such that $U = \exp iH$.


4.23. Find the polar decomposition of the following matrices:

$$A = \begin{pmatrix} 2i & 0 \\ \sqrt{7} & 3 \end{pmatrix}, \qquad B = \begin{pmatrix} 41 & -12i \\ 12i & 34 \end{pmatrix}.$$


4.24. Show that an arbitrary matrix A can be "diagonalized" as D = UAV, where U
is unitary and D is a real diagonal matrix with only nonnegative eigenvalues. Hint:
Consider $A A^\dagger$.

4.25. Show that (a) if $\lambda$ is an eigenvalue of an antisymmetric operator, then so
is $-\lambda$, and (b) antisymmetric operators (matrices) of odd dimension cannot be
invertible.


4.26. Find the unitary matrices that diagonalize the following hermitian matrices: 

a 2 = (4 l 3 ), a 3 = () -*), 

B 2 =(o -1 i). 

\-i i 0 / 

Warning! You may have to resort to numerical approximations for some of these. 

4.27. For $A = \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}$, where $x \neq 0$, show that it is impossible to find an invertible
$2 \times 2$ matrix R such that $R A R^{-1}$ is diagonal. (This shows that not all operators are
diagonalizable.)






Additional Reading 

1. Axler, S. Linear Algebra Done Right, Springer-Verlag, 1996. Concise but
useful discussion of real and complex spectral theory.

2. DeVito, C. Functional Analysis and Linear Operator Theory, Addison-Wesley,
1990. Has a good discussion of spectral theory for finite and infinite
dimensions.

3. Halmos, P. Finite Dimensional Vector Spaces, 2nd ed., Van Nostrand, 1958. 
Comprehensive treatment of real and complex spectral theory for operators 
on finite dimensional vector spaces. 




Part II 


Infinite-Dimensional Vector 
Spaces 




Hilbert Spaces 


The basic concepts of finite-dimensional vector spaces introduced in Chapter 1 can 
readily be generalized to infinite dimensions. The definition of a vector space and 
concepts of linear combination, linear independence, basis, subspace, span, and so
forth all carry over to infinite dimensions. However, one thing is crucially different 
in the new situation, and this difference makes the study of infinite-dimensional 
vector spaces both richer and more nontrivial: In a finite-dimensional vector space 
we dealt with finite sums; in infinite dimensions we encounter infinite sums. Thus, 
we have to investigate the convergence of such sums. 


5.1 The Question of Convergence 


The intuitive notion of convergence acquired in calculus makes use of the idea of 
closeness. This, in turn, requires the notion of distance. 1 We considered such a 
notion in Chapter 1 in the context of a norm, and saw that the inner product had 
an associated norm. However, it is possible to introduce a norm on a vector space 
without an inner product. 

Closeness is a relative concept!

One such norm, applicable to $\mathbb{C}^n$ and $\mathbb{R}^n$, was

$$\|a\|_p \equiv \left( \sum_{i=1}^{n} |\alpha_i|^p \right)^{1/p},$$

where p is an integer. The "natural" norm, i.e., that induced on $\mathbb{C}^n$ (or $\mathbb{R}^n$) by
the usual inner product, corresponds to $p = 2$. The distance between two points
depends on the particular norm used. For example, consider the "point" (or vector)
$|b\rangle = (0.1, 0.1, \dots, 0.1)$ in a 1000-dimensional space (n = 1000). One can easily
check that the distance of this vector from the origin varies considerably with p:
$\|b\|_1 = 100$, $\|b\|_2 = 3.16$, $\|b\|_{10} = 0.2$. This variation may give the impression
that there is no such thing as "closeness", and it all depends on how one defines the
norm. This is not true, because closeness is a relative concept: One always compares
distances. A norm with large p shrinks all distances of a space, and a norm with
small p stretches them. Thus, although it is impossible (and meaningless) to say
that "$|a\rangle$ is close to $|b\rangle$" because of the dependence of distance on p, one can
always say "$|a\rangle$ is closer to $|b\rangle$ than $|c\rangle$ is to $|d\rangle$," regardless of the value of p.

$^1$ It is possible to introduce the idea of closeness abstractly, without resort to the notion of distance, as is done in topology.
However, distance, as applied in vector spaces, is as abstract as we want to get.
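The dependence of distance on p is easy to reproduce. The sketch below assumes the p-norm $\|a\|_p = (\sum_i |\alpha_i|^p)^{1/p}$; the function name `p_norm` is an illustrative choice, and the vector is the 1000-component example quoted in the text.

```python
def p_norm(vector, p):
    """p-norm of a sequence of real or complex components."""
    return sum(abs(x) ** p for x in vector) ** (1.0 / p)

b = [0.1] * 1000          # the vector (0.1, 0.1, ..., 0.1) with n = 1000
for p in (1, 2, 10):
    print(p, round(p_norm(b, p), 2))
```

The three printed values should match the distances 100, 3.16, and 0.2 quoted above, up to rounding.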

Now that we have a way of telling whether vectors are close together or far
apart, we can talk about limits and the convergence of sequences of vectors. Let
us begin by recalling the definition of a Cauchy sequence.


Cauchy sequence 
defined 


5.1.1. Definition. An infinite sequence of vectors $\{|a_i\rangle\}_{i=1}^{\infty}$ in a normed linear
space V is called a Cauchy sequence if $\lim_{i,j \to \infty} \|a_i - a_j\| = 0$.


A convergent sequence is necessarily Cauchy. This can be shown using the 
triangle inequality (see Problem 5.2). However, there may be Cauchy sequences 
in a given vector space that do not converge to any vector in that space (see the 
example below). Such a convergence requires additional properties of a vector 
space summarized in the following definition. 

complete vector space defined

5.1.2. Definition. A complete vector space V is a normed linear space for which
every Cauchy sequence of vectors in V has a limit vector in V. In other words,
if $\{|a_i\rangle\}_{i=1}^{\infty}$ is a Cauchy sequence, then there exists a vector $|a\rangle \in V$ such that
$\lim_{i \to \infty} \|a_i - a\| = 0$.

5.1.3. Example. 1. $\mathbb{R}$ is complete with respect to the absolute-value norm $\|\alpha\| = |\alpha|$. In
other words, every Cauchy sequence of real numbers has a limit in $\mathbb{R}$. This is proved in real
analysis.

2. $\mathbb{C}$ is complete with respect to the norm $\|\alpha\| = |\alpha| = \sqrt{(\operatorname{Re}\alpha)^2 + (\operatorname{Im}\alpha)^2}$. Using
$|\alpha| \le |\operatorname{Re}\alpha| + |\operatorname{Im}\alpha|$, one can show that the completeness of $\mathbb{C}$ follows from that of $\mathbb{R}$.
Details are left as an exercise for the reader.

3. The set of rational numbers $\mathbb{Q}$ is not complete with respect to the absolute-value norm.
In fact, $\{(1 + 1/k)^k\}_{k=1}^{\infty}$ is a sequence of rational numbers that is Cauchy but does not
converge to a rational number; it converges to $e$, the base of the natural logarithm, which is
known to be an irrational number. ■
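The third item can be made concrete with exact rational arithmetic. This is a small illustrative sketch: the function name `term` and the sample values of k are arbitrary, and the comparison with `math.e` uses a floating-point approximation of $e$.

```python
from fractions import Fraction
import math

def term(k):
    """Exact rational value of (1 + 1/k)^k."""
    return (1 + Fraction(1, k)) ** k

for k in (1, 10, 100, 1000):
    t = term(k)                      # a Fraction, i.e., a rational number
    print(k, float(t), math.e - float(t))
```

Every term is a ratio of integers, yet the gap to $e$ keeps shrinking, which is the sense in which the limit escapes the rationals.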


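all finite-dimensional vector spaces are complete

Let $\{|a_i\rangle\}_{i=1}^{\infty}$ be a Cauchy sequence of vectors in a finite-dimensional vector
space $V_N$. Choose an orthonormal basis $\{|e_k\rangle\}_{k=1}^{N}$ in $V_N$ such that$^2$
$|a_i\rangle = \sum_{k=1}^{N} \alpha_k^{(i)} |e_k\rangle$ and $|a_j\rangle = \sum_{k=1}^{N} \alpha_k^{(j)} |e_k\rangle$. Then

$$\|a_i - a_j\|^2 = \langle a_i - a_j | a_i - a_j \rangle = \left\| \sum_{k=1}^{N} \left( \alpha_k^{(i)} - \alpha_k^{(j)} \right) |e_k\rangle \right\|^2$$

$$= \sum_{k,l=1}^{N} \left( \alpha_k^{(i)} - \alpha_k^{(j)} \right)^* \left( \alpha_l^{(i)} - \alpha_l^{(j)} \right) \langle e_k | e_l \rangle = \sum_{k=1}^{N} \left| \alpha_k^{(i)} - \alpha_k^{(j)} \right|^2.$$

The LHS goes to zero, because the sequence is assumed Cauchy. Furthermore, all
terms on the RHS are positive. Thus, they too must go to zero as $i, j \to \infty$. By
the completeness of $\mathbb{C}$, there must exist $\alpha_k \in \mathbb{C}$ such that $\lim_{n \to \infty} \alpha_k^{(n)} = \alpha_k$ for
$k = 1, 2, \dots, N$. Now consider $|a\rangle \in V_N$ given by $|a\rangle = \sum_{k=1}^{N} \alpha_k |e_k\rangle$. We claim
that $|a\rangle$ is the limit of the above sequence of vectors in $V_N$. Indeed,

$$\lim_{i \to \infty} \|a_i - a\|^2 = \lim_{i \to \infty} \sum_{k=1}^{N} \left| \alpha_k^{(i)} - \alpha_k \right|^2 = \sum_{k=1}^{N} \lim_{i \to \infty} \left| \alpha_k^{(i)} - \alpha_k \right|^2 = 0.$$

$^2$ Recall that one can always define an inner product on a finite-dimensional vector space. So, the existence of orthonormal
bases is guaranteed.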

We have proved the following: 

5.1.4. Proposition. Every Cauchy sequence in a finite-dimensional inner product
space over $\mathbb{C}$ (or $\mathbb{R}$) is convergent. In other words, every finite-dimensional complex
(or real) inner product space is complete with respect to the norm induced by its
inner product.

The next example shows how important the word “finite” is. 


5.1.5. Example. Consider $\{f_k\}_{k=1}^{\infty}$, the infinite sequence of continuous functions defined
in the interval $[-1, +1]$ by

$$f_k(x) = \begin{cases} 1 & \text{if } 1/k \le x \le 1, \\ (kx + 1)/2 & \text{if } -1/k \le x \le 1/k, \\ 0 & \text{if } -1 \le x \le -1/k. \end{cases}$$

This sequence belongs to $\mathcal{C}^0(-1, 1)$, the inner product space of continuous functions with
its usual inner product: $\langle f|g\rangle = \int_{-1}^{1} f^*(x) g(x)\,dx$. It is straightforward to verify that

$$\|f_k - f_j\|^2 = \int_{-1}^{1} |f_k(x) - f_j(x)|^2\,dx \xrightarrow[k,j \to \infty]{} 0.$$

Therefore, the sequence is Cauchy. However, the limit of this sequence is (see Figure 5.1)

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{if } -1 < x < 0, \end{cases}$$

which is discontinuous at $x = 0$ and therefore does not belong to the space in which the
original sequence lies. ■
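The Cauchy property of the ramp functions $f_k$ can be observed numerically. The sketch below is an illustration only: it assumes the piecewise definition $f_k(x) = 1$ for $x \ge 1/k$, $(kx+1)/2$ for $|x| \le 1/k$, and 0 for $x \le -1/k$, and it estimates $\|f_k - f_j\|^2$ with a midpoint rule whose grid size is an arbitrary choice.

```python
def f(k, x):
    """The ramp function f_k on [-1, 1]."""
    if x >= 1.0 / k:
        return 1.0
    if x <= -1.0 / k:
        return 0.0
    return (k * x + 1.0) / 2.0

def dist_sq(k, j, n=20000):
    """Midpoint-rule estimate of the squared distance between f_k and f_j."""
    h = 2.0 / n
    total = 0.0
    for m in range(n):
        x = -1.0 + (m + 0.5) * h
        d = f(k, x) - f(j, x)
        total += d * d
    return total * h

for k in (10, 100, 1000):
    print(k, 2 * k, dist_sq(k, 2 * k))
```

The printed distances shrink as k and j grow, even though the pointwise limit is the discontinuous step function.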






Figure 5.1 The limit of the sequence of the continuous functions $f_k$ is a discontinuous
function that is 1 for $x > 0$ and 0 for $x < 0$.


We see that infinite-dimensional vector spaces are not generally complete. It is
a nontrivial task to show whether or not a given infinite-dimensional vector space
is complete.

Banach space

Any vector space (finite- or infinite-dimensional) contains all finite linear
combinations of the form $\sum_{i=1}^{n} \alpha_i |a_i\rangle$ when it contains all the $|a_i\rangle$'s. This follows from
the very definition of a vector space. However, the situation is different when n
goes to infinity. For the vector space to contain the infinite sum, firstly, the meaning
of such a sum has to be clarified, i.e., a norm and an associated convergence
criterion needs to be put in place. Secondly, the vector space has to be complete
with respect to that norm. A complete normed vector space is called a Banach
space. We shall not deal with a general Banach space, but only with those spaces
whose norms arise naturally from an inner product. This leads to the following
definition:

Hilbert space defined

5.1.6. Definition. A complete inner product space, commonly denoted by $\mathcal{H}$, is
called a Hilbert space.


Thus, all finite-dimensional real or complex vector spaces are Hilbert spaces. 
However, when we speak of a Hilbert space, we shall usually assume that it is 
infinite-dimensional. 

It is convenient to use orthonormal vectors in studying Hilbert spaces. So, let
us consider an infinite sequence $\{|e_i\rangle\}_{i=1}^{\infty}$ of orthonormal vectors all belonging to
a Hilbert space $\mathcal{H}$. Next, take any vector $|f\rangle \in \mathcal{H}$, construct the complex numbers
$f_i = \langle e_i | f \rangle$, and form the sequence of vectors$^3$

$$|f_n\rangle = \sum_{i=1}^{n} f_i |e_i\rangle \quad \text{for } n = 1, 2, \dots \tag{5.1}$$

$^3$ We can consider $|f_n\rangle$ as an "approximation" to $|f\rangle$, because both share the same components along the same set of orthonormal
vectors. The sequence of orthonormal vectors acts very much as a basis. However, to be a basis, an extra condition must be met.
We shall discuss this condition shortly.






Parseval inequality 


Bessel inequality 


complete 
orthonormal 
sequence of vectors 


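For the pair of vectors $|f\rangle$ and $|f_n\rangle$, the Schwarz inequality gives

$$|\langle f | f_n \rangle|^2 \le \langle f | f \rangle \langle f_n | f_n \rangle = \langle f | f \rangle \left( \sum_{i=1}^{n} |f_i|^2 \right), \tag{5.2}$$

where Equation (5.1) has been used to evaluate $\langle f_n | f_n \rangle$. On the other hand, taking
the inner product of (5.1) with $\langle f|$ yields

$$\langle f | f_n \rangle = \sum_{i=1}^{n} f_i \langle f | e_i \rangle = \sum_{i=1}^{n} f_i f_i^* = \sum_{i=1}^{n} |f_i|^2.$$

Substitution of this in Equation (5.2) yields the Parseval inequality:

$$\sum_{i=1}^{n} |f_i|^2 \le \langle f | f \rangle. \tag{5.3}$$

This conclusion is true for arbitrarily large n and can be stated as follows:

5.1.7. Proposition. Let $\{|e_i\rangle\}_{i=1}^{\infty}$ be an infinite set of orthonormal vectors in a
Hilbert space $\mathcal{H}$. Let $|f\rangle \in \mathcal{H}$ and define complex numbers $f_i = \langle e_i | f \rangle$. Then
the Bessel inequality holds: $\sum_{i=1}^{\infty} |f_i|^2 \le \langle f | f \rangle$.

The Bessel inequality shows that the vector

$$\sum_{i=1}^{\infty} f_i |e_i\rangle \equiv \lim_{n \to \infty} \sum_{i=1}^{n} f_i |e_i\rangle$$

converges; that is, it has a finite norm. However, the inequality does not say whether
the vector converges to $|f\rangle$. To make such a statement we need completeness:

5.1.8. Definition. A sequence of orthonormal vectors $\{|e_i\rangle\}_{i=1}^{\infty}$ in a Hilbert space
$\mathcal{H}$ is called complete if the only vector in $\mathcal{H}$ that is orthogonal to all the $|e_i\rangle$ is the
zero vector.

This completeness property is the extra condition alluded to (in the footnote)
above, and is what is required to make a basis.

5.1.9. Proposition. Let $\{|e_i\rangle\}_{i=1}^{\infty}$ be an orthonormal sequence in $\mathcal{H}$. Then the
following statements are equivalent:

1. $\{|e_i\rangle\}_{i=1}^{\infty}$ is complete.

2. $|f\rangle = \sum_{i=1}^{\infty} |e_i\rangle \langle e_i | f \rangle \quad \forall\, |f\rangle \in \mathcal{H}$.

3. $\sum_{i=1}^{\infty} |e_i\rangle \langle e_i| = 1$.

4. $\langle f | g \rangle = \sum_{i=1}^{\infty} \langle f | e_i \rangle \langle e_i | g \rangle \quad \forall\, |f\rangle, |g\rangle \in \mathcal{H}$.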




Parseval equality;
generalized Fourier 
coefficients 


completeness 

relation 


basis for Hilbert 
spaces 


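5. $\|f\|^2 = \sum_{i=1}^{\infty} |\langle e_i | f \rangle|^2 \quad \forall\, |f\rangle \in \mathcal{H}$.

Proof. We shall prove the implications $1 \Rightarrow 2 \Rightarrow 3 \Rightarrow 4 \Rightarrow 5 \Rightarrow 1$.

$1 \Rightarrow 2$: It is sufficient to show that the vector $|\psi\rangle \equiv |f\rangle - \sum_{i=1}^{\infty} |e_i\rangle \langle e_i | f \rangle$ is
orthogonal to all the $|e_j\rangle$:

$$\langle e_j | \psi \rangle = \langle e_j | f \rangle - \sum_{i=1}^{\infty} \underbrace{\langle e_j | e_i \rangle}_{\delta_{ij}} \langle e_i | f \rangle = 0.$$

$2 \Rightarrow 3$: Since $|f\rangle = 1 |f\rangle = \sum_{i=1}^{\infty} (|e_i\rangle \langle e_i|) |f\rangle$ is true for all $|f\rangle \in \mathcal{H}$, we must
have $1 = \sum_{i=1}^{\infty} |e_i\rangle \langle e_i|$.

$3 \Rightarrow 4$: $\langle f | g \rangle = \langle f | 1 | g \rangle = \langle f | \left( \sum_{i=1}^{\infty} |e_i\rangle \langle e_i| \right) | g \rangle = \sum_{i=1}^{\infty} \langle f | e_i \rangle \langle e_i | g \rangle$.

$4 \Rightarrow 5$: Let $|g\rangle = |f\rangle$ in statement 4 and recall that $\langle f | e_i \rangle = \langle e_i | f \rangle^*$.

$5 \Rightarrow 1$: Let $|f\rangle$ be orthogonal to all the $|e_i\rangle$. Then all the terms in the sum are
zero implying that $\|f\|^2 = 0$, which in turn gives $|f\rangle = 0$, because only the zero
vector has a zero norm. □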

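The equality

$$\|f\|^2 = \langle f | f \rangle = \sum_{i=1}^{\infty} |\langle e_i | f \rangle|^2 = \sum_{i=1}^{\infty} |f_i|^2, \qquad f_i = \langle e_i | f \rangle, \tag{5.4}$$

is called the Parseval equality, and the complex numbers $f_i$ are called generalized
Fourier coefficients. The relation

$$1 = \sum_{i=1}^{\infty} |e_i\rangle \langle e_i| \tag{5.5}$$

is called the completeness relation.

5.1.10. Definition. A complete orthonormal sequence $\{|e_i\rangle\}_{i=1}^{\infty}$ in a Hilbert space
$\mathcal{H}$ is called a basis of $\mathcal{H}$.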
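The Bessel inequality and its saturation for a complete sequence can be illustrated numerically. Everything specific in the sketch below is an assumption chosen for illustration and does not come from the text: the interval $[0, 1]$ with unit weight, the orthonormal functions $e_n(x) = \sqrt{2} \sin(n \pi x)$, the test function $f(x) = x(1 - x)$, and a midpoint rule standing in for the inner-product integral.

```python
import math

N = 2000                          # midpoint-rule sample points on [0, 1]
H = 1.0 / N
XS = [(m + 0.5) * H for m in range(N)]

def inner(g, f):
    """Approximate the inner product of two real functions on [0, 1]."""
    return sum(g(x) * f(x) for x in XS) * H

def e(n):
    """Hypothetical orthonormal sequence: sqrt(2) sin(n pi x)."""
    return lambda x: math.sqrt(2.0) * math.sin(n * math.pi * x)

def f(x):
    return x * (1.0 - x)

norm_sq = inner(f, f)             # <f|f>
partial = 0.0
partial_sums = []
for n in range(1, 26):
    fn = inner(e(n), f)           # generalized Fourier coefficient f_n
    partial += fn * fn
    partial_sums.append(partial)

print(norm_sq, partial_sums[0], partial_sums[-1])
```

Each partial sum stays below $\langle f|f\rangle$ (Bessel), and the sums approach it as more coefficients are included, as expected when the sequence is complete (Parseval).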

5.2 The Space of Square-Integrable Functions 

Chapter 1 showed that the collection of all continuous functions defined on an 
interval [a, b] forms a linear vector space. Example 5.1.5 showed that this space 
is not complete. Can we enlarge this space to make it complete? Since we are 
interested in an inner product as well, and since a natural inner product for func¬ 
tions is defined in terms of integrals, we want to make sure that our functions
are integrable. However, integrability does not require continuity, it only requires 
piecewise continuity. In this section we shall discuss conditions under which the 





space of functions becomes complete. An important class of functions has already
been mentioned in Chapter 1. These functions satisfy the inner product given by

$$\langle g | f \rangle = \int_a^b g^*(x) f(x) w(x)\,dx.$$

If $g(x) = f(x)$, we obtain

$$\langle f | f \rangle = \int_a^b |f(x)|^2 w(x)\,dx. \tag{5.6}$$

square-integrable functions

Functions for which such an integral is defined are said to be square-integrable.


David Hilbert (1862-1943), the greatest mathematician of this 
century, received his Ph.D. from the University of Königsberg
and was a member of the staff there from 1886 to 1895. In 1895
he was appointed to the chair of mathematics at the University
of Göttingen, where he continued to teach for the rest of his life.

Hilbert is one of that rare breed of late 19th-century math¬ 
ematicians whose spectrum of expertise covered a wide range, 
with formal set theory at one end and mathematical physics at the 
other. He did superb work in geometry, algebraic geometry, alge¬ 
braic number theory, integral equations, and operator theory. The 
seminal two-volume book Methoden der mathematischen Physik
by R. Courant, still one of the best books on the subject, was greatly influenced by Hilbert. 

Hilbert’s work in geometry had the greatest influence in that area since Euclid. A system¬ 
atic study of the axioms of Euclidean geometry led Hilbert to propose 21 such axioms, and 
he analyzed their significance. He published Grundlagen der Geometrie in 1899, putting
geometry on a formal axiomatic foundation. His famous 23 Paris problems challenged (and 
still today challenge) mathematicians to solve fundamental questions. 

It was late in his career that Hilbert turned to the subject for which he is most famous 
among physicists. A lecture by Erik Holmgren in 1901 on Fredholm’s work on integral 
equations, which had already been published in Sweden, aroused Hilbert’s interest in the 
subject. David Hilbert, having established himself as the leading mathematician of his time 
by his work on algebraic numbers, algebraic invariants, and the foundations of geometry, 
now turned his attention to integral equations. He says that an investigation of the subject
showed him that it was important for the theory of definite integrals, for the development of 
arbitrary functions in series (of special functions or trigonometric functions), for the theory 
of linear differential equations, for potential theory, and for the calculus of variations. He 
wrote a series of six papers from 1904 to 1910 and reproduced them in his book Grundzüge
einer allgemeinen Theorie der linearen Integralgleichungen (1912). During the latter part 
of this work he applied integral equations to problems of mathematical physics. 

It is said that Hilbert discovered the correct field equation for general relativity in 1915 
(one year before Einstein) using the variational principle, but never claimed priority. 

Hilbert claimed that he worked best out-of-doors. He accordingly attached an 18-foot 
blackboard to his neighbor’s wall and built a covered walkway there so that he could work 
outside in any weather. He would intermittently interrupt his pacing and his blackboard 







computations with a few turns around the rest of the yard on his bicycle, or he would pull 
some weeds, or do some garden trimming. Once, when a visitor called, the maid sent him 
to the backyard and advised that if the master wasn’t readily visible at the blackboard to 
look for him up in one of the trees. 

Highly gifted and highly versatile, David Hilbert radiated over mathematics a catching 
optimism and a stimulating vitality that can only be called “the spirit of Hilbert." Engraved 
on a stone marker set over Hilbert's grave in Göttingen are the master's own optimistic
words: "Wir müssen wissen. Wir werden wissen." ("We must know. We shall know.")


The space of square-integrable functions over the interval [a, b] is denoted by
$\mathcal{L}^2_w(a, b)$. In this notation $\mathcal{L}$ stands for Lebesgue, who generalized the notion of
the ordinary Riemann integral to cases for which the integrand could be highly
discontinuous; 2 stands for the power of $f(x)$ in the integral; a and b denote the
limits of integration; and w refers to the weight function (a strictly positive
real-valued function). When $w(x) = 1$, we use the notation $\mathcal{L}^2(a, b)$. The significance
of $\mathcal{L}^2_w(a, b)$ lies in the following theorem (for a proof, see [Reed 80, Chapter III]):

$\mathcal{L}^2_w(a, b)$ is complete

5.2.1. Theorem. (Riesz-Fischer theorem) The space $\mathcal{L}^2_w(a, b)$ is complete.

A complete infinite-dimensional inner product space was earlier defined to be
a Hilbert space. The following theorem shows that the number of Hilbert spaces
is severely restricted. (For a proof, see [Frie 82, p. 216].)

all Hilbert spaces are alike

5.2.2. Theorem. All infinite-dimensional complete inner product spaces are
isomorphic to $\mathcal{L}^2_w(a, b)$.

$\mathcal{L}^2_w(a, b)$ is defined in terms of functions that satisfy Equation (5.6). Yet an inner
product involves integrals of the form $\int_a^b g^*(x) f(x) w(x)\,dx$. Are such integrals
well-defined and finite? Using the Schwarz inequality, which holds for any inner
product space, finite or infinite, one can show that the integral is defined. The
isomorphism of Theorem 5.2.2 makes the Hilbert space more tangible, because it
identifies the space with a space of functions, objects that are more familiar than
abstract vectors. Nonetheless, a faceless function is very little improvement over
an abstract vector. What is desirable is a set of concrete functions with which we
can calculate. The following theorem provides such functions (for a proof, see
[Simm 83, pp. 154-161]).

5.2.3. Theorem. (Stone-Weierstrass approximation theorem) The sequence of
functions (monomials) $\{x^k\}$, where $k = 0, 1, 2, \dots$, forms a basis of $\mathcal{L}^2_w(a, b)$.

Thus, any function $f$ can be written as $f(x) = \sum_{k=0}^{\infty} a_k x^k$. Note that the
$\{x^k\}$ are not orthonormal but are linearly independent. If we wish to obtain an
orthonormal (or simply orthogonal) linear combination of these vectors, we can
use the Gram-Schmidt process. The result will be certain polynomials, denoted by
$C_n(x)$, that are orthogonal to one another and span $\mathcal{L}^2_w(a, b)$.




Such orthogonal polynomials satisfy very useful recurrence relations, which
we now derive. In the following discussion $p_{\le k}(x)$ denotes a generic polynomial
of degree less than or equal to $k$. For example, $3x^5 - 4x^2 + 5$, $2x + 1$,
$-2.4x^4 + 3x^3 - x^2 + 6$, and 2 are all denoted by $p_{\le 5}(x)$ or $p_{\le 8}(x)$ or $p_{\le 59}(x)$ because they
all have degrees less than or equal to 5, 8, and 59. Since a polynomial of degree
less than $n$ can be written as a linear combination of $C_k(x)$ with $k < n$, we have
the obvious property

$$\int_a^b C_n(x)\, p_{\le n-1}(x)\, w(x)\, dx = 0. \tag{5.7}$$

Let $k_m$ and $k'_m$ denote, respectively, the coefficients of $x^m$ and $x^{m-1}$ in $C_m(x)$,
and let

$$h_m = \int_a^b [C_m(x)]^2 w(x)\, dx. \tag{5.8}$$

The polynomial $C_{n+1}(x) - (k_{n+1}/k_n)\, x C_n(x)$ has degree less than or equal to $n$,
and therefore can be expanded as a linear combination of the $C_j(x)$:

$$C_{n+1}(x) - \frac{k_{n+1}}{k_n}\, x C_n(x) = \sum_{j=0}^{n} a_j C_j(x). \tag{5.9}$$

Take the inner product of both sides of this equation with $C_m(x)$:

$$\int_a^b C_{n+1}(x) C_m(x) w(x)\, dx - \frac{k_{n+1}}{k_n} \int_a^b x C_n(x) C_m(x) w(x)\, dx = \sum_{j=0}^{n} a_j \int_a^b C_j(x) C_m(x) w(x)\, dx.$$

The first integral on the LHS vanishes as long as $m \le n$; the second integral
vanishes if $m \le n - 2$ [if $m \le n - 2$, then $x C_m(x)$ is a polynomial of degree
at most $n - 1$]. Thus, we have

$$\sum_{j=0}^{n} a_j \int_a^b C_j(x) C_m(x) w(x)\, dx = 0 \quad \text{for } m \le n - 2.$$

The integral in the sum is zero unless $j = m$, by orthogonality. Therefore, the sum
reduces to

$$a_m \int_a^b [C_m(x)]^2 w(x)\, dx = 0 \quad \text{for } m \le n - 2.$$

Since the integral is nonzero, we conclude that $a_m = 0$ for $m = 0, 1, 2, \ldots, n - 2$,
and Equation (5.9) reduces to

$$C_{n+1}(x) - \frac{k_{n+1}}{k_n}\, x C_n(x) = a_{n-1} C_{n-1}(x) + a_n C_n(x). \tag{5.10}$$






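It can be shown that if we define

$$\alpha_n = \frac{k_{n+1}}{k_n}, \qquad \beta_n = \alpha_n \left( \frac{k'_{n+1}}{k_{n+1}} - \frac{k'_n}{k_n} \right), \qquad \gamma_n = -\frac{h_n}{h_{n-1}} \frac{\alpha_n}{\alpha_{n-1}}, \tag{5.11}$$

then Equation (5.10) can be expressed as a recurrence relation for orthogonal
polynomials,

$$C_{n+1}(x) = (\alpha_n x + \beta_n) C_n(x) + \gamma_n C_{n-1}(x), \tag{5.12}$$

or

$$x C_n(x) = \frac{1}{\alpha_n} C_{n+1}(x) - \frac{\beta_n}{\alpha_n} C_n(x) - \frac{\gamma_n}{\alpha_n} C_{n-1}(x). \tag{5.13}$$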


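Other recurrence relations, involving higher powers of $x$, can be obtained from
the one above. For example, a recurrence relation involving $x^2$ can be obtained
by multiplying both sides of Equation (5.13) by $x$ and expanding each term of the
RHS using that same equation. The result will be

$$\begin{aligned}
x^2 C_n(x) = {} & \frac{1}{\alpha_n \alpha_{n+1}} C_{n+2}(x) - \left( \frac{\beta_{n+1}}{\alpha_n \alpha_{n+1}} + \frac{\beta_n}{\alpha_n^2} \right) C_{n+1}(x) \\
& + \left( \frac{\beta_n^2}{\alpha_n^2} - \frac{\gamma_{n+1}}{\alpha_n \alpha_{n+1}} - \frac{\gamma_n}{\alpha_n \alpha_{n-1}} \right) C_n(x) \\
& + \left( \frac{\beta_n \gamma_n}{\alpha_n^2} + \frac{\beta_{n-1} \gamma_n}{\alpha_n \alpha_{n-1}} \right) C_{n-1}(x) + \frac{\gamma_{n-1} \gamma_n}{\alpha_n \alpha_{n-1}} C_{n-2}(x).
\end{aligned} \tag{5.14}$$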


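5.2.4. Example. As an application of the recurrence relations above, let us evaluate

$$I_1 = \int_a^b x C_m(x) C_n(x) w(x)\, dx.$$

Substituting (5.13) in the integral gives

$$I_1 = \frac{1}{\alpha_n} \int_a^b C_m(x) C_{n+1}(x) w(x)\, dx - \frac{\beta_n}{\alpha_n} \int_a^b C_m(x) C_n(x) w(x)\, dx - \frac{\gamma_n}{\alpha_n} \int_a^b C_m(x) C_{n-1}(x) w(x)\, dx.$$

We now use the orthogonality relations among the $C_k(x)$ to obtain

$$\begin{aligned}
I_1 &= \frac{1}{\alpha_n} \delta_{m,n+1} \int_a^b C_m^2(x) w(x)\, dx - \frac{\beta_n}{\alpha_n} \delta_{mn} \int_a^b C_m^2(x) w(x)\, dx - \frac{\gamma_n}{\alpha_n} \delta_{m,n-1} \int_a^b C_m^2(x) w(x)\, dx \\
&= \left( \frac{1}{\alpha_{m-1}} \delta_{m,n+1} - \frac{\beta_m}{\alpha_m} \delta_{mn} - \frac{\gamma_{m+1}}{\alpha_{m+1}} \delta_{m,n-1} \right) h_m,
\end{aligned}$$

or

$$I_1 = \begin{cases}
h_m / \alpha_{m-1} & \text{if } m = n + 1, \\
-\beta_m h_m / \alpha_m & \text{if } m = n, \\
-\gamma_{m+1} h_m / \alpha_{m+1} & \text{if } m = n - 1, \\
0 & \text{otherwise.}
\end{cases}$$ ■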
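The recurrence coefficients of (5.11) and the result of Example 5.2.4 can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the $C_n$ to be the Legendre polynomials as supplied by NumPy (interval $(-1, 1)$, $w(x) = 1$), and the helper names (`inner`, `k`, `kp`, `h`, `alpha`, `beta`, `gamma`, `check_recurrence`, `I1_formula`, `I1_direct`) are hypothetical names introduced here, not notation from the text.

```python
import numpy as np
from numpy.polynomial import legendre as L
from numpy.polynomial import polynomial as P


def legendre_power_coeffs(n):
    """Power-series coefficients (lowest degree first) of the Legendre polynomial of degree n."""
    return L.leg2poly([0] * n + [1])


def inner(p, q):
    """Integral of p(x) q(x) over [-1, 1] with weight 1, exact for polynomials."""
    anti = P.polyint(P.polymul(p, q))
    return P.polyval(1.0, anti) - P.polyval(-1.0, anti)


def k(n):
    """Coefficient of x^n in C_n."""
    return legendre_power_coeffs(n)[n]


def kp(n):
    """Coefficient of x^(n-1) in C_n (taken as zero for n = 0)."""
    return legendre_power_coeffs(n)[n - 1] if n >= 1 else 0.0


def h(n):
    """h_n of Equation (5.8)."""
    c = legendre_power_coeffs(n)
    return inner(c, c)


# Equation (5.11)
def alpha(n):
    return k(n + 1) / k(n)


def beta(n):
    return alpha(n) * (kp(n + 1) / k(n + 1) - kp(n) / k(n))


def gamma(n):
    return -(h(n) / h(n - 1)) * (alpha(n) / alpha(n - 1))


def check_recurrence(n):
    """Compare both sides of Equation (5.12) coefficient by coefficient."""
    lhs = legendre_power_coeffs(n + 1)
    rhs = P.polyadd(
        P.polymul([beta(n), alpha(n)], legendre_power_coeffs(n)),  # (alpha_n x + beta_n) C_n
        gamma(n) * legendre_power_coeffs(n - 1),                   # gamma_n C_{n-1}
    )
    return np.allclose(lhs, rhs)


def I1_formula(m, n):
    """Closed form of Example 5.2.4."""
    if m == n + 1:
        return h(m) / alpha(m - 1)
    if m == n:
        return -beta(m) * h(m) / alpha(m)
    if m == n - 1:
        return -gamma(m + 1) * h(m) / alpha(m + 1)
    return 0.0


def I1_direct(m, n):
    """Direct evaluation of the integral of x C_m C_n over [-1, 1]."""
    x_times_cm = P.polymul([0.0, 1.0], legendre_power_coeffs(m))
    return inner(x_times_cm, legendre_power_coeffs(n))


if __name__ == "__main__":
    print([check_recurrence(n) for n in range(1, 6)])
    print(I1_formula(1, 2), I1_direct(1, 2))
```

For this particular family $\beta_n$ vanishes because each polynomial has definite parity, so the check mainly exercises $\alpha_n$ and $\gamma_n$; a family with a nonsymmetric weight would be needed to test $\beta_n$ nontrivially.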

5.2.5. Example. Let us find the orthogonal polynomials forming a basis of $\mathcal{L}^2(-1, +1)$,
which we denote by $P_k(x)$, where $k$ is the degree of the polynomial. Let $P_0(x) = 1$. To find
$P_1(x)$, write $P_1(x) = ax + b$, and determine $a$ and $b$ in such a way that $P_1(x)$ is orthogonal
to $P_0(x)$:

$$0 = \int_{-1}^{1} P_1(x) P_0(x)\, dx = \int_{-1}^{1} (ax + b)\, dx = \tfrac{1}{2} a x^2 \Big|_{-1}^{1} + 2b = 2b.$$

So one of the coefficients, $b$, is zero. To find the other one, we need some standardization
procedure. We “standardize” $P_k(x)$ by requiring that $P_k(1) = 1\ \forall k$. For $k = 1$ this yields
$a \times 1 = 1$, or $a = 1$, so that $P_1(x) = x$.

We can calculate $P_2(x)$ similarly: Write $P_2(x) = ax^2 + bx + c$, impose the condition
that it be orthogonal to both $P_1(x)$ and $P_0(x)$, and enforce the standardization procedure.
All this will yield

$$0 = \int_{-1}^{1} P_2(x) P_0(x)\, dx = \tfrac{2}{3} a + 2c, \qquad 0 = \int_{-1}^{1} P_2(x) P_1(x)\, dx = \tfrac{2}{3} b,$$

and $P_2(1) = a + b + c = 1$. These three equations have the unique solution $a = 3/2$, $b = 0$,
$c = -1/2$. Thus, $P_2(x) = \tfrac{1}{2}(3x^2 - 1)$. These are the first three Legendre polynomials,
which are part of a larger group of polynomials to be discussed in Chapter 7. ■
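The construction in Example 5.2.5 can be automated. The following is a minimal sketch, assuming the interval $(-1, 1)$ with $w(x) = 1$ and the standardization $P_k(1) = 1$ used in the example; it applies Gram-Schmidt to the monomials instead of solving for the coefficients by hand, and the function names `inner` and `orthogonal_polynomials` are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def inner(p, q, a=-1.0, b=1.0):
    """Integral of p(x) q(x) over [a, b] with weight 1 (exact for polynomials)."""
    anti = P.polyint(P.polymul(p, q))
    return P.polyval(b, anti) - P.polyval(a, anti)


def orthogonal_polynomials(n_max):
    """Gram-Schmidt on 1, x, x^2, ... over [-1, 1], standardized so that p(1) = 1.

    Each polynomial is returned as an array of power-series coefficients,
    lowest degree first.
    """
    basis = []
    for degree in range(n_max + 1):
        p = np.zeros(degree + 1)
        p[degree] = 1.0                          # start from the monomial x^degree
        for q in basis:                          # remove components along earlier polynomials
            p = P.polysub(p, (inner(p, q) / inner(q, q)) * q)
        p = p / P.polyval(1.0, p)                # enforce the standardization p(1) = 1
        basis.append(p)
    return basis


if __name__ == "__main__":
    for coeffs in orthogonal_polynomials(3):
        print(coeffs)
```

The arrays for degrees 0, 1, and 2 correspond to $1$, $x$, and $\tfrac{1}{2}(3x^2 - 1)$, matching the hand calculation; orthonormalizing instead of standardizing would only change the final rescaling step.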


5.2.1 Orthogonal Polynomials and Least Squares 

The method of least squares is no doubt familiar to the reader. In the simplest 
procedure, one tries to find a linear function that most closely fits a set of data. 
By definition, “most closely” means that the sum of the squares of the differences 
between the data points and the corresponding values of the linear function is 
minimum. More generally, one seeks the best polynomial fit to the data. 

We shall consider a related topic, namely least-square fitting of a given function
with polynomials. Suppose $f(x)$ is a function defined on $(a, b)$. We want to find a
polynomial that most closely approximates $f$. Write such a polynomial as
$p(x) = \sum_{k=0}^{n} a_k x^k$, where the $a_k$'s are to be determined such that

$$S(a_0, a_1, \ldots, a_n) = \int_a^b \left[ f(x) - a_0 - a_1 x - \cdots - a_n x^n \right]^2 dx$$

is a minimum. Differentiating $S$ with respect to the $a_k$'s and setting the result equal
to zero gives






(a) $\|f \pm g\| = \|f\| + \|g\|$.

(b) $\|f + g\|^2 + \|f - g\|^2 = 2\left( \|f\| + \|g\| \right)^2$.

(c) Using parts (a), (b), and Theorem 1.2.8, show that $\mathcal{L}^1(\mathbb{R})$ is not an inner
product space. This shows that not all norms arise from an inner product.

5.6. Use Equation (5.10) to derive Equation (5.12). Hint: To find $a_n$, equate the
coefficients of $x^n$ on both sides of Equation (5.10). To find $a_{n-1}$, multiply both
sides of Equation (5.10) by $C_{n-1}(x) w(x)$ and integrate, using the definitions of $k_n$,
$k'_n$, and $h_n$.

5.7. Evaluate the integral $\int_a^b x^2 C_m(x) C_n(x) w(x)\, dx$.

Additional Reading 

1. Boccara, N. Functional Analysis, Academic Press, 1990. An application-oriented
book with many abstract topics related to Hilbert spaces (e.g.,
Lebesgue measure) explained for a physics audience.

2. DeVito, C. Functional Analysis and Linear Operator Theory, Addison-Wesley,
1990.

3. Reed, M., and Simon, B. Functional Analysis, Academic Press, 1980. Coauthored
by a mathematical physicist (B.S.), this first volume of a four-volume
encyclopedic treatise on functional analysis and Hilbert spaces has many
examples and problems to help the reader comprehend the rather abstract
presentation.

4. Zeidler, E. Applied Functional Analysis, Springer-Verlag, 1995. Another
application-oriented book on Hilbert spaces suitable for a physics audience.



Generalized Functions 


Once we allow the number of dimensions to be infinite, we open the door for 
numerous possibilities that are not present in the finite case. One such possibility 
arises because of the variety of infinities. We have encountered two types of infinity 
in Chapter 0, the countable infinity and the uncountable infinity. The paradigm of 
the former is the “number” of integers, and that of the latter is the “number” of
real numbers. The nature of dimensionality of the vector space is reflected in the 
components of a general vector, which has a finite number of components in a 
finite-dimensional vector space, a countably infinite number of components in 
an infinite-dimensional vector space with a countable basis, and an uncountably 
infinite number of components in an infinite-dimensional vector space with no 
countable basis. 

6.1 Continuous Index 

To gain an understanding of the nature of, and differences between, the three
types of vector spaces mentioned above, it is convenient to think of components
as functions of a “counting set.” Thus, the components $f_i$ of a vector $|f\rangle$ in an
$N$-dimensional vector space can be thought of as values of a function $f$ defined
on the finite set $\{1, 2, \ldots, N\}$, and to emphasize such functional dependence, we
write $f(i)$ instead of $f_i$. Similarly, the components $f_i$ of a vector $|f\rangle$ in a Hilbert
space with the countable basis $B = \{|e_i\rangle\}_{i=1}^{\infty}$ can be thought of as values of a
function $f : \mathbb{N} \to \mathbb{C}$, where $\mathbb{N}$ is the (infinite) set of natural numbers. The next
step is to allow the counting set to be uncountable, i.e., a continuum such as the
real numbers or an interval thereof. This leads to a “component” of the form $f(x)$
corresponding to a function $f : \mathbb{R} \to \mathbb{C}$. What about the vectors themselves?
What sort of a basis gives rise to such components?




Because of the isomorphism of Theorem 5.2.2, we shall concentrate on
$\mathcal{L}^2_w(a, b)$. In keeping with our earlier notation, let $\{|e_x\rangle\}_{x \in \mathbb{R}}$ denote the elements
of an orthonormal set and interpret $f(x)$ as $\langle e_x | f \rangle$. The inner product of $\mathcal{L}^2_w(a, b)$
can now be written as

$$\begin{aligned}
\langle g | f \rangle &= \int_a^b g^*(x) f(x) w(x)\, dx = \int_a^b \langle g | e_x \rangle \langle e_x | f \rangle w(x)\, dx \\
&= \langle g | \left( \int_a^b |e_x\rangle w(x) \langle e_x|\, dx \right) | f \rangle.
\end{aligned}$$

The last line suggests writing

$$\int_a^b |e_x\rangle w(x) \langle e_x|\, dx = 1.$$

In the physics literature the “$e$” is ignored, and one writes $|x\rangle$ for $|e_x\rangle$. Hence, we
obtain the completeness relation for a continuous index:

$$\int_a^b |x\rangle w(x) \langle x|\, dx = 1, \quad \text{or} \quad \int_a^b |x\rangle \langle x|\, dx = 1, \tag{6.1}$$

where in the second integral, $w(x)$ is set equal to unity. We also have

$$|f\rangle = \left( \int_a^b |x\rangle w(x) \langle x|\, dx \right) |f\rangle = \int_a^b f(x) w(x) |x\rangle\, dx, \tag{6.2}$$

which shows how to expand a vector $|f\rangle$ in terms of the $|x\rangle$'s.

Take the inner product of (6.2) with $\langle x'|$:

$$\langle x' | f \rangle = f(x') = \int_a^b f(x) w(x) \langle x' | x \rangle\, dx,$$

where $x'$ is assumed to lie in the interval $(a, b)$, otherwise $f(x') = 0$ by definition.
This equation, which holds for arbitrary $f$, tells us immediately that $w(x) \langle x' | x \rangle$
is no ordinary function of $x$ and $x'$. For instance, suppose $f(x') = 0$. Then, the
result of integration is always zero, regardless of the behavior of $f$ at other points.
Clearly, there is an infinitude of functions that vanish at $x'$, yet all of them give the
same integral! Pursuing this line of argument more quantitatively, one can show
that $w(x) \langle x' | x \rangle = 0$ if $x \ne x'$, $w(x) \langle x | x \rangle = \infty$, $w(x) \langle x' | x \rangle$ is an even function
of $x - x'$, and $\int_a^b w(x) \langle x' | x \rangle\, dx = 1$. The proof is left as a problem. The reader
may recognize this as the Dirac delta function

$$\delta(x - x') = w(x) \langle x' | x \rangle, \tag{6.3}$$

which, for a function $f$ defined on the interval $(a, b)$, has the following property:$^1$

$$\int_a^b f(x) \delta(x - x')\, dx = \begin{cases} f(x') & \text{if } x' \in (a, b), \\ 0 & \text{if } x' \notin (a, b). \end{cases} \tag{6.4}$$

$^1$For an elementary discussion of the Dirac delta function with many examples of its application, see [Hass 99].





Figure 6.1 The Gaussian bell-shaped curve approaches the Dirac delta function as the
width of the curve approaches zero. The value of $\epsilon$ is 1 for the dashed curve, 0.25 for the
heavy curve, and 0.05 for the light curve.


Written in the form $\langle x' | x \rangle = \delta(x - x')/w(x)$, Equation (6.3) is the generalization
of the orthonormality relation of vectors to the case of a continuous index.

The Dirac delta function is anything but a “function.” Nevertheless, there is 
a well-developed branch of mathematics, called generalized function theory or 
functional analysis, studying it and many other functions like it in a highly rigorous 
fashion. We shall only briefly explore this territory of mathematics in the next 
section. At this point we simply mention the fact that the Dirac delta function 
can be represented as the limit of certain sequences of ordinary functions. The 
following three examples illustrate some of these representations. 

6.1.1. Example. Consider a Gaussian curve whose width approaches zero at the same
time that its height approaches infinity in such a way that its area remains constant.
In the limit, we obtain the Dirac delta function. In fact, we have

$$\delta(x - x') = \lim_{\epsilon \to 0} \frac{1}{\sqrt{\epsilon \pi}}\, e^{-(x - x')^2/\epsilon}.$$

In the limit of $\epsilon \to 0$, the height of this Gaussian goes to infinity
while its width goes to zero (see Figure 6.1). Furthermore, for any nonzero value of $\epsilon$,
we can easily verify that

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{\epsilon \pi}}\, e^{-(x - x')^2/\epsilon}\, dx = 1.$$

This relation is independent of $\epsilon$ and therefore still holds in the limit $\epsilon \to 0$. The limit of
the Gaussian behaves like the Dirac delta function. ■
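The sifting behavior of the Gaussian can be observed numerically. The sketch below is an illustration only: it integrates the Gaussian of width parameter $\epsilon$ against the smooth function $\cos x$ (an arbitrary choice made here), for which the integral has the closed form $\cos(x')\, e^{-\epsilon/4}$, and the names `gaussian_delta` and `smeared_value` are hypothetical.

```python
import numpy as np


def gaussian_delta(x, x0, eps):
    """The Gaussian of Example 6.1.1, centered at x0 with width parameter eps."""
    return np.exp(-((x - x0) ** 2) / eps) / np.sqrt(eps * np.pi)


def smeared_value(f, x0, eps, half_width=10.0, n=400_001):
    """Riemann-sum approximation of the integral of gaussian_delta(x) f(x) dx."""
    x = np.linspace(x0 - half_width, x0 + half_width, n)
    dx = x[1] - x[0]
    return float(np.sum(gaussian_delta(x, x0, eps) * f(x)) * dx)


if __name__ == "__main__":
    x0 = 0.7
    for eps in (1.0, 0.25, 0.05):          # the three widths shown in Figure 6.1
        print(eps, smeared_value(np.cos, x0, eps), np.cos(x0))
```

As $\epsilon$ decreases the printed integral approaches $\cos(x')$, which is the statement that the Gaussian acts more and more like $\delta(x - x')$ under the integral sign.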

6.1.2. Example. Consider the function $D_T(x - x')$ defined as

$$D_T(x - x') \equiv \frac{1}{2\pi} \int_{-T}^{T} e^{i(x - x')t}\, dt.$$

Figure 6.2 The function $\sin Tx / x$ also approaches the Dirac delta function as the width
of the curve approaches zero. The value of $T$ is 0.5 for the dashed curve, 2 for the heavy
curve, and 15 for the light curve.

The integral is easily evaluated, with the result

$$D_T(x - x') = \frac{1}{2\pi} \left. \frac{e^{i(x - x')t}}{i(x - x')} \right|_{-T}^{T} = \frac{1}{\pi} \frac{\sin T(x - x')}{x - x'}.$$

The graph of $D_T(x - 0)$ as a function of $x$ for various values of $T$ is shown in Figure 6.2.
Note that the width of the curve decreases as $T$ increases. The area under the curve can be
calculated:

$$\int_{-\infty}^{\infty} D_T(x - x')\, dx = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin T(x - x')}{x - x'}\, dx = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin y}{y}\, dy = 1.$$

Figure 6.2 shows that $D_T(x - x')$ becomes more and more like the Dirac delta function
as $T$ gets larger and larger. In fact, we have

$$\delta(x - x') = \lim_{T \to \infty} \frac{1}{\pi} \frac{\sin T(x - x')}{x - x'}. \tag{6.5}$$

To see this, we note that for any finite $T$ we can write

$$D_T(x - x') = \frac{T}{\pi} \frac{\sin T(x - x')}{T(x - x')}.$$

Furthermore, for values of $x$ that are very close to $x'$,

$$T(x - x') \to 0 \quad \text{and} \quad \frac{\sin T(x - x')}{T(x - x')} \to 1.$$

Thus, for such values of $x$ and $x'$, we have $D_T(x - x') \approx T/\pi$, which is large when $T$
is large. This is as expected of a delta function: $\delta(0) = \infty$. On the other hand, the width
of $D_T(x - x')$ around $x'$ is given, roughly, by the distance between the points at which
$D_T(x - x')$ drops to zero: $T(x - x') = \pm\pi$, or $x - x' = \pm\pi/T$. This width is roughly
$\Delta x = 2\pi/T$, which goes to zero as $T$ grows. Again, this is as expected of the delta function. ■
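The approach of $D_T$ to the delta function can also be checked numerically. In the sketch below, which is an illustration under assumptions chosen here, $D_T(x)$ is integrated against $e^{-x^2}$; for this particular test function the integral equals $\operatorname{erf}(T/2)$, which tends to $f(0) = 1$ as $T$ grows. The names `dirichlet_kernel` and `smeared_value` are hypothetical.

```python
import math

import numpy as np


def dirichlet_kernel(x, T):
    """D_T(x) = sin(T x) / (pi x), with the value T/pi at x = 0.

    np.sinc(y) is sin(pi y) / (pi y), so the argument is rescaled by 1/pi.
    """
    return (T / np.pi) * np.sinc(T * x / np.pi)


def gaussian_test(x):
    """Smooth, rapidly decaying test function with gaussian_test(0) = 1."""
    return np.exp(-(x ** 2))


def smeared_value(T, f=gaussian_test, half_width=8.0, n=800_001):
    """Riemann-sum approximation of the integral of D_T(x) f(x) dx."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    return float(np.sum(dirichlet_kernel(x, T) * f(x)) * dx)


if __name__ == "__main__":
    for T in (0.5, 2.0, 15.0):             # the three values shown in Figure 6.2
        print(T, smeared_value(T), math.erf(T / 2.0))
```

Unlike the Gaussian of Example 6.1.1, this kernel is not positive and does not decay quickly; it picks out $f(0)$ through cancellation of rapid oscillations, which is why a fine grid is used.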


The preceding example suggests another representation of the Dirac delta function:

$$\delta(x - x') = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i(x - x')t}\, dt. \tag{6.6}$$


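6.1.3. Example. A third representation of the Dirac delta function involves the step
function $\theta(x - x')$, which is defined as

$$\theta(x - x') \equiv \begin{cases} 0 & \text{if } x < x', \\ 1 & \text{if } x > x' \end{cases}$$

and is discontinuous at $x = x'$. We can approximate this step function by many continuous
functions, such as $T_\epsilon(x - x')$ defined by

$$T_\epsilon(x - x') \equiv \begin{cases}
0 & \text{if } x \le x' - \epsilon, \\
\dfrac{1}{2\epsilon}(x - x' + \epsilon) & \text{if } x' - \epsilon \le x \le x' + \epsilon, \\
1 & \text{if } x \ge x' + \epsilon,
\end{cases}$$

where $\epsilon$ is a small positive number as shown in Figure 6.3. It is clear that

$$\theta(x - x') = \lim_{\epsilon \to 0} T_\epsilon(x - x').$$

Now let us consider the derivative of $T_\epsilon(x - x')$ with respect to $x$:

$$\frac{dT_\epsilon}{dx}(x - x') = \begin{cases}
0 & \text{if } x < x' - \epsilon, \\
\dfrac{1}{2\epsilon} & \text{if } x' - \epsilon < x < x' + \epsilon, \\
0 & \text{if } x > x' + \epsilon.
\end{cases}$$

We note that the derivative is not defined at $x = x' - \epsilon$ and $x = x' + \epsilon$, and that $dT_\epsilon/dx$
is zero everywhere except when $x$ lies in the interval $(x' - \epsilon, x' + \epsilon)$, where it is equal to
$1/(2\epsilon)$ and goes to infinity as $\epsilon \to 0$. Here again we see signs of the delta function. In fact,
we also note that

$$\int_{-\infty}^{\infty} \left( \frac{dT_\epsilon}{dx} \right) dx = \int_{x' - \epsilon}^{x' + \epsilon} \left( \frac{dT_\epsilon}{dx} \right) dx = \int_{x' - \epsilon}^{x' + \epsilon} \frac{1}{2\epsilon}\, dx = 1.$$

It is not surprising, then, to find that $\lim_{\epsilon \to 0} \dfrac{dT_\epsilon}{dx}(x - x') = \delta(x - x')$. Assuming that the
interchange of the order of differentiation and the limiting process is justified, we obtain
the important identity

$$\frac{d}{dx} \theta(x - x') = \delta(x - x'). \tag{6.7}$$ ■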


Figure 6.3 The step function, or $\theta$-function, shown in the figure has the Dirac delta
function as its derivative.

Now that we have some understanding of one continuous index, we can generalize
the results to several continuous indices. In the earlier discussion we looked at
$f(x)$ as the $x$th component of some abstract vector $|f\rangle$. For functions of $n$ variables,
we can think of $f(x_1, \ldots, x_n)$ as the component of an abstract vector $|f\rangle$ along a
basis vector $|x_1, \ldots, x_n\rangle$.$^2$ This basis is a direct generalization of one continuous
index to $n$. Then $f(x_1, \ldots, x_n)$ is defined as $f(x_1, \ldots, x_n) = \langle x_1, \ldots, x_n | f \rangle$. If
the region of integration is denoted by $\Omega$, and we use the abbreviations

$$\mathbf{r} \equiv (x_1, x_2, \ldots, x_n), \qquad d^n x = dx_1\, dx_2 \cdots dx_n,$$

$$|x_1, x_2, \ldots, x_n\rangle = |\mathbf{r}\rangle, \qquad \delta(x_1 - x'_1) \cdots \delta(x_n - x'_n) = \delta(\mathbf{r} - \mathbf{r}'),$$

then we can write

$$\begin{aligned}
|f\rangle &= \int_\Omega d^n x\, f(\mathbf{r}) w(\mathbf{r}) |\mathbf{r}\rangle, & \int_\Omega d^n x\, |\mathbf{r}\rangle w(\mathbf{r}) \langle \mathbf{r}| &= 1, \\
f(\mathbf{r}') &= \int_\Omega d^n x\, f(\mathbf{r}) w(\mathbf{r}) \langle \mathbf{r}' | \mathbf{r} \rangle, & \langle \mathbf{r}' | \mathbf{r} \rangle w(\mathbf{r}) &= \delta(\mathbf{r} - \mathbf{r}'),
\end{aligned} \tag{6.8}$$

where $d^n x$ is the “volume” element and $\Omega$ is the region of integration of interest.

For instance, if the region of definition of the functions under consideration is
the surface of the unit sphere, then [with $w(\mathbf{r}) = 1$], one gets

$$\int_0^{2\pi} d\varphi \int_0^{\pi} \sin\theta\, d\theta\, |\theta, \varphi\rangle \langle \theta, \varphi| = 1. \tag{6.9}$$

$^2$Do not confuse this with an $n$-dimensional vector. In fact, the dimension is $n$-fold infinite: each $x_i$ counts one infinite set of
numbers!


This will be used in our discussion of spherical harmonics in Chapter 12.

An important identity using the three-dimensional Dirac delta function comes
from potential theory. This is (see [Hass 99] for a discussion of this equation)

$$\nabla^2 \left( \frac{1}{|\mathbf{r} - \mathbf{r}'|} \right) = -4\pi \delta(\mathbf{r} - \mathbf{r}'). \tag{6.10}$$


6.2 Generalized Functions 


Paul Adrien Maurice Dirac discovered the delta function in the late 1920s while
investigating scattering problems in quantum mechanics. This “function” seemed
to violate most properties of other functions known to mathematicians at the time.
Furthermore, the derivative of the delta function, $\delta'(x - x')$, is such that for any
ordinary function $f(x)$,

$$\int_{-\infty}^{\infty} f(x) \delta'(x - x')\, dx = -\int_{-\infty}^{\infty} f'(x) \delta(x - x')\, dx = -f'(x').$$

We can define $\delta'(x - x')$ by this relation. In addition, we can define the derivative
of any function, including discontinuous functions, at any point (including points
of discontinuity, where the usual definition of derivative fails) by this relation. That
is, if $\varphi(x)$ is a “bad” function whose derivative is not defined at some point(s), and
$f(x)$ is a “good” function, we can define the derivative of $\varphi(x)$ by

$$\int_{-\infty}^{\infty} f(x) \varphi'(x)\, dx \equiv -\int_{-\infty}^{\infty} f'(x) \varphi(x)\, dx.$$

The integral on the RHS is well-defined.

Functions such as the Dirac delta function and its derivatives of all orders are not 
functions in the traditional sense. What is common among all of them is that in most 
applications they appear inside an integral, and we saw in Chapter 1 that integration 
can be considered as a linear functional on the space of continuous functions. It 
is therefore natural to describe such functions in terms of linear functionals. This 
idea was picked up by Laurent Schwartz in the 1950s who developed it into a new 
branch of mathematics called generalized functions, or distributions. 

A distribution is a mathematical entity that appears inside an integral in conjunction
with a well-behaved test function (which we assume to depend on $n$
variables) such that the result of integration is a well-defined number. Depending
on the type of test function used, different kinds of distributions can be defined.
If we want to include the Dirac delta function and its derivatives of all orders,
then the test functions must be infinitely differentiable, that is, they must be $\mathcal{C}^\infty$
functions on $\mathbb{R}^n$ (or $\mathbb{C}^n$). Moreover, in order for the theory of distributions to be
mathematically feasible, all the test functions must vanish outside a finite “volume”
of $\mathbb{R}^n$ (or $\mathbb{C}^n$).$^3$ One common notation for such functions is $\mathcal{C}^\infty_F(\mathbb{R}^n)$ or $\mathcal{C}^\infty_F(\mathbb{C}^n)$
($F$ stands for “finite”). The definitive property of distributions concerns the way
they combine with test functions to give a number. The test functions used clearly
form a vector space over $\mathbb{R}$ or $\mathbb{C}$. In this vector-space language, distributions are
linear functionals. The linearity is a simple consequence of the properties of the
integral. We therefore have the following definition of a distribution.

$^3$Such functions are said to be of compact support.


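6.2.1. Definition. A distribution, or generalized function, is a continuous$^4$ linear
functional on the space $\mathcal{C}^\infty_F(\mathbb{R}^n)$ or $\mathcal{C}^\infty_F(\mathbb{C}^n)$. If $f \in \mathcal{C}^\infty_F$ and $\varphi$ is a distribution,
then $\varphi[f] = \int_{-\infty}^{\infty} \varphi(\mathbf{r}) f(\mathbf{r})\, d^n x$.

Another notation used in place of $\varphi[f]$ is $\langle \varphi, f \rangle$. This is more appealing not
only because $\varphi$ is linear, in the sense that $\varphi[\alpha f + \beta g] = \alpha \varphi[f] + \beta \varphi[g]$, but
also because the set of all such linear functionals forms a vector space; that is,
the linear combination of the $\varphi$'s is also defined. Thus, $\langle \varphi, f \rangle$ suggests a mutual
“democracy” for both $f$'s and $\varphi$'s.

We now have a shorthand way of writing integrals. For instance, if $\delta_a$ represents
the Dirac delta function $\delta(x - a)$, with an integration over $x$ understood,
then $\langle \delta_a, f \rangle = f(a)$. Similarly, $\langle \delta'_a, f \rangle = -f'(a)$, and for linear combinations,
$\langle \alpha \delta_a + \beta \delta'_a, f \rangle = \alpha f(a) - \beta f'(a)$.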

6.2.2. Example. An ordinary (continuous) function $g$ can be thought of as a special case
of a distribution. The linear functional $g : \mathcal{C}^\infty_F(\mathbb{R}) \to \mathbb{R}$ is simply defined by
$\langle g, f \rangle \equiv g[f] = \int_{-\infty}^{\infty} g(x) f(x)\, dx$. ■
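The view of a distribution as a rule that sends a test function to a number translates directly into code. The sketch below is one possible illustration, not a construction from the text: distributions are modeled as Python callables acting on test functions, the integral for an ordinary function is approximated by a Riemann sum on a fixed grid, and all names (`delta`, `regular`, `combine`, `bump`) are hypothetical.

```python
import numpy as np


def delta(a):
    """The distribution delta_a, acting as <delta_a, f> = f(a)."""
    return lambda f: f(a)


def regular(g, lo=-10.0, hi=10.0, n=200_001):
    """The distribution induced by an ordinary function g: <g, f> = integral of g(x) f(x) dx.

    The integral is approximated by a Riemann sum on [lo, hi]; this is adequate
    only when the test function vanishes outside that interval.
    """
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    return lambda f: float(np.sum(g(x) * f(x)) * dx)


def combine(alpha, phi, beta, psi):
    """The linear combination alpha*phi + beta*psi of two distributions."""
    return lambda f: alpha * phi(f) + beta * psi(f)


def bump(x):
    """A smooth test function of compact support: exp(-1/(1 - x^2)) for |x| < 1, else 0."""
    x = np.asarray(x, dtype=float)
    inside = np.abs(x) < 1
    safe = np.where(inside, x, 0.0)            # avoids division by zero outside the support
    return np.where(inside, np.exp(-1.0 / (1.0 - safe ** 2)), 0.0)


if __name__ == "__main__":
    print(delta(0.0)(bump))                    # value of the test function at 0
    print(regular(lambda x: x ** 2)(bump))     # pairing of x^2 with the test function
    print(combine(2.0, delta(0.0), 3.0, delta(0.5))(bump))
```

In this representation the delta function and an ordinary function are objects of the same kind, which is the point of Example 6.2.2; the only difference is how each one produces its number.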

6.2.3. Example. An interesting application of distributions (generalized functions) occurs
when the notion of density is generalized to include not only (smooth) volume densities,
but also point-like, linear, and surface densities.

A point charge $q$ located at $\mathbf{r}_0$ can be thought of as having a charge density
$\rho(\mathbf{r}) = q\delta(\mathbf{r} - \mathbf{r}_0)$. In the language of linear functionals, we interpret $\rho$ as a distribution,
$\rho : \mathcal{C}^\infty_F(\mathbb{R}^3) \to \mathbb{R}$, which for an arbitrary function $f$ gives

$$\rho[f] = \langle \rho, f \rangle = q f(\mathbf{r}_0). \tag{6.11}$$

The delta function character of $\rho$ can be detected from this equation by recalling that the
LHS is

$$\int \rho(\mathbf{r}) f(\mathbf{r})\, d^3 x = \lim_{\substack{N \to \infty \\ \Delta V_i \to 0}} \sum_{i=1}^{N} \rho(\mathbf{r}_i) f(\mathbf{r}_i) \Delta V_i.$$

On the RHS of this equation, the only volume element that contributes is the one that
contains the point $\mathbf{r}_0$; all the rest contribute zero. As $\Delta V_i \to 0$, the only way that the
RHS can give a nonzero number is for $\rho(\mathbf{r}_0) f(\mathbf{r}_0)$ to be infinite. Since $f$ is a well-behaved
function, $\rho(\mathbf{r}_0)$ must be infinite, implying that $\rho(\mathbf{r})$ acts as a delta function. This shows
that the definition of Equation (6.11) leads to a delta-function behavior for $\rho$. Similarly for
linear and surface densities. ■


$^4$See [Zeidler, 95], pp. 27, 156-160, for a formal definition of the continuity of linear functionals.




The example above and Problems 6.5 and 6.6 suggest that a distribution that 
confines an integral to a lower-dimensional space must have a delta function in its 
definition. 


“The amount of 
theoretical ground 
one has to cover 
before being able to 
solve problems of 
real practical value is 
rather large, but this 
circumstance is an 
inevitable 
consequence of the 
fundamental part 
played by 
transformation 
theory and is likely to 
become more 
pronounced in the 
theoretical physics of 
the future.”
P. A. M. Dirac (1930)


“Physical Laws should have mathematical beauty.” This statement 
was Dirac’s response to the question of his philosophy of physics, 
posed to him in Moscow in 1955. He wrote it on a blackboard that 
is still preserved today. 

Paul Adrien Maurice Dirac (1902-1984) was born in 1902
in Bristol, England, of a Swiss, French-speaking father and an 
English mother. His father, a taciturn man who refused to receive 
friends at home, enforced young Paul’s silence by requiring that 
only French be spoken at the dinner table. Perhaps this explains 
Dirac’s later disinclination toward collaboration and his general 
tendency to be a loner in most aspects of his life. The fundamental 
nature of his work made the involvement of students difficult, so perhaps Dirac’s personality 
was well-suited to his extraordinary accomplishments. 

Dirac went to Merchant Venturer’s School, the public school where his father taught
French, and while there displayed great mathematical abilities. Upon graduation, he followed
in his older brother’s footsteps and went to Bristol University to study electrical
engineering. He was 19 when he graduated from Bristol University in 1921. Unable to find a
suitable engineering position due to the economic recession that gripped post-World War I
England, Dirac accepted a fellowship to study mathematics at Bristol University. This fellowship,
together with a grant from the Department of Scientific and Industrial Research,
made it possible for Dirac to go to Cambridge as a research student in 1923. At Cambridge
Dirac was exposed to the experimental activities of the Cavendish Laboratory, and he became
a member of the intellectual circle over which Rutherford and Fowler presided. He
took his Ph.D. in 1926 and was elected in 1927 as a fellow. His appointment as university
lecturer came in 1929. He assumed the Lucasian professorship following Joseph Larmor 
in 1932 and retired from it in 1969. Two years later he accepted a position at Florida State 
University where he lived out his remaining years. The FSU library now carries his name. 

In the late 1920s the relentless march of ideas and discoveries had carried physics to a 
generally accepted relativistic theory of the electron. Dirac, however, was dissatisfied with 
the prevailing ideas and, somewhat in isolation, sought for a better formulation. By 1928 
he succeeded in finding an equation, the Dirac equation, that accorded with his own ideas 
and also fit most of the established principles of the time. Ultimately, this equation, and 
the physical theory behind it, proved to be one of the great intellectual achievements of the 
period. It was particularly remarkable for the internal beauty of its mathematical structure, 
which not only clarified previously mysterious phenomena such as spin and the Fermi- 
Dirac statistics associated with it, but also predicted the existence of an electron-like particle 
of negative energy, the antielectron, or positron, and, more recently, it has come to play a 
role of great importance in modern mathematics, particularly in the interrelations between
topology, geometry, and analysis. Heisenberg characterized the discovery of antimatter by
Dirac as “the most decisive discovery in connection with the properties or the nature of
elementary particles.... This discovery of particles and antiparticles by Dirac ... changed
our whole outlook on atomic physics completely.” One of the interesting implications of 







his work that predicted the positron was the prediction of a magnetic monopole. Dirac won 
the Nobel Prize in 1933 for this work. 

Dirac is not only one of the chief authors of quantum mechanics, but he is also the 
creator of quantum electrodynamics and one of the principal architects of quantum field 
theory. While studying the scattering theory of quantum particles, he invented the (Dirac)
delta function; in his attempt at quantizing the general theory of relativity, he founded
constrained Hamiltonian dynamics, which is one of the most active areas of theoretical
physics research today. One of his greatest contributions is the invention of bra $\langle\,|$ and ket
$|\,\rangle$.

While at Cambridge, Dirac did not accept many research students. Those who worked 
with him generally thought that he was a good supervisor, but one who did not spend 
much time with his students. A student needed to be extremely independent to work under 
Dirac. One such student was Dennis Sciama, who later became the supervisor of Stephen 
Hawking, the current holder of the Lucasian chair. Salam and Wigner, in their preface to the 
Festschrift that honors Dirac on his seventieth birthday and commemorates his contributions 
to quantum mechanics succinctly assessed the man: 

Dirac is one of the chief creators of quantum mechanics.... Posterity will
rate Dirac as one of the greatest physicists of all time. The present generation
values him as one of its greatest teachers.... On those privileged to know
him, Dirac has left his mark ... by his human greatness. He is modest,
affectionate, and sets the highest possible standards of personal and scientific 
integrity. He is a legend in his own lifetime and rightly so. 

(Taken from Schweber, S. S. “Some chapters for a history of quantum field theory:
1938-1952,” in Relativity, Groups, and Topology II, vol. 2, B. S. DeWitt and R. Stora, eds.,
North-Holland, Amsterdam, 1984.)


We have seen that the delta function can be thought of as the limit of an ordinary 
function. This idea can be generalized. 

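6.2.4. Definition. Let $\{\varphi_n(x)\}$ be a sequence of functions such that

$$\lim_{n \to \infty} \int_{-\infty}^{\infty} \varphi_n(x) f(x)\, dx$$

exists for all $f \in \mathcal{C}^\infty_F(\mathbb{R})$. Then the sequence is said to converge to the distribution
$\varphi$, defined by

$$\langle \varphi, f \rangle = \lim_{n \to \infty} \int_{-\infty}^{\infty} \varphi_n(x) f(x)\, dx \quad \forall f.$$

This convergence is denoted by $\varphi_n \to \varphi$.
For example, it can be verified that

$$\frac{n}{\sqrt{\pi}}\, e^{-n^2 x^2} \to \delta(x) \quad \text{and} \quad \frac{1 - \cos nx}{n \pi x^2} \to \delta(x)$$

and so on. The proofs are left as exercises.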






6.2.5. Definition. The derivative of a distribution $\varphi$ is another distribution $\varphi'$
defined by $\langle \varphi', f \rangle = -\langle \varphi, f' \rangle\ \forall f \in \mathcal{C}^\infty_F$.

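6.2.6. Example. We can combine the last two definitions to show that if the functions $\theta_n$
are defined as

$$\theta_n(x) \equiv \begin{cases}
0 & \text{if } x \le -\frac{1}{n}, \\
(nx + 1)/2 & \text{if } -\frac{1}{n} \le x \le \frac{1}{n}, \\
1 & \text{if } x \ge \frac{1}{n},
\end{cases}$$

then $\theta'_n(x) \to \delta(x)$.

We write the definition of the derivative, $\langle \theta'_n, f \rangle = -\langle \theta_n, f' \rangle$, in terms of integrals:

$$\begin{aligned}
\int_{-\infty}^{\infty} \theta'_n(x) f(x)\, dx &= -\int_{-\infty}^{\infty} \theta_n(x) \frac{df}{dx}\, dx = -\int_{-\infty}^{\infty} \theta_n(x)\, df \\
&= -\left( \int_{-\infty}^{-1/n} \theta_n(x)\, df + \int_{-1/n}^{1/n} \theta_n(x)\, df + \int_{1/n}^{\infty} \theta_n(x)\, df \right) \\
&= -\left( 0 + \int_{-1/n}^{1/n} \frac{nx + 1}{2}\, df + \int_{1/n}^{\infty} df \right) \\
&= -\frac{n}{2} \int_{-1/n}^{1/n} x\, df - \frac{1}{2} \int_{-1/n}^{1/n} df - \int_{1/n}^{\infty} df \\
&= -\frac{n}{2} \left( x f(x) \Big|_{-1/n}^{1/n} - \int_{-1/n}^{1/n} f(x)\, dx \right) - \frac{1}{2} \left( f(1/n) - f(-1/n) \right) - f(\infty) + f(1/n).
\end{aligned}$$

For large $n$, we have $1/n \approx 0$ and $f(\pm 1/n) \approx f(0)$. Thus,

$$\int_{-\infty}^{\infty} \theta'_n(x) f(x)\, dx \approx -\frac{n}{2} \left( \frac{1}{n} f\!\left(\frac{1}{n}\right) + \frac{1}{n} f\!\left(-\frac{1}{n}\right) - \frac{2}{n} f(0) \right) + f(0) \approx f(0).$$

The approximation becomes equality in the limit $n \to \infty$. Thus,

$$\lim_{n \to \infty} \int_{-\infty}^{\infty} \theta'_n(x) f(x)\, dx = f(0) = \langle \delta_0, f \rangle \quad \Longrightarrow \quad \theta'_n \to \delta.$$

Note that $f(\infty) = 0$ because of the assumption that all functions must vanish outside a
finite volume. ■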
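The limit in Example 6.2.6 can be observed numerically by evaluating $-\langle \theta_n, f' \rangle$ for increasing $n$. The sketch below is an illustration under assumptions made here: the test function is a shifted Gaussian, which decays rapidly but is not strictly of compact support, its derivative is coded by hand, and the names `theta_n`, `f`, `f_prime`, and `pairing_with_derivative` are hypothetical.

```python
import numpy as np


def theta_n(x, n):
    """Ramp of Example 6.2.6: 0 below -1/n, (n x + 1)/2 in between, 1 above 1/n."""
    return np.clip((n * x + 1.0) / 2.0, 0.0, 1.0)


def f(x):
    """Stand-in test function (a shifted Gaussian, so that f is not symmetric about 0)."""
    return np.exp(-((x - 0.3) ** 2))


def f_prime(x):
    """Derivative of f, computed analytically."""
    return -2.0 * (x - 0.3) * f(x)


def pairing_with_derivative(n, half_width=10.0, n_pts=400_001):
    """<theta_n', f> evaluated as -<theta_n, f'> using a Riemann sum."""
    x = np.linspace(-half_width, half_width, n_pts)
    dx = x[1] - x[0]
    return float(-np.sum(theta_n(x, n) * f_prime(x)) * dx)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, pairing_with_derivative(n), f(0.0))
```

Only the ramp $\theta_n$ itself is ever evaluated; its derivative, which is undefined at $x = \pm 1/n$, never appears, which is exactly what Definition 6.2.5 is designed to achieve.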


6.3 Problems 

6.1. Write a density function for two point charges $q_1$ and $q_2$ located at $\mathbf{r} = \mathbf{r}_1$
and $\mathbf{r} = \mathbf{r}_2$, respectively.

6.2. Write a density function for four point charges $q_1 = q$, $q_2 = -q$, $q_3 = q$,
and $q_4 = -q$, located at the corners of a square of side $2a$, lying in the $xy$-plane,
whose center is at the origin and whose first corner is at $(a, a)$.


170 6. GENERALIZED FUNCTIONS 


1 


8(x — xq), where xo is a root of / and x is 


6.3. Show that 5 (/(,)) = . |//(jco)| 
confined to values close to xq. Hint: Make a change of variable to ;y = f(x). 


6.4. Show that 


聯 )) = S[74^d)’ 

where the xjc’s are all the roots of / in the interval on which / is defined. 

6.5. Define the distribution p : e°°(R 3 ) — R by 

(p ， f) = j <x(r)f(r)da(r) t 
s 

where cr(r) is a smooth function on a smooth surface S in M 3 . Show that p(r) is 
zero if r is not on 5 and infinite if r is on 5. 

6.6. Define the distribution p : G°°(R 3 ) — R by 
ip,f) = 

where 入 (r) is a smooth function on a smooth curve C in R' Show that p(r) is 
zero if r is not on C and infinite if r is on C. 


6.7. Express the three-dimensional Dirac delta function as a product of three one-dimensional
delta functions involving the coordinates in

(a) cylindrical coordinates,

(b) spherical coordinates,

(c) general curvilinear coordinates.

Hint: The Dirac delta function in $\mathbb{R}^3$ satisfies $\iiint\delta(\mathbf{r})\,d^3x = 1$.

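6.8. Show that $\int_{-\infty}^{\infty}\delta'(x)f(x)\,dx = -f'(0)$ where $\delta'(x) \equiv \dfrac{d}{dx}\delta(x)$.

6.9. Evaluate the following integrals:

(a) $\displaystyle\int_{-\infty}^{\infty}\delta(x^2 - 5x + 6)(3x^2 - 7x + 2)\,dx$. (b) $\displaystyle\int_{-\infty}^{\infty}\delta(x^2 - \pi^2)\cos x\,dx$.

(c) $\displaystyle\int_{0.5}^{\infty}\delta(\sin\pi x)\left(\tfrac{2}{3}\right)^x dx$. (d) $\displaystyle\int_{-\infty}^{\infty}\delta(e^{-x^2})\ln x\,dx$.

Hint: Use the result of Problem 6.4.

6.10. Consider $|x|$ as a generalized function and find its derivative.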




6.11. Let $\eta \in \mathcal{C}^\infty(\mathbb{R}^n)$ be a smooth function on $\mathbb{R}^n$, and let $\varphi$ be a distribution.
Show that $\eta\varphi$ is also a distribution. What is the natural definition for $\eta\varphi$? What is
$(\eta\varphi)'$, the derivative of $\eta\varphi$?

6.12. Show that each of the following sequences of functions approaches $\delta(x)$ in
the sense of Definition 6.2.4.

(c) $\dfrac{n}{\pi}\,\dfrac{1}{1 + n^2x^2}$

(d) $\dfrac{\sin nx}{\pi x}$

Hint: Approximate $\varphi_n(x)$ for large $n$ and $x \approx 0$, and then evaluate the appropriate
integral.

6.13. Show that $\tfrac{1}{2}(1 + \tanh nx) \to \theta(x)$ as $n \to \infty$.

6.14. Show that $x\delta'(x) = -\delta(x)$.


Additional Reading 

1. Hassani, S. Mathematical Methods, Springer-Verlag, 2000. An elementary
treatment of the Dirac delta function with many examples drawn from mechanics
and electromagnetism.

2. Rudin, W. Functional Analysis, McGraw-Hill, 1991. Part II of this mathematical
but (for those with a strong undergraduate mathematics background)
very readable book is devoted to the theory of distributions.

3. Reed, M. and Simon, B. Functional Analysis, Academic Press, 1980. 


Classical Orthogonal Polynomials 


The last example of Chapter 5 discussed only one of the many types of the so-called 
classical orthogonal polynomials. Historically, these polynomials were discovered 
as solutions to differential equations arising in various physical problems. 

Such polynomials can be produced by starting with $1, x, x^2, \ldots$ and employing
the Gram-Schmidt process. However, there is a more elegant, albeit less general,
approach that simultaneously studies most polynomials of interest to physicists.
We will employ this approach.1

7.1 General Properties 

Most relevant properties of the polynomials of interest are contained in

7.1.1. Theorem. Consider the functions

$$F_n(x) = \frac{1}{w(x)}\frac{d^n}{dx^n}\left(ws^n\right) \quad\text{for } n = 0, 1, 2, \ldots, \tag{7.1}$$

where

1. $F_1(x)$ is a first-degree polynomial in $x$,

2. $s(x)$ is a polynomial in $x$ of degree less than or equal to 2 with only real
roots,

3. $w(x)$ is a strictly positive function, integrable in the interval $(a, b)$, that
satisfies the boundary conditions $w(a)s(a) = 0 = w(b)s(b)$.


1 This approach is due to F. G. Tricomi [Tric 55]. See also [Denn 67].




Then $F_n(x)$ is a polynomial of degree $n$ in $x$ and is orthogonal, on the interval
$(a, b)$ and with weight $w(x)$, to any polynomial $p_k(x)$ of degree $k < n$, i.e.,
$\int_a^b p_k(x)F_n(x)w(x)\,dx = 0$ for $k < n$. These polynomials are collectively called
classical orthogonal polynomials.

Before proving the theorem, we need two lemmas:2

7.1.2. Lemma. The following identity holds:

$$\frac{d^m}{dx^m}\left(ws^np_{\le k}\right) = ws^{n-m}p_{\le k+m}, \qquad m \le n.$$

Proof. See Problem 7.1. □


7.1.3. Lemma. All the derivatives $d^m/dx^m(ws^n)$ vanish at $x = a$ and $x = b$, for
all values of $m < n$.

Proof. Set $k = 0$ in the identity of the previous lemma and let $p_{\le 0} = 1$. Then we
have $\dfrac{d^m}{dx^m}(ws^n) = ws^{n-m}p_{\le m}$. The RHS vanishes at $x = a$ and $x = b$ due to the
third condition stated in the theorem. □


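Proof of the theorem. We prove the orthogonality first. The proof involves multiple
use of integration by parts:

$$
\begin{aligned}
\int_a^b p_k(x)F_n(x)w(x)\,dx &= \int_a^b p_k(x)\frac{1}{w}\left[\frac{d^n}{dx^n}(ws^n)\right]w\,dx = \int_a^b p_k(x)\frac{d}{dx}\left[\frac{d^{n-1}}{dx^{n-1}}(ws^n)\right]dx\\
&= \underbrace{p_k(x)\frac{d^{n-1}}{dx^{n-1}}(ws^n)\Big|_a^b}_{=0\text{ by Lemma 7.1.3}} - \int_a^b\frac{dp_k}{dx}\,\frac{d^{n-1}}{dx^{n-1}}(ws^n)\,dx.
\end{aligned}
$$

This shows that each integration by parts transfers one differentiation from $ws^n$ to
$p_k$ and introduces a minus sign. Thus, after $k$ integrations by parts, we get

$$
\begin{aligned}
\int_a^b p_k(x)F_n(x)w(x)\,dx &= (-1)^k\int_a^b\frac{d^kp_k}{dx^k}\,\frac{d^{n-k}}{dx^{n-k}}(ws^n)\,dx\\
&= (-1)^k\frac{d^kp_k}{dx^k}\int_a^b\frac{d}{dx}\left[\frac{d^{n-k-1}}{dx^{n-k-1}}(ws^n)\right]dx = (-1)^k\frac{d^kp_k}{dx^k}\,\frac{d^{n-k-1}}{dx^{n-k-1}}(ws^n)\Big|_a^b = 0,
\end{aligned}
$$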


2 Recall that $p_{\le k}$ is a generic polynomial with degree less than or equal to $k$.






where we have used the fact that the $k$th derivative of a polynomial of degree $k$
is a constant. Note that $n - k - 1 \ge 0$ because $k < n$, so that the last line of the
equation is well-defined.

To prove the first part of the theorem, we use Lemma 7.1.2 with $k = 0$ and $m = n$
to get $\dfrac{d^n}{dx^n}(ws^n) = wp_{\le n}$, or $F_n(x) = \dfrac{1}{w}\dfrac{d^n}{dx^n}(ws^n) = p_{\le n}$. To prove that $F_n(x)$
is a polynomial of degree precisely equal to $n$, we write $F_n(x) = p_{\le n-1}(x) + k_nx^n$,
multiply both sides by $w(x)F_n(x)$, and integrate over $(a, b)$:

$$\int_a^b[F_n(x)]^2w(x)\,dx = \int_a^b p_{\le n-1}F_n(x)w(x)\,dx + k_n\int_a^b x^nF_n(x)w(x)\,dx.$$

The LHS is a positive quantity because both $w(x)$ and $[F_n(x)]^2$ are positive, and
the first integral on the RHS vanishes by the first part of the proof. Therefore, the
second term on the RHS cannot be zero. In particular, $k_n \ne 0$, and $F_n(x)$ is of
degree $n$. □


It is customary to introduce a normalization constant in the definition of $F_n(x)$,
and write

$$F_n(x) = \frac{1}{K_nw}\frac{d^n}{dx^n}\left(ws^n\right). \tag{7.2}$$

This equation is called the generalized Rodriguez formula. For historical reasons,
different polynomial functions are normalized differently, which is why $K_n$ is
introduced here.

From Theorem 7.1.1 it is clear that the sequence $F_0(x), F_1(x), F_2(x), \ldots$ of
polynomials forms an orthogonal set of polynomials on $[a, b]$ with weight function
$w(x)$.
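The content of Theorem 7.1.1 can be checked with exact rational arithmetic for one admissible choice. The sketch below assumes $w = 1$ and $s = 1 - x^2$ on $(-1, 1)$, which satisfies $w(a)s(a) = 0 = w(b)s(b)$, and takes $K_n = 1$ as in Equation (7.1). Polynomials are stored as coefficient lists, and the helper names are illustrative.

```python
from fractions import Fraction

def pmul(p, q):
    """Multiply polynomials stored as coefficient lists (index = power of x)."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def pder(p):
    """Differentiate a polynomial."""
    return [i * a for i, a in enumerate(p)][1:] or [Fraction(0)]

def pint(p, a, b):
    """Exact integral of a polynomial over [a, b]."""
    return sum(c * (Fraction(b) ** (i + 1) - Fraction(a) ** (i + 1)) / (i + 1)
               for i, c in enumerate(p))

def F(n):
    """F_n of Equation (7.1) for the assumed choice w = 1, s = 1 - x^2."""
    p = [Fraction(1)]
    for _ in range(n):
        p = pmul(p, [1, 0, -1])      # build w s^n = (1 - x^2)^n
    for _ in range(n):
        p = pder(p)                  # apply d^n/dx^n
    return p

def degree(p):
    """Largest power of x with a nonzero coefficient."""
    return max(i for i, c in enumerate(p) if c != 0)

for n in range(5):
    overlaps = [pint(pmul(F(k), F(n)), -1, 1) for k in range(n)]
    print(n, degree(F(n)), overlaps)   # degree n, zero overlap with every lower F_k
```

Each $F_n$ comes out with degree exactly $n$, and every overlap integral with a lower $F_k$ is exactly zero, as the theorem asserts.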

All the varieties of classical orthogonal polynomials were discovered as solutions
of differential equations. Here, we give a single generic differential equation
satisfied by all the $F_n$'s. The proof is outlined in Problem 7.4.


7.1.4. Proposition. Let $k_1$ be the coefficient of $x$ in $F_1(x)$ and $\sigma_2$ the coefficient of
$x^2$ in $s(x)$. Then the orthogonal polynomials $F_n$ satisfy the differential equation3

$$\frac{d}{dx}\left(wsF_n'\right) = w\lambda_nF_n(x) \quad\text{where}\quad \lambda_n = K_1k_1n + \sigma_2n(n-1).$$


We shall study the differential equation above in the context of the Sturm-Liouville
problem (see Chapters 18 and 19), which is an eigenvalue problem involving
differential operators.
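As a sanity check of Proposition 7.1.4, the sketch below verifies $\frac{d}{dx}(wsF_n') = w\lambda_nF_n$ exactly for the assumed choice $w = 1$, $s = 1 - x^2$ with $K_n = 1$. In that case $F_1 = -2x$, so $K_1k_1 = -2$ and $\sigma_2 = -1$, giving $\lambda_n = -n(n+1)$. The function names are illustrative.

```python
from fractions import Fraction

def pmul(p, q):
    """Multiply polynomials stored as coefficient lists (index = power of x)."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def pder(p):
    """Differentiate a polynomial."""
    return [i * a for i, a in enumerate(p)][1:] or [Fraction(0)]

def F(n):
    """F_n = d^n/dx^n (1 - x^2)^n, i.e. Equation (7.1) with w = 1, s = 1 - x^2."""
    p = [Fraction(1)]
    for _ in range(n):
        p = pmul(p, [1, 0, -1])
    for _ in range(n):
        p = pder(p)
    return p

def lam(n, K1k1=Fraction(-2), sigma2=Fraction(-1)):
    """lambda_n = K_1 k_1 n + sigma_2 n (n - 1); defaults fit F_1 = -2x, s = 1 - x^2."""
    return K1k1 * n + sigma2 * n * (n - 1)

def residual(n):
    """Coefficients of d/dx[w s F_n'] - w lambda_n F_n, which should all vanish."""
    Fn = F(n)
    lhs = pder(pmul([1, 0, -1], pder(Fn)))
    rhs = [lam(n) * c for c in Fn]
    size = max(len(lhs), len(rhs))
    lhs += [Fraction(0)] * (size - len(lhs))
    rhs += [Fraction(0)] * (size - len(rhs))
    return [a - b for a, b in zip(lhs, rhs)]

print([all(c == 0 for c in residual(n)) for n in range(7)])
```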


3 A prime is a symbol for derivative with respect to $x$.




7.2 Classification 


Let us now investigate the consequences of various choices of $s(x)$. We start with
$F_1(x)$, and note that it satisfies Equation (7.2) with $n = 1$:

$$F_1(x) = \frac{1}{K_1w}\frac{d}{dx}(ws), \qquad\text{or}\qquad \frac{1}{ws}\frac{d}{dx}(ws) = \frac{K_1F_1(x)}{s},$$

which can be integrated to yield $ws = A\exp\left(\int K_1F_1(x)\,dx/s\right)$ where $A$ is a
constant. On the other hand, being a polynomial of degree 1, $F_1(x)$ can be written
as $F_1(x) = k_1x + k_1'$. It follows that

$$w(x)s(x) = A\exp\left(\int\frac{K_1(k_1x + k_1')}{s}\,dx\right), \qquad w(a)s(a) = 0 = w(b)s(b). \tag{7.3}$$


Next we look at the three choices for $s(x)$: a constant, a polynomial of degree
1, and a polynomial of degree 2. For a constant $s(x)$, Equation (7.3) can be easily
integrated:

$$w(x)s(x) = A\exp\left(\int\frac{K_1(k_1x + k_1')}{s}\,dx\right) \equiv A\exp\left(\int(2\alpha x + \beta)\,dx\right) = Ae^{\alpha x^2 + \beta x + C} = Be^{\alpha x^2 + \beta x}.$$

The interval $(a, b)$ is determined by $w(a)s(a) = 0 = w(b)s(b)$, which yields
$Be^{\alpha a^2 + \beta a} = 0 = Be^{\alpha b^2 + \beta b}$. The only way that this equality can hold is for $a$ and
$b$ to be infinite. Since $a < b$, we must take $a = -\infty$ and $b = +\infty$, in which case
$\alpha < 0$. With $y = \sqrt{|\alpha|}\,(x + \beta/(2\alpha))$ and choosing $B = s\exp(\beta^2/(4\alpha))$, we obtain
$w(y) = \exp(-y^2)$. We also take the constant $s$ to be 1. This is always possible by
a proper choice of constants such as $B$.

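If the degree of $s$ is 1, then $s(x) = \sigma_1x + \sigma_0$ and

$$
\begin{aligned}
w(x)(\sigma_1x + \sigma_0) &= A\exp\left(\int\frac{K_1(k_1x + k_1')}{\sigma_1x + \sigma_0}\,dx\right)\\
&= A\exp\left[\int\left(\frac{K_1k_1}{\sigma_1} + \frac{K_1k_1' - K_1k_1\sigma_0/\sigma_1}{\sigma_1x + \sigma_0}\right)dx\right]\\
&\equiv B(\sigma_1x + \sigma_0)^\rho e^{\gamma x},
\end{aligned}
$$

where $\gamma = K_1k_1/\sigma_1$, $\rho = K_1k_1'/\sigma_1 - K_1k_1\sigma_0/\sigma_1^2$, and $B$ is $A$ modified by
the constant of integration. The last equation above must satisfy the boundary
conditions at $a$ and $b$: $B(\sigma_1a + \sigma_0)^\rho e^{\gamma a} = 0 = B(\sigma_1b + \sigma_0)^\rho e^{\gamma b}$, which give
$a = -\sigma_0/\sigma_1$, $\rho > 0$, $\gamma < 0$, and $b = +\infty$. With appropriate redefinition of
variables and parameters, we can write $w(y) = y^\nu e^{-y}$, $\nu > -1$, and $s(x) = x$,
$a = 0$, $b = +\infty$.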







Table 7.1 Special cases of Jacobi polynomials 


Similarly, we can obtain the weight function and the interval of integration for
the case when $s(x)$ is of degree 2. This result, as well as the results obtained above,
are collected in the following proposition.

7.2.1. Proposition. If the conditions of Theorem 7.1.1 prevail, then

(a) For $s(x)$ of degree zero we get $w(x) = e^{-x^2}$ with $s(x) = 1$, $a = -\infty$, and
$b = +\infty$. The resulting polynomials are called Hermite polynomials and
are denoted by $H_n(x)$.

(b) For $s(x)$ of degree 1, we obtain $w(x) = x^\nu e^{-x}$ with $\nu > -1$, $s(x) = x$,
$a = 0$, and $b = +\infty$. The resulting polynomials are called Laguerre
polynomials and are denoted by $L_n^\nu(x)$.

(c) For $s(x)$ of degree 2, we get $w(x) = (1 + x)^\mu(1 - x)^\nu$ with $\mu, \nu > -1$,
$s(x) = 1 - x^2$, $a = -1$, and $b = +1$. The resulting polynomials are called
Jacobi polynomials and are denoted by $P_n^{\mu,\nu}(x)$.

Jacobi polynomials are themselves divided into other subcategories depending
on the values of $\mu$ and $\nu$. The most common and widely used of these are collected
in Table 7.1. Note that the definition of each of the preceding polynomials involves a
"standardization," which boils down to a particular choice of $K_n$ in the generalized
Rodriguez formula.
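The three weight functions can be checked against the defining relation $\frac{1}{ws}\frac{d}{dx}(ws) = K_1F_1/s$: multiplying by $s$, the quantity $s\,(ws)'/(ws)$ must be a first-degree polynomial. The sketch below tests this numerically with a central difference. The parameter values ($\nu = 1/2$ for Laguerre, $\mu = 1$, $\nu = 2$ for Jacobi), the sample points, and the step size are illustrative assumptions.

```python
import math

def s_times_log_derivative(ws, s, x, h=1e-5):
    """s(x) * (ws)'(x) / (ws)(x), with a central difference for the derivative."""
    return s(x) * (ws(x + h) - ws(x - h)) / (2 * h * ws(x))

# (name, w*s, s, expected linear polynomial K_1 F_1, sample points)
cases = [
    ("Hermite",
     lambda x: math.exp(-x * x), lambda x: 1.0,
     lambda x: -2 * x, [-1.0, 0.3, 2.0]),
    ("Laguerre, nu = 1/2",
     lambda x: x ** 1.5 * math.exp(-x), lambda x: x,
     lambda x: 1.5 - x, [0.5, 1.0, 3.0]),
    ("Jacobi, mu = 1, nu = 2",
     lambda x: (1 + x) ** 2 * (1 - x) ** 3, lambda x: 1 - x * x,
     lambda x: -1 - 5 * x, [-0.5, 0.0, 0.7]),
]

def max_error(ws, s, expected, points):
    """Largest deviation from the expected first-degree polynomial."""
    return max(abs(s_times_log_derivative(ws, s, x) - expected(x)) for x in points)

for name, ws, s, expected, points in cases:
    print(name, max_error(ws, s, expected, points))
```

In each case the expected polynomial is $\nu + 1 - x$ or $\mu - \nu - (\mu + \nu + 2)x$ with the assumed parameters inserted, and $-2x$ for the Hermite weight.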

7.3 Recurrence Relations

Besides the recurrence relations obtained in Section 5.2, we can use the differential
equation of Proposition 7.1.4 to construct new recurrence relations involving
derivatives. These relations apply only to classical orthogonal polynomials, and
not to general ones. We start with Equation (5.12),

$$F_{n+1}(x) = (\alpha_nx + \beta_n)F_n(x) + \gamma_nF_{n-1}(x), \tag{7.4}$$

differentiate both sides twice, and substitute for the second derivative from the
differential equation of Proposition 7.1.4. This will yield

$$2ws\alpha_nF_n' + \left[\alpha_n\frac{d}{dx}(ws) + w\lambda_n(\alpha_nx + \beta_n)\right]F_n - w\lambda_{n+1}F_{n+1} + w\gamma_n\lambda_{n-1}F_{n-1} = 0. \tag{7.5}$$


Karl Gustav Jacob Jacobi (1804-1851) was the second son
born to a well-to-do Jewish banking family in Potsdam. An
obviously bright young man, Jacobi was soon moved to the
highest class in spite of his youth and remained at the gymnasium
for four years only because he could not enter the
university until he was sixteen. He excelled at the University
of Berlin in all the classical subjects as well as mathematical 
studies, the topic he soon chose as his career. He passed the 
examination to become a secondary school teacher, then later 
the examination that allowed university teaching, and joined 
the faculty at Berlin at the age of twenty. Since promotion 

there appeared unlikely, he moved in 1826 to the University of Königsberg in search of a
more permanent position. He was known as a lively and creative lecturer who often injected 
his latest research topics into the lectures. He began what is now a common practice at 
most universities — the research seminar — for the most advanced students and his faculty 
collaborators. The Jacobi “school,” together with the influence of Bessel and Neumann (also 
at Königsberg), sparked a renewal of mathematical excellence in Germany.

In 1843 Jacobi fell gravely ill with diabetes. After seeing his condition, Dirichlet, with
the help of von Humboldt, secured a donation to enable Jacobi to spend several months in 
Italy, a therapy recommended by his doctor. The friendly atmosphere and healthful climate 
there soon improved his condition. Jacobi was later given royal permission to move from 
Königsberg to Berlin so that his health would not be affected by the harsh winters in the
former location. A salary bonus given to Jacobi to offset the higher cost of living in the capital
was revoked after he made some politically sensitive remarks in an impromptu speech. A 
permanent position at Berlin was also refused, and the reduced salary and lack of security 
caused considerable hardship for Jacobi and his family. Only after he accepted a position in 
Vienna did the Prussian government recognize the desirability of keeping the distinguished 
mathematician within its borders, offering him special concessions that together with his 
love for his homeland convinced Jacobi to stay. In 1851 Jacobi died after contracting both 
influenza and smallpox. 

Jacobi’s mathematical reputation began largely with his heated competition with Abel in 
the study of elliptic functions. Legendre, formerly the star of such studies, wrote Jacobi of his 
happiness at having “lived long enough to witness these magnanimous contests between two 
young athletes equally strong.” Although Jacobi and Abel could reasonably be considered 
contemporary researchers who arrived at many of the same results independently, Jacobi 
suggested the names “Abelian functions” and “Abelian theorem” in a review he wrote for 
Crelle’s Journal. Jacobi also extended his discoveries in elliptic functions to number theory 
and the theory of integration. He also worked in other areas of number theory, such as the 
theory of quadratic forms and the representation of integers as sums of squares and cubes. He 









presented the well-known Jacobian, or functional determinant, in 1841. To physicists, Jacobi 
is probably best known for his work in dynamics with the form introduced by Hamilton. 
Although elegant and quite general, Hamiltonian dynamics did not lend itself to easy solution 
of many practical problems in mechanics. In the spirit of Lagrange, Poisson, and others, 
Jacobi investigated transformations of Hamilton's equations that preserved their canonical
nature (loosely speaking, that preserved the Poisson brackets in each representation). After 
much work and a little simplification, the resulting equations of motion, now known as 
Hamilton-Jacobi equations, allowed Jacobi to solve several important problems in ordinary
and celestial mechanics. Clebsch and later Helmholtz amplified their use in other areas of 
physics. 


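We can get another recurrence relation involving derivatives by substituting
(7.4) in (7.5) and simplifying:

$$2ws\alpha_nF_n' + \left[\alpha_n\frac{d}{dx}(ws) + w(\lambda_n - \lambda_{n+1})(\alpha_nx + \beta_n)\right]F_n + w\gamma_n(\lambda_{n-1} - \lambda_{n+1})F_{n-1} = 0. \tag{7.6}$$

Two other recurrence relations can be obtained by differentiating Equations
(7.6) and (7.5), respectively, and using the differential equation for $F_n$. Now solve
the first equation so obtained for $\gamma_n(d/dx)(wF_{n-1})$ and substitute the result in the
second equation. After simplification, the result will be

$$2w\alpha_n\lambda_nF_n + \frac{d}{dx}\left\{\left[\alpha_n\frac{d}{dx}(ws) + w(\lambda_n - \lambda_{n-1})(\alpha_nx + \beta_n)\right]F_n\right\} + (\lambda_{n-1} - \lambda_{n+1})\frac{d}{dx}(wF_{n+1}) = 0. \tag{7.7}$$

Finally, we record one more useful recurrence relation:

$$A_n(x)F_n - \lambda_{n+1}(\alpha_nx + \beta_n)\frac{dw}{dx}F_{n+1} + \gamma_n\lambda_{n-1}(\alpha_nx + \beta_n)\frac{dw}{dx}F_{n-1} + B_n(x)F_{n+1}' + \gamma_nD_n(x)F_{n-1}' = 0, \tag{7.8}$$

where

$$
\begin{aligned}
A_n(x) &= (\alpha_nx + \beta_n)\left[2w\alpha_n\lambda_n + \alpha_n\frac{d^2}{dx^2}(ws) + \lambda_n(\alpha_nx + \beta_n)\frac{dw}{dx}\right] - \alpha_n^2\frac{d}{dx}(ws),\\
B_n(x) &= \alpha_n\frac{d}{dx}(ws) - w(\alpha_nx + \beta_n)(\lambda_{n+1} - \lambda_n),\\
D_n(x) &= w(\alpha_nx + \beta_n)(\lambda_{n-1} - \lambda_n) - \alpha_n\frac{d}{dx}(ws).
\end{aligned}
$$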

Details of the derivation of this relation are left for the reader. All these recurrence
relations seem to be very complicated. However, complexity is the price we pay
for generality. When we work with specific orthogonal polynomials, the equations
simplify considerably. For instance, for Hermite and Legendre polynomials
Equation (7.6) yields, respectively,

$$H_n' = 2nH_{n-1}, \qquad\text{and}\qquad (1 - x^2)P_n' + nxP_n - nP_{n-1} = 0. \tag{7.9}$$

Also, applying Equation (7.7) to Legendre polynomials gives

$$P_{n+1}' - xP_n' - (n + 1)P_n = 0, \tag{7.10}$$

and Equation (7.8) yields

$$P_{n+1}' - P_{n-1}' - (2n + 1)P_n = 0. \tag{7.11}$$

It is possible to find many more recurrence relations by manipulating the existing
recurrence relations.
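The special cases (7.9) through (7.11) lend themselves to an exact check. The sketch below builds Hermite and Legendre polynomials from the three-term recurrence (7.4) with $\beta_n = 0$, using the constants $\alpha_n = 2$, $\gamma_n = -2n$ (Hermite) and $\alpha_n = (2n+1)/(n+1)$, $\gamma_n = -n/(n+1)$ (Legendre) that the text derives in Section 7.4. The starting polynomials $H_1 = 2x$, $P_1 = x$ and all helper names are assumptions of this illustration.

```python
from fractions import Fraction

X = [Fraction(0), Fraction(1)]                       # the polynomial x

def padd(p, q):
    """Add polynomials stored as coefficient lists (index = power of x)."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def pscale(c, p):
    return [c * a for a in p]

def pmul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def pder(p):
    return [i * a for i, a in enumerate(p)][1:] or [Fraction(0)]

def three_term(first, second, alpha, gamma, nmax):
    """Iterate F_{n+1} = alpha(n) x F_n + gamma(n) F_{n-1} (Equation (7.4), beta_n = 0)."""
    F = [first, second]
    for n in range(1, nmax):
        F.append(padd(pscale(alpha(n), pmul(X, F[n])), pscale(gamma(n), F[n - 1])))
    return F

N = 8
H = three_term([Fraction(1)], [Fraction(0), Fraction(2)], lambda n: 2, lambda n: -2 * n, N)
P = three_term([Fraction(1)], X, lambda n: Fraction(2 * n + 1, n + 1),
               lambda n: Fraction(-n, n + 1), N)

def is_zero(p):
    return all(c == 0 for c in p)

def check(n):
    """True if (7.9), (7.10) and (7.11) all hold exactly for this n."""
    dH, dP, dPp, dPm = pder(H[n]), pder(P[n]), pder(P[n + 1]), pder(P[n - 1])
    r79a = padd(dH, pscale(-2 * n, H[n - 1]))
    r79b = padd(padd(pmul([1, 0, -1], dP), pscale(n, pmul(X, P[n]))), pscale(-n, P[n - 1]))
    r710 = padd(padd(dPp, pscale(-1, pmul(X, dP))), pscale(-(n + 1), P[n]))
    r711 = padd(padd(dPp, pscale(-1, dPm)), pscale(-(2 * n + 1), P[n]))
    return all(is_zero(r) for r in (r79a, r79b, r710, r711))

print(all(check(n) for n in range(1, N)))
```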

Before studying specific orthogonal polynomials, let us pause for a moment
to appreciate the generality and elegance of the foregoing discussion. With a few
assumptions and a single defining equation we have severely restricted the choice
of the weight function and with it the choice of the interval $(a, b)$. We have nevertheless
exhausted the list of the so-called classical orthogonal polynomials.


7.4 Examples of Classical Orthogonal Polynomials 

We now construct the specific polynomials used frequently in physics. We have
seen that the four parameters $K_n$, $k_n$, $k_n'$, and $h_n$ determine all the properties of the
polynomials. Once $K_n$ is fixed by some standardization, we can determine all the
other parameters: $k_n$ and $k_n'$ will be given by the generalized Rodriguez formula,
and $h_n$ can be calculated as

$$
\begin{aligned}
h_n &= \int_a^b F_n^2(x)w(x)\,dx = \int_a^b(k_nx^n + \cdots)F_n(x)w(x)\,dx\\
&= k_n\int_a^b wx^n\frac{1}{K_nw}\frac{d^n}{dx^n}(ws^n)\,dx = \frac{k_n}{K_n}\int_a^b x^n\frac{d}{dx}\left[\frac{d^{n-1}}{dx^{n-1}}(ws^n)\right]dx\\
&= \frac{k_n}{K_n}x^n\frac{d^{n-1}}{dx^{n-1}}(ws^n)\Big|_a^b - \frac{k_n}{K_n}\int_a^b\frac{d}{dx}(x^n)\,\frac{d^{n-1}}{dx^{n-1}}(ws^n)\,dx.
\end{aligned}
$$

The first term of the last line is zero by Lemma 7.1.3. It is clear that each integration
by parts introduces a minus sign and shifts one differentiation from $ws^n$ to
$x^n$. Thus, after $n$ integrations by parts and noting that $d^0/dx^0(ws^n) = ws^n$ and
$d^n/dx^n(x^n) = n!$, we obtain

$$h_n = \frac{(-1)^nk_nn!}{K_n}\int_a^b ws^n\,dx. \tag{7.12}$$




7.4.1 Hermite Polynomials

The Hermite polynomials are standardized such that $K_n = (-1)^n$. Thus, the
generalized Rodriguez formula (7.2) and Proposition 7.2.1 give

$$H_n(x) = (-1)^ne^{x^2}\frac{d^n}{dx^n}\left(e^{-x^2}\right). \tag{7.13}$$

It is clear that each time $e^{-x^2}$ is differentiated, a factor of $-2x$ is introduced.
The highest power of $x$ is obtained when we differentiate $e^{-x^2}$ $n$ times. This yields
$(-1)^ne^{x^2}(-2x)^ne^{-x^2} = 2^nx^n \Rightarrow k_n = 2^n$.

To obtain $k_n'$, we find it helpful to see whether the polynomial is even or odd.
We substitute $-x$ for $x$ in Equation (7.13) and get $H_n(-x) = (-1)^nH_n(x)$, which
shows that if $n$ is even (odd), $H_n$ is an even (odd) polynomial, i.e., it can have only
even (odd) powers of $x$. In either case, the next-highest power of $x$ in $H_n(x)$ is
not $n - 1$ but $n - 2$. Thus, the coefficient of $x^{n-1}$ is zero for $H_n(x)$, and we have
$k_n' = 0$. For $h_n$, we use (7.12) to obtain $h_n = \sqrt{\pi}\,2^nn!$.

Next we calculate the recurrence relation of Equation (5.12). We can readily
calculate the constants needed: $\alpha_n = 2$, $\beta_n = 0$, $\gamma_n = -2n$. Then substitute these
in Equation (5.12) to obtain

$$H_{n+1}(x) = 2xH_n(x) - 2nH_{n-1}(x). \tag{7.14}$$

Other recurrence relations can be obtained similarly.

Finally, the differential equation of $H_n(x)$ is obtained by first noting that $K_1 = -1$,
$\sigma_2 = 0$, $F_1(x) = 2x \Rightarrow k_1 = 2$. All of this gives $\lambda_n = -2n$, which can be
used in the equation of Proposition 7.1.4 to get

$$\frac{d^2H_n}{dx^2} - 2x\frac{dH_n}{dx} + 2nH_n = 0. \tag{7.15}$$
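The Hermite results can be reproduced from the Rodriguez formula (7.13) with exact arithmetic. The sketch below uses the observation that if $d^j/dx^j\,e^{-x^2} = q_j(x)e^{-x^2}$, then $q_{j+1} = q_j' - 2xq_j$, so $H_n = (-1)^nq_n$. It then checks $k_n$, $k_n'$, $h_n$, (7.14) and (7.15). The Gaussian moments $\int x^{2j}e^{-x^2}dx = \sqrt{\pi}\,(2j)!/(4^jj!)$ are a standard result assumed here, and the helper names are illustrative.

```python
from fractions import Fraction
from math import factorial

def padd(p, q):
    """Add polynomials stored as coefficient lists (index = power of x)."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def pscale(c, p):
    return [c * a for a in p]

def pmul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def pder(p):
    return [i * a for i, a in enumerate(p)][1:] or [Fraction(0)]

def hermite(n):
    """H_n from (7.13), using q_{j+1} = q_j' - 2x q_j and H_n = (-1)^n q_n."""
    q = [Fraction(1)]
    for _ in range(n):
        q = padd(pder(q), pmul([0, -2], q))
    return pscale((-1) ** n, q)

def gaussian_integral_over_sqrt_pi(p):
    """Integral of p(x) e^{-x^2} over the real line, divided by sqrt(pi)."""
    return sum(c * Fraction(factorial(i), 4 ** (i // 2) * factorial(i // 2))
               for i, c in enumerate(p) if i % 2 == 0)

def ode_residual(n):
    """Coefficients of H_n'' - 2x H_n' + 2n H_n (Equation (7.15))."""
    H = hermite(n)
    return padd(padd(pder(pder(H)), pmul([0, -2], pder(H))), pscale(2 * n, H))

def check(n):
    H, Hm, Hp = hermite(n), hermite(n - 1), hermite(n + 1)
    recurrence = padd(padd(Hp, pmul([0, -2], H)), pscale(2 * n, Hm))   # (7.14)
    return (H[-1] == 2 ** n and H[-2] == 0                              # k_n, k'_n
            and gaussian_integral_over_sqrt_pi(pmul(H, H)) == 2 ** n * factorial(n)
            and all(c == 0 for c in ode_residual(n))
            and all(c == 0 for c in recurrence))

print(all(check(n) for n in range(1, 7)))
```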


7.4.2 Laguerre Polynomials 

For Laguerre polynomials, the standardization is $K_n = n!$. Thus, the generalized
Rodriguez formula (7.2) and Proposition 7.2.1 give

$$L_n^\nu(x) = \frac{1}{n!\,x^\nu e^{-x}}\frac{d^n}{dx^n}\left(x^\nu e^{-x}x^n\right) = \frac{1}{n!}x^{-\nu}e^x\frac{d^n}{dx^n}\left(x^{n+\nu}e^{-x}\right). \tag{7.16}$$

To find $k_n$ we note that differentiating $e^{-x}$ does not introduce any new powers
of $x$ but only a factor of $-1$. Thus, the highest power of $x$ is obtained by leaving
$x^{n+\nu}$ alone and differentiating $e^{-x}$ $n$ times. This gives

$$\frac{1}{n!}x^{-\nu}e^xx^{n+\nu}(-1)^ne^{-x} = \frac{(-1)^n}{n!}x^n \quad\Rightarrow\quad k_n = \frac{(-1)^n}{n!}.$$



We may try to check the evenness or oddness of $L_n^\nu(x)$; however, this will not
be helpful because changing $x$ to $-x$ distorts the RHS of Equation (7.16). In fact,
$k_n' \ne 0$ in this case, and it can be calculated by noticing that the next-highest power
of $x$ is obtained by adding the first derivative of $x^{n+\nu}$ $n$ times and multiplying the
result by $(-1)^{n-1}$, which comes from differentiating $e^{-x}$. We obtain

$$\frac{1}{n!}x^{-\nu}e^x\left[(-1)^{n-1}n(n + \nu)x^{n+\nu-1}e^{-x}\right] = \frac{(-1)^{n-1}(n + \nu)}{(n - 1)!}x^{n-1},$$

and therefore $k_n' = (-1)^{n-1}(n + \nu)/(n - 1)!$.
Finally, for $h_n$ we get

$$h_n = \frac{(-1)^n\left[(-1)^n/n!\right]n!}{n!}\int_0^\infty x^\nu e^{-x}x^n\,dx = \frac{1}{n!}\int_0^\infty x^{n+\nu}e^{-x}\,dx.$$

If $\nu$ is not an integer (and it need not be), the integral on the RHS cannot be
evaluated by elementary methods. In fact, this integral occurs so frequently in
mathematical applications that it is given a special name, the gamma function.
A detailed discussion of this function can be found in Chapter 11. At this point,
we simply note that

$$\Gamma(z + 1) \equiv \int_0^\infty x^ze^{-x}\,dx, \qquad \Gamma(n + 1) = n! \text{ for } n \in \mathbb{N}, \tag{7.17}$$

and write $h_n$ as

$$h_n = \frac{\Gamma(n + \nu + 1)}{n!} = \frac{\Gamma(n + \nu + 1)}{\Gamma(n + 1)}.$$

The relevant parameters for the recurrence relation can be easily calculated:

$$\alpha_n = -\frac{1}{n + 1}, \qquad \beta_n = \frac{2n + \nu + 1}{n + 1}, \qquad \gamma_n = -\frac{n + \nu}{n + 1}.$$

Substituting these in Equation (5.12) and simplifying yields

$$(n + 1)L_{n+1}^\nu = (2n + \nu + 1 - x)L_n^\nu - (n + \nu)L_{n-1}^\nu.$$

With $k_1 = -1$ and $\sigma_2 = 0$, we get $\lambda_n = -n$, and the differential equation of
Proposition 7.1.4 becomes

$$x\frac{d^2L_n^\nu}{dx^2} + (\nu + 1 - x)\frac{dL_n^\nu}{dx} + nL_n^\nu = 0. \tag{7.18}$$
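The Laguerre constants can be cross-checked against the recurrence relation. The sketch below iterates $(n+1)L_{n+1}^\nu = (2n+\nu+1-x)L_n^\nu - (n+\nu)L_{n-1}^\nu$ from $L_0^\nu = 1$ and $L_1^\nu = \nu + 1 - x$ (the $n = 1$ case of (7.16)), then tests $k_n$, $k_n'$, Equation (7.18), and $h_n$. The norm check is restricted to integer $\nu$ so that $\Gamma(m+1) = m!$ makes every integral exact; the values $\nu = 1/2$ and $\nu = 2$ and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial

def padd(p, q):
    """Add polynomials stored as coefficient lists (index = power of x)."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def pscale(c, p):
    return [c * a for a in p]

def pmul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def pder(p):
    return [i * a for i, a in enumerate(p)][1:] or [Fraction(0)]

def laguerre(nmax, nu):
    """L^nu_0 .. L^nu_nmax from (n+1) L_{n+1} = (2n+nu+1-x) L_n - (n+nu) L_{n-1}."""
    nu = Fraction(nu)
    L = [[Fraction(1)], [nu + 1, Fraction(-1)]]       # L_0 = 1, L_1 = nu + 1 - x
    for n in range(1, nmax):
        step = padd(pmul([2 * n + nu + 1, -1], L[n]), pscale(-(n + nu), L[n - 1]))
        L.append(pscale(Fraction(1, n + 1), step))
    return L

def weighted_integral(p, nu):
    """Integral of p(x) x^nu e^{-x} over (0, inf) for integer nu, via Gamma(m+1) = m!."""
    return sum(c * factorial(i + nu) for i, c in enumerate(p))

def check(n, nu):
    """k_n, k'_n and the differential equation (7.18) for a given n >= 1."""
    nu = Fraction(nu)
    L = laguerre(n, nu)[n]
    ode = padd(padd(pmul([0, 1], pder(pder(L))), pmul([nu + 1, -1], pder(L))), pscale(n, L))
    return (L[-1] == Fraction((-1) ** n, factorial(n))
            and L[-2] == (-1) ** (n - 1) * (n + nu) / factorial(n - 1)
            and all(c == 0 for c in ode))

def norm_ok(n, nu):
    """h_n = Gamma(n + nu + 1) / n! for integer nu."""
    L = laguerre(n, nu)[n]
    return weighted_integral(pmul(L, L), nu) == Fraction(factorial(n + nu), factorial(n))

print(all(check(n, Fraction(1, 2)) for n in range(1, 7)))
print(all(norm_ok(n, 2) for n in range(1, 7)))
```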




7.4.3 Legendre Polynomials

Instead of discussing the Jacobi polynomials as a whole, we will discuss a special
case of them, the Legendre polynomials $P_n(x)$, which are more widely used in
physics.

With $\mu = 0 = \nu$, corresponding to the Legendre polynomials, the weight
function for the Jacobi polynomials reduces to $w(x) = 1$. The standardization is
$K_n = (-1)^n2^nn!$. Thus, the generalized Rodriguez formula reads

$$P_n(x) = \frac{(-1)^n}{2^nn!}\frac{d^n}{dx^n}\left[(1 - x^2)^n\right]. \tag{7.19}$$

To find $k_n$, we expand the expression in square brackets using the binomial theorem
and take the $n$th derivative of the highest power of $x$. This yields

$$k_nx^n = \frac{(-1)^n}{2^nn!}\frac{d^n}{dx^n}\left[(-1)^nx^{2n}\right] = \frac{1}{2^nn!}\frac{d^n}{dx^n}\left(x^{2n}\right) = \frac{1}{2^nn!}\,2n(2n - 1)(2n - 2)\cdots(n + 1)x^n.$$

After some algebra (see Problem 7.7), we get $k_n = \dfrac{2^n\Gamma(n + \frac{1}{2})}{n!\,\Gamma(\frac{1}{2})}$.


Adrien-Marie Legendre (1752-1833) came from a well- 
to-do Parisian family and received an excellent education in 
science and mathematics. His university work was advanced 
enough that his mentor used many of Legendre’s essays in 
a treatise on mechanics. A man of modest fortune until the 
revolution, Legendre was able to devote himself to study 
and research without recourse to an academic position. In 
1782 he won the prize of the Berlin Academy for calculating
the trajectories of cannonballs taking air resistance into
account. This essay brought him to the attention of Lagrange
and helped pave the way to acceptance in French scientific
circles, notably the Academy of Sciences, to which Legendre submitted numerous papers.
In July 1784 he submitted a paper on planetary orbits that contained the now-famous Legendre
polynomials, mentioning that Lagrange had been able to “present a more complete
theory” in a recent paper by using Legendre’s results. In the years that followed, Legendre 
concentrated his efforts in number theory, celestial mechanics, and the theory of elliptic 
functions. In addition, he was a prolific calculator, producing large tables of the values of 
special functions, and he also authored an elementary textbook that remained in use for 
many decades. In 1824 Legendre refused to vote for the government’s candidate for the Institut
National. Because of this, his pension was stopped and he died in poverty and in pain at the 
age of 80 after several years of failing health. 

Legendre produced a large number of useful ideas but did not always develop them 
in the most rigorous manner, claiming to hold the priority for an idea if he had presented 






merely a reasonable argument for it. Gauss, with whom he had several quarrels over priority, 
considered rigorous proof the standard of ownership. To Legendre’s credit, however, he 
was an enthusiastic supporter of his young rivals Abel and Jacobi and gave their work 
considerable attention in his writings. Especially in the theory of elliptic functions, the area 
of competition with Abel and Jacobi, Legendre is considered more of a trailblazer than a 
great builder. Hermite wrote that Legendre “is considered the founder of the theory of elliptic 
functions” and “greatly smoothed the way for his successors,” but notes that the recognition 
of the double periodicity of the inverse function, which allowed the great progress of others, 
was missing from Legendre’s work. 

Legendre also contributed to practical efforts in science and mathematics. He and two
of his contemporaries were assigned in 1787 to a panel conducting geodetic work in cooperation
with the observatories at Paris and Greenwich. Four years later the same panel
members were appointed as the Academy’s commissioners to undertake the measurements
and calculations necessary to determine the length of the standard meter. Legendre’s seemingly
tireless skill at calculating produced large tables of the values of trigonometric and
elliptic functions, logarithms, and solutions to various special equations.

In his famous textbook Éléments de géométrie (1794) he gave a simple proof that π is
irrational and conjectured that it is not the root of any algebraic equation of finite degree with
rational coefficients. The textbook was somewhat dogmatic in its presentation of ordinary
Euclidean thought and includes none of the non-Euclidean ideas beginning to be formed
around that time. It was Legendre who first gave a rigorous proof of the theorem (assuming
all of Euclid's postulates, of course) that the sum of the angles of a triangle is “equal to
two right angles.” Very little of his research in this area was of memorable quality. The
same could possibly be argued for the balance of his writing, but one must acknowledge
the very fruitful ideas he left behind in number theory and elliptic functions and, of course,
the introduction of Legendre polynomials and the important Legendre transformation used
both in thermodynamics and Hamiltonian mechanics.



To find $k_n'$, we look at the evenness or oddness of the polynomials. By an
investigation of the Rodriguez formula, as in our study of Hermite polynomials,
we note that $P_n(-x) = (-1)^nP_n(x)$, which tells us that $P_n(x)$ is either even or
odd. In either case, $P_n(x)$ will not have an $(n - 1)$st power of $x$. Therefore, $k_n' = 0$.

We now calculate $h_n$ as given by (7.12):

$$h_n = \frac{(-1)^nk_nn!}{K_n}\int_{-1}^{1}(1 - x^2)^n\,dx = \frac{\Gamma(n + \frac{1}{2})}{n!\,\Gamma(\frac{1}{2})}\int_{-1}^{1}(1 - x^2)^n\,dx.$$

The integral can be evaluated by repeated integration by parts (see Problem 7.8).
Substituting the result in the expression above yields $h_n = 2/(2n + 1)$.

We need $\alpha_n$, $\beta_n$ and $\gamma_n$ for the recurrence relation:

$$\alpha_n = \frac{k_{n+1}}{k_n} = \frac{2^{n+1}\Gamma(n + 1 + \frac{1}{2})}{(n + 1)!\,\Gamma(\frac{1}{2})}\cdot\frac{n!\,\Gamma(\frac{1}{2})}{2^n\Gamma(n + \frac{1}{2})} = \frac{2n + 1}{n + 1},$$

where we used the relation $\Gamma(n + 1 + \frac{1}{2}) = (n + \frac{1}{2})\Gamma(n + \frac{1}{2})$. We also have $\beta_n = 0$
(because $k_n' = 0 = k_{n+1}'$) and $\gamma_n = -n/(n + 1)$. Therefore, the recurrence relation
is

$$(n + 1)P_{n+1}(x) = (2n + 1)xP_n(x) - nP_{n-1}(x). \tag{7.20}$$

Now we use $K_1 = -2$, $P_1(x) = x \Rightarrow k_1 = 1$, and $\sigma_2 = -1$ to obtain
$\lambda_n = -n(n + 1)$, which yields the following differential equation:

$$\frac{d}{dx}\left[(1 - x^2)\frac{dP_n}{dx}\right] = -n(n + 1)P_n. \tag{7.21}$$

This can also be expressed as

$$(1 - x^2)\frac{d^2P_n}{dx^2} - 2x\frac{dP_n}{dx} + n(n + 1)P_n = 0. \tag{7.22}$$
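The Legendre constants quoted above can be verified together. The sketch below generates $P_n$ from the recurrence (7.20), starting from $P_0 = 1$ and $P_1 = x$, and checks $k_n$ (against the gamma-function expression, in floating point), $k_n' = 0$, $h_n = 2/(2n+1)$ both directly and through Equation (7.12), and the differential equation (7.22). The helper names and the cutoff at $n = 8$ are illustrative.

```python
from fractions import Fraction
from math import factorial, gamma, isclose

X = [Fraction(0), Fraction(1)]                       # the polynomial x

def padd(p, q):
    """Add polynomials stored as coefficient lists (index = power of x)."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def pscale(c, p):
    return [c * a for a in p]

def pmul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def pder(p):
    return [i * a for i, a in enumerate(p)][1:] or [Fraction(0)]

def pint(p):
    """Exact integral of a polynomial over [-1, 1]."""
    return sum(c * Fraction(1 - (-1) ** (i + 1), i + 1) for i, c in enumerate(p))

def legendre(nmax):
    """P_0 .. P_nmax from Equation (7.20), starting with P_0 = 1 and P_1 = x."""
    P = [[Fraction(1)], X]
    for n in range(1, nmax):
        step = padd(pscale(2 * n + 1, pmul(X, P[n])), pscale(-n, P[n - 1]))
        P.append(pscale(Fraction(1, n + 1), step))
    return P

P = legendre(8)

def check(n):
    k_n = 2 ** n * gamma(n + 0.5) / (factorial(n) * gamma(0.5))
    ode = padd(padd(pmul([1, 0, -1], pder(pder(P[n]))), pmul([0, -2], pder(P[n]))),
               pscale(n * (n + 1), P[n]))
    s_n = [Fraction(1)]
    for _ in range(n):
        s_n = pmul(s_n, [1, 0, -1])                  # (1 - x^2)^n
    K_n = (-1) ** n * 2 ** n * factorial(n)
    h_from_712 = (-1) ** n * P[n][-1] * factorial(n) / K_n * pint(s_n)
    return (isclose(float(P[n][-1]), k_n) and P[n][-2] == 0
            and pint(pmul(P[n], P[n])) == Fraction(2, 2 * n + 1)
            and h_from_712 == Fraction(2, 2 * n + 1)
            and all(c == 0 for c in ode))

print(all(check(n) for n in range(1, 9)))
```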


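7.4.4 Other Classical Orthogonal Polynomials

The rest of the classical orthogonal polynomials can be constructed similarly. For
the sake of completeness, we merely quote the results.

Jacobi Polynomials, $P_n^{\mu,\nu}(x)$

Standardization: $K_n = (-2)^nn!$

Constants: $k_n = 2^{-n}\dfrac{\Gamma(2n + \mu + \nu + 1)}{n!\,\Gamma(n + \mu + \nu + 1)}$, $\quad k_n' = \dfrac{n(\nu - \mu)}{2n + \mu + \nu}\,k_n$,

$$h_n = \frac{2^{\mu+\nu+1}\Gamma(n + \mu + 1)\Gamma(n + \nu + 1)}{n!\,(2n + \mu + \nu + 1)\Gamma(n + \mu + \nu + 1)}$$

Rodriguez formula:

$$P_n^{\mu,\nu}(x) = \frac{(-1)^n}{2^nn!}(1 + x)^{-\mu}(1 - x)^{-\nu}\frac{d^n}{dx^n}\left[(1 + x)^{\mu+n}(1 - x)^{\nu+n}\right]$$

Differential Equation:

$$(1 - x^2)\frac{d^2P_n^{\mu,\nu}}{dx^2} + \left[\mu - \nu - (\mu + \nu + 2)x\right]\frac{dP_n^{\mu,\nu}}{dx} + n(n + \mu + \nu + 1)P_n^{\mu,\nu} = 0$$

A Recurrence Relation:

$$
\begin{aligned}
2(n + 1)(n + \mu + \nu + 1)(2n + \mu + \nu)P_{n+1}^{\mu,\nu} ={}& (2n + \mu + \nu + 1)\left[(2n + \mu + \nu)(2n + \mu + \nu + 2)x + \nu^2 - \mu^2\right]P_n^{\mu,\nu}\\
&- 2(n + \mu)(n + \nu)(2n + \mu + \nu + 2)P_{n-1}^{\mu,\nu}
\end{aligned}
$$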




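Gegenbauer Polynomials, $C_n^\lambda(x)$

Standardization: $K_n = (-2)^nn!\,\dfrac{\Gamma(n + \lambda + \frac{1}{2})\Gamma(2\lambda)}{\Gamma(n + 2\lambda)\Gamma(\lambda + \frac{1}{2})}$

Constants: $k_n = \dfrac{2^n}{n!}\dfrac{\Gamma(n + \lambda)}{\Gamma(\lambda)}$, $\quad k_n' = 0$, $\quad h_n = \dfrac{\sqrt{\pi}\,\Gamma(n + 2\lambda)\Gamma(\lambda + \frac{1}{2})}{n!\,(n + \lambda)\Gamma(2\lambda)\Gamma(\lambda)}$

Rodriguez Formula:

$$C_n^\lambda(x) = \frac{(-1)^n\Gamma(n + 2\lambda)\Gamma(\lambda + \frac{1}{2})}{2^nn!\,\Gamma(n + \lambda + \frac{1}{2})\Gamma(2\lambda)}(1 - x^2)^{-\lambda+1/2}\frac{d^n}{dx^n}\left[(1 - x^2)^{n+\lambda-1/2}\right]$$

Differential Equation:

$$(1 - x^2)\frac{d^2C_n^\lambda}{dx^2} - (2\lambda + 1)x\frac{dC_n^\lambda}{dx} + n(n + 2\lambda)C_n^\lambda = 0$$

A Recurrence Relation:

$$(n + 1)C_{n+1}^\lambda = 2(n + \lambda)xC_n^\lambda - (n + 2\lambda - 1)C_{n-1}^\lambda$$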

Chebyshev Polynomials of the First Kind, $T_n(x)$

Standardization: $K_n = (-1)^n\dfrac{(2n)!}{2^nn!}$

Constants: $k_n = 2^{n-1}$, $\quad k_n' = 0$, $\quad h_n = \dfrac{\pi}{2}$

Rodriguez Formula: $T_n(x) = \dfrac{(-1)^n2^nn!}{(2n)!}(1 - x^2)^{1/2}\dfrac{d^n}{dx^n}\left[(1 - x^2)^{n-1/2}\right]$

Differential Equation:

$$(1 - x^2)\frac{d^2T_n}{dx^2} - x\frac{dT_n}{dx} + n^2T_n = 0$$

A Recurrence Relation: $T_{n+1} = 2xT_n - T_{n-1}$

Chebyshev Polynomials of the Second Kind, $U_n(x)$

Standardization: $K_n = (-1)^n\dfrac{(2n + 1)!}{2^n(n + 1)!}$

Constants: $k_n = 2^n$, $\quad k_n' = 0$, $\quad h_n = \dfrac{\pi}{2}$

Rodriguez Formula:

$$U_n(x) = \frac{(-1)^n2^n(n + 1)!}{(2n + 1)!}(1 - x^2)^{-1/2}\frac{d^n}{dx^n}\left[(1 - x^2)^{n+1/2}\right]$$

Differential Equation: $(1 - x^2)\dfrac{d^2U_n}{dx^2} - 3x\dfrac{dU_n}{dx} + n(n + 2)U_n = 0$

A Recurrence Relation: $U_{n+1} = 2xU_n - U_{n-1}$
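The quoted Chebyshev data can be checked from the recurrence alone. The sketch below iterates $F_{n+1} = 2xF_n - F_{n-1}$ from $F_0 = 1$ with the standard starting values $T_1 = x$ and $U_1 = 2x$ (assumptions, since the text does not list them), and verifies $k_n$, $k_n'$, both differential equations, and $h_n = \pi/2$ for $T_n$ with $n \ge 1$. The moment formula $\int_{-1}^{1}x^{2j}(1-x^2)^{-1/2}dx = \pi\binom{2j}{j}/4^j$ is a standard result assumed here.

```python
from fractions import Fraction
from math import comb

def padd(p, q):
    """Add polynomials stored as coefficient lists (index = power of x)."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def pscale(c, p):
    return [c * a for a in p]

def pmul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def pder(p):
    return [i * a for i, a in enumerate(p)][1:] or [Fraction(0)]

def chebyshev(first_degree_poly, nmax):
    """Iterate F_{n+1} = 2x F_n - F_{n-1} from F_0 = 1 and the given F_1."""
    F = [[Fraction(1)], first_degree_poly]
    for n in range(1, nmax):
        F.append(padd(pmul([0, 2], F[n]), pscale(-1, F[n - 1])))
    return F

T = chebyshev([Fraction(0), Fraction(1)], 8)         # T_1 = x  (assumed starting value)
U = chebyshev([Fraction(0), Fraction(2)], 8)         # U_1 = 2x (assumed starting value)

def ode_residual(F, c1, c0):
    """Coefficients of (1 - x^2) F'' - c1 x F' + c0 F."""
    return padd(padd(pmul([1, 0, -1], pder(pder(F))), pmul([0, -c1], pder(F))),
                pscale(c0, F))

def norm_T_over_pi(p):
    """Integral of p(x) (1 - x^2)^(-1/2) over [-1, 1], divided by pi."""
    return sum(c * Fraction(comb(i, i // 2), 4 ** (i // 2))
               for i, c in enumerate(p) if i % 2 == 0)

def check(n):
    return (T[n][-1] == 2 ** (n - 1) and U[n][-1] == 2 ** n          # k_n
            and T[n][-2] == 0 and U[n][-2] == 0                      # k'_n
            and all(c == 0 for c in ode_residual(T[n], 1, n * n))
            and all(c == 0 for c in ode_residual(U[n], 3, n * (n + 2)))
            and norm_T_over_pi(pmul(T[n], T[n])) == Fraction(1, 2))  # h_n / pi

print(all(check(n) for n in range(1, 9)))
```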


7.5 Expansion in Terms of Orthogonal Polynomials 

Having studied the different classical orthogonal polynomials, we can now use
them to write an arbitrary function $f \in \mathcal{L}_w^2(a, b)$ as a series of these polynomials.
If we denote a complete set of orthogonal (not necessarily classical) polynomials
by $|C_k\rangle$ and the given function by $|f\rangle$, we may write

$$|f\rangle = \sum_{k=0}^{\infty}a_k|C_k\rangle, \tag{7.23}$$

where $a_k$ is found by multiplying both sides of the equation by $\langle C_i|$ and using the
orthogonality of the $|C_k\rangle$'s:

$$\langle C_i|f\rangle = \sum_{k=0}^{\infty}a_k\langle C_i|C_k\rangle = a_i\langle C_i|C_i\rangle \quad\Rightarrow\quad a_i = \frac{\langle C_i|f\rangle}{\langle C_i|C_i\rangle}. \tag{7.24}$$

This is written in function form as

$$a_i = \frac{\int_a^b C_i^*(x)f(x)w(x)\,dx}{\int_a^b|C_i(x)|^2w(x)\,dx}. \tag{7.25}$$

We can also "derive" the functional form of Equation (7.23) by multiplying both
of its sides by $\langle x|$ and using the fact that $\langle x|f\rangle = f(x)$ and $\langle x|C_k\rangle = C_k(x)$.
The result will be

$$f(x) = \sum_{k=0}^{\infty}a_kC_k(x). \tag{7.26}$$
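Equation (7.25) is easy to exercise when both $f$ and the $C_k$ are polynomials, because every integral is then exact. The sketch below assumes Legendre polynomials ($w = 1$ on $[-1, 1]$) as the orthogonal set and the hypothetical test function $f(x) = x^3$; it computes the coefficients $a_i$ and then resums the series (7.26) to recover $f$.

```python
from fractions import Fraction

X = [Fraction(0), Fraction(1)]                       # the polynomial x

def padd(p, q):
    """Add polynomials stored as coefficient lists (index = power of x)."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def pscale(c, p):
    return [c * a for a in p]

def pmul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def pint(p):
    """Exact integral of a polynomial over [-1, 1]."""
    return sum(c * Fraction(1 - (-1) ** (i + 1), i + 1) for i, c in enumerate(p))

def legendre(nmax):
    """P_0 .. P_nmax from (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    P = [[Fraction(1)], X]
    for n in range(1, nmax):
        step = padd(pscale(2 * n + 1, pmul(X, P[n])), pscale(-n, P[n - 1]))
        P.append(pscale(Fraction(1, n + 1), step))
    return P

def expand(f, nmax):
    """Coefficients a_i = <P_i|f> / <P_i|P_i> of Equation (7.25)."""
    P = legendre(nmax)
    return [pint(pmul(P[i], f)) / pint(pmul(P[i], P[i])) for i in range(nmax + 1)]

def resum(a):
    """Rebuild sum_k a_k P_k(x) as a coefficient list (Equation (7.26))."""
    P = legendre(len(a) - 1)
    total = [Fraction(0)]
    for a_k, P_k in zip(a, P):
        total = padd(total, pscale(a_k, P_k))
    return total

f = [Fraction(0), Fraction(0), Fraction(0), Fraction(1)]   # hypothetical f(x) = x^3
a = expand(f, 3)
print(a)           # coefficients of P_0 .. P_3
print(resum(a))    # coefficient list of the resummed series
```

For a polynomial $f$ the series terminates, so the resummed list reproduces $f$ exactly; for a general $f$ the same coefficients would have to be computed by numerical quadrature.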


7.5.1. Example. The solution of Laplace’s equation in spherically symmetric electrostatic
problems that are independent of the azimuthal angle is given by

$$\Phi(r, \theta) = \sum_{k=0}^{\infty}\left(\frac{b_k}{r^{k+1}} + c_kr^k\right)P_k(\cos\theta). \tag{7.27}$$

Consider two conducting hemispheres of radius $a$ separated by a small insulating gap
at the equator. The upper hemisphere is held at potential $V_0$ and the lower one at $-V_0$, as
shown in Figure 7.1. We want to find the potential at points outside the resulting sphere.
Since the potential must vanish at infinity, we expect the second term in Equation (7.27) to
be absent, i.e., $c_k = 0\ \forall\ k$. To find $b_k$, substitute $a$ for $r$ in (7.27) and let $\cos\theta \equiv x$. Then,

$$\Phi(a, x) = \sum_{k=0}^{\infty}\frac{b_k}{a^{k+1}}P_k(x),$$

where

$$\Phi(a, x) = \begin{cases}-V_0 & \text{if } -1 < x < 0,\\ +V_0 & \text{if } 0 < x < 1.\end{cases}$$

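From Equation (7.25), we have

$$\frac{b_k}{a^{k+1}} = \frac{\int_{-1}^{1}P_k(x)\Phi(a, x)\,dx}{\int_{-1}^{1}|P_k(x)|^2\,dx} = \frac{2k + 1}{2}\int_{-1}^{1}P_k(x)\Phi(a, x)\,dx = \frac{2k + 1}{2}V_0\left[-\int_{-1}^{0}P_k(x)\,dx + \int_0^1P_k(x)\,dx\right].$$

To proceed, we rewrite the first integral:

$$\int_{-1}^{0}P_k(x)\,dx = -\int_{+1}^{0}P_k(-y)\,dy = \int_0^1P_k(-y)\,dy = (-1)^k\int_0^1P_k(x)\,dx,$$

where we made use of the parity property of $P_k(x)$. Therefore,

$$\frac{b_k}{a^{k+1}} = \frac{2k + 1}{2}V_0\left[1 - (-1)^k\right]\int_0^1P_k(x)\,dx.$$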

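It is now clear that only odd polynomials contribute to the expansion. Using the result of
Problem 7.26, we get

$$\frac{b_k}{a^{k+1}} = (-1)^{(k-1)/2}\frac{(2k + 1)(k - 1)!}{2^k\left(\frac{k+1}{2}\right)!\left(\frac{k-1}{2}\right)!}V_0 \quad\text{if } k \text{ is odd},$$

or

$$b_{2m+1} = (4m + 3)a^{2m+2}V_0(-1)^m\frac{(2m)!}{2^{2m+1}m!\,(m + 1)!}.$$

Note that $\Phi(a, x)$ is an odd function; that is, $\Phi(a, -x) = -\Phi(a, x)$ as is evident from its
definition. Thus, only odd polynomials appear in the expansion of $\Phi(a, x)$ to preserve this
property. Having found the coefficients, we can write the potential:

$$\Phi(r, \theta) = V_0\sum_{m=0}^{\infty}(-1)^m\frac{(4m + 3)(2m)!}{2^{2m+1}m!\,(m + 1)!}\left(\frac{a}{r}\right)^{2m+2}P_{2m+1}(\cos\theta).$$

■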
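The coefficients in this example can be confirmed directly from the integral $\frac{2k+1}{2}[1 - (-1)^k]\int_0^1P_k(x)\,dx$. The sketch below computes $b_k/(a^{k+1}V_0)$ exactly for the first few $k$ and compares the odd ones with the closed form quoted above. The helper names are illustrative, and the Legendre polynomials are generated from the recurrence (7.20).

```python
from fractions import Fraction
from math import factorial

X = [Fraction(0), Fraction(1)]                       # the polynomial x

def padd(p, q):
    """Add polynomials stored as coefficient lists (index = power of x)."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def pscale(c, p):
    return [c * a for a in p]

def pmul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def legendre(nmax):
    """P_0 .. P_nmax from (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    P = [[Fraction(1)], X]
    for n in range(1, nmax):
        step = padd(pscale(2 * n + 1, pmul(X, P[n])), pscale(-n, P[n - 1]))
        P.append(pscale(Fraction(1, n + 1), step))
    return P

def pint01(p):
    """Exact integral of a polynomial over [0, 1]."""
    return sum(c / (i + 1) for i, c in enumerate(p))

def coefficient(k):
    """b_k / (a^(k+1) V_0) = (2k+1)/2 * [1 - (-1)^k] * integral_0^1 P_k dx."""
    P_k = legendre(max(k, 1))[k]
    return Fraction(2 * k + 1, 2) * (1 - (-1) ** k) * pint01(P_k)

def closed_form(m):
    """Quoted closed form for k = 2m + 1."""
    return Fraction((-1) ** m * (4 * m + 3) * factorial(2 * m),
                    2 ** (2 * m + 1) * factorial(m) * factorial(m + 1))

print([coefficient(k) for k in range(6)])
print(all(coefficient(2 * m + 1) == closed_form(m) for m in range(6)))
```

The even coefficients vanish, and the odd ones begin $3/2$, $-7/8$, $11/16$, so the leading behavior of the potential far from the sphere is the dipole term $\frac{3}{2}V_0(a/r)^2\cos\theta$.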




Figure 7.1 The voltage is $+V_0$ for the upper hemisphere, where $0 < \theta < \pi/2$, or where $0 < \cos\theta < 1$. It is $-V_0$ for the lower hemisphere, where $\pi/2 < \theta < \pi$, or where $-1 < \cos\theta < 0$.



The place where Legendre polynomials appear most naturally is, as mentioned above, in the solution of Laplace’s equation in spherical coordinates. After the partial differential equation is transformed into three ordinary differential equations using the method of the separation of variables, the differential equation corresponding to the polar angle $\theta$ gives rise to solutions of which Legendre polynomials are special cases. This differential equation simplifies to the Legendre differential equation if the substitution $x = \cos\theta$ is made; in that case, the solutions will be Legendre polynomials in $x$, or in $\cos\theta$. That is why the argument of $P_k(x)$ is restricted to the interval $[-1, +1]$.

7.5.2. Example. We can expand the Dirac delta function in terms of Legendre polynomials. We write

$$\delta(x) = \sum_{n=0}^{\infty}a_nP_n(x), \quad\text{where}\quad a_n = \frac{2n+1}{2}\int_{-1}^{1}P_n(x)\delta(x)\,dx = \frac{2n+1}{2}P_n(0).$$

For odd $n$ this will give zero, because $P_n(x)$ is an odd polynomial. To evaluate $P_n(0)$ for even $n$, we use the recurrence relation (7.20) for $x = 0$:

$$(n+1)P_{n+1}(0) = -nP_{n-1}(0),$$

or $nP_n(0) = -(n-1)P_{n-2}(0)$, or $P_n(0) = -\dfrac{n-1}{n}P_{n-2}(0)$. Iterating this $m$ times, we obtain

$$P_n(0) = (-1)^m\frac{(n-1)(n-3)\cdots(n-2m+1)}{n(n-2)(n-4)\cdots(n-2m+2)}P_{n-2m}(0).$$

For $n = 2m$, this yields $P_{2m}(0) = (-1)^m\dfrac{(2m-1)(2m-3)\cdots 3\cdot 1}{2m(2m-2)\cdots 4\cdot 2}P_0(0)$. Now we “fill the gaps” in the numerator by multiplying it (and the denominator, of course) by the denominator. This yields

$$P_{2m}(0) = (-1)^m\frac{2m(2m-1)(2m-2)\cdots 3\cdot 2\cdot 1}{[2m(2m-2)\cdots 4\cdot 2]^2} = (-1)^m\frac{(2m)!}{[2^m m!]^2} = (-1)^m\frac{(2m)!}{2^{2m}(m!)^2},$$

because $P_0(x) = 1$. Thus, we can write

$$\delta(x) = \sum_{m=0}^{\infty}\frac{4m+1}{2}(-1)^m\frac{(2m)!}{2^{2m}(m!)^2}P_{2m}(x).$$


We can also derive this expansion as follows. For any complete set of orthonormal vectors $\{|f_k\rangle\}_{k=0}^{\infty}$, we have

$$\delta(x - x') = w(x)\langle x|x'\rangle = w(x)\langle x|\mathbf{1}|x'\rangle = w(x)\langle x|\left(\sum_k|f_k\rangle\langle f_k|\right)|x'\rangle = w(x)\sum_k f_k^*(x')f_k(x).$$

Legendre polynomials are not orthonormal; but we can make them so by dividing $P_k(x)$ by $h_k^{1/2} = \sqrt{2/(2k+1)}$. Then, noting that $w(x) = 1$, we obtain

$$\delta(x - x') = \sum_{k=0}^{\infty}\frac{P_k(x')}{\sqrt{2/(2k+1)}}\,\frac{P_k(x)}{\sqrt{2/(2k+1)}} = \sum_{k=0}^{\infty}\frac{2k+1}{2}P_k(x')P_k(x).$$

For $x' = 0$ we get $\delta(x) = \sum_{k=0}^{\infty}\frac{2k+1}{2}P_k(0)P_k(x)$, which agrees with the previous result. ■
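The iteration used in Example 7.5.2 is easy to check numerically. The sketch below (illustrative names, not from the text) builds $P_n(0)$ from the two-step relation $nP_n(0) = -(n-1)P_{n-2}(0)$ and compares it with the closed form $(-1)^m(2m)!/[2^{2m}(m!)^2]$; it also returns the coefficient $\frac{4m+1}{2}P_{2m}(0)$ that multiplies $P_{2m}(x)$ in the expansion of $\delta(x)$.

```python
import math


def p_at_zero(n):
    """P_n(0) from n P_n(0) = -(n-1) P_{n-2}(0), with P_0(0) = 1 and P_1(0) = 0."""
    if n % 2 == 1:
        return 0.0
    value = 1.0
    for j in range(2, n + 1, 2):
        value *= -(j - 1) / j
    return value


def p_even_closed(m):
    """Closed form of P_{2m}(0) derived in the example."""
    return (-1) ** m * math.factorial(2 * m) / (2 ** (2 * m) * math.factorial(m) ** 2)


def delta_coeff(m):
    """Coefficient of P_{2m}(x) in the Legendre expansion of delta(x)."""
    return (4 * m + 1) / 2 * p_even_closed(m)


if __name__ == "__main__":
    for m in range(5):
        print(2 * m, p_at_zero(2 * m), p_even_closed(m), delta_coeff(m))
```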


7.6 Generating Functions 

It is possible to generate all orthogonal polynomials of a certain kind from a single function of two variables $g(x,t)$ by repeated differentiation of that function. Such a function is called a generating function. This generating function is assumed to be expandable in the form

$$g(x,t) = \sum_{n=0}^{\infty}a_nt^nF_n(x), \tag{7.28}$$

so that the $n$th derivative of $g(x,t)$ with respect to $t$ evaluated at $t = 0$ gives $F_n(x)$ to within a multiplicative constant; explicitly, $\partial^n g/\partial t^n|_{t=0} = n!\,a_nF_n(x)$. The constant $a_n$ is introduced for convenience. Clearly, for $g(x,t)$ to be useful, it must be in closed form. The derivation of such a function for general $F_n(x)$ is nontrivial, and we shall not attempt to derive such a general generating function (as we did, for instance, for the general Rodriguez formula). Instead, we simply quote these functions in Table 7.2, and leave the derivation of the generating functions of Hermite and Legendre polynomials as Problems 7.14 and 7.20. For the derivation of the Laguerre generating function, see [Hassani, 2000] pp. 606-607.




Polynomial                        Generating function                    $a_n$
Hermite, $H_n(x)$                 $\exp(-t^2 + 2xt)$                     $1/n!$
Laguerre, $L_n^\nu(x)$            $\exp[-xt/(1-t)]/(1-t)^{\nu+1}$        $1$
Legendre, $P_n(x)$                $(t^2 - 2xt + 1)^{-1/2}$               $1$
Chebyshev (1st kind), $T_n(x)$    $(1-t^2)(t^2 - 2xt + 1)^{-1}$          $2$ if $n \neq 0$; $a_0 = 1$
Chebyshev (2nd kind), $U_n(x)$    $(t^2 - 2xt + 1)^{-1}$                 $1$

Table 7.2 Generating functions for selected polynomials
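To make the role of a generating function concrete, here is a small sketch that sums $\sum_n t^nP_n(x)$ (the Legendre row of Table 7.2, with $a_n = 1$) and compares the result with the closed form $(t^2 - 2xt + 1)^{-1/2}$. The helper names and the truncation order `n_max` are illustrative choices, and the recurrence used for $P_n$ is assumed to be the standard three-term one. The series converges for $|t| < 1$ when $|x| \le 1$.

```python
def legendre_values(n_max, x):
    """Return [P_0(x), ..., P_{n_max}(x)] via (n+1)P_{n+1} = (2n+1)xP_n - nP_{n-1}."""
    values = [1.0, x]
    for n in range(1, n_max):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: n_max + 1]


def generating_series(x, t, n_max=60):
    """Truncated sum of t^n P_n(x) for n = 0, ..., n_max."""
    return sum(t ** n * p for n, p in enumerate(legendre_values(n_max, x)))


def generating_closed(x, t):
    """Closed-form Legendre generating function from Table 7.2."""
    return (t * t - 2 * x * t + 1) ** -0.5


if __name__ == "__main__":
    for x, t in [(0.3, 0.4), (-0.8, 0.5), (1.0, 0.2)]:
        print(x, t, generating_series(x, t), generating_closed(x, t))
```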


7.7 Problems 


7.1. Let $n = 1$ in Equation (7.1) and solve for $s\,\dfrac{dw}{dx}$. Now substitute this in the derivative of $ws^np_{\le k}$ and show that the derivative is equal to $ws^{n-1}p_{\le k+1}$. Repeat this process $m$ times to prove Lemma 7.1.2.

7.2. Find $w(x)$, $a$, and $b$ for the case of the classical orthogonal polynomials in which $s(x)$ is of second degree.

7.3. Integrate by parts twice and use Lemma 7.1.2 to show that $\int_a^b F_m(wsF_n')'\,dx = 0$ for $m < n$.

7.4. (a) Using Lemma 7.1.2 conclude that $(wsF_n')'/w$ is a polynomial of degree less than or equal to $n$.

(b) Write $(wsF_n')'/w$ as a linear combination of $F_i(x)$, and use their orthogonality and Problem 7.3 to show that the linear combination collapses to a single term.

(c) Multiply both sides of the differential equation so obtained by $F_n$ and integrate. The RHS becomes $\lambda_nh_n$. For the LHS, carry out the differentiation and note that $(ws)'/w = K_1F_1$. Now show that $K_1F_1F_n' + sF_n''$ is a polynomial of degree $n$, and that the LHS of the differential equation yields $\{K_1k_1n + \sigma_2n(n-1)\}h_n$. Now find $\lambda_n$.

7.5. Derive the recurrence relation of Equation (7.8). Hint: Differentiate Equation (7.5) and substitute for $F_n''$ from the differential equation. Now multiply the resulting equation by $\alpha_nx + \beta_n$ and substitute for $(\alpha_nx + \beta_n)F_n'$ from one of the earlier recurrence relations.





7.6. Using only the orthogonality of Hermite polynomials

$$\int_{-\infty}^{\infty}e^{-x^2}H_m(x)H_n(x)\,dx = \sqrt{\pi}\,2^nn!\,\delta_{mn}$$

(and the fact that they are polynomials) generate the first three of them.

7.7. Show that for Legendre polynomials, $k_n = 2^n\Gamma(n+\tfrac12)/[n!\,\Gamma(\tfrac12)]$. Hint: Multiply and divide the expression given in the book by $n!$; take a factor of 2 out of all terms in the numerator; the even terms yield a factor of $n!$, and the odd terms give a gamma function.

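7.8. Using integration by parts several times, show that

$$\int_{-1}^{1}(1-x^2)^n\,dx = \frac{2^mn(n-1)\cdots(n-m+1)}{3\cdot 5\cdot 7\cdots(2m-1)}\int_{-1}^{1}x^{2m}(1-x^2)^{n-m}\,dx.$$

Now show that $\int_{-1}^{1}(1-x^2)^n\,dx = 2\Gamma(\tfrac12)\,n!/[(2n+1)\Gamma(n+\tfrac12)]$.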

7.9. Use the generalized Rodriguez formula for Hermite polynomials and integration by parts to expand $x^{2k}$ and $x^{2k+1}$ in terms of Hermite polynomials.

7.10. Use the recurrence relation for Hermite polynomials to show that

$$\int_{-\infty}^{\infty}xe^{-x^2}H_m(x)H_n(x)\,dx = \sqrt{\pi}\,2^{n-1}n!\left[\delta_{m,n-1} + 2(n+1)\delta_{m,n+1}\right].$$

What happens when $m = n$?

7.11. Apply the general formalism of the recurrence relations given in the book to Hermite polynomials to find the following:

$$H_n + H_{n-1}' - 2xH_{n-1} = 0.$$

7.12. Show that $\int_{-\infty}^{\infty}x^2e^{-x^2}H_n^2(x)\,dx = \sqrt{\pi}\,2^n(n+\tfrac12)\,n!$.

7.13. Use a recurrence relation for Hermite polynomials to show that

$$H_n(0) = \begin{cases} 0 & \text{if } n \text{ is odd},\\ (-1)^m\dfrac{(2m)!}{m!} & \text{if } n = 2m. \end{cases}$$

7.14. Differentiate the expansion of $g(x,t)$ for Hermite polynomials with respect to $x$ (treating $t$ as a constant) and choose $a_n$ such that $na_n = a_{n-1}$ to obtain a differential equation for $g$. Solve this differential equation. To determine the “constant” of integration use the result of Problem 7.13 to show that $g(0,t) = e^{-t^2}$.





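7.15. Use the expansion of the generating function for Hermite polynomials to obtain

$$\sum_{m,n=0}^{\infty}e^{-x^2}H_m(x)H_n(x)\frac{s^mt^n}{m!\,n!} = e^{-x^2+2x(s+t)-(s^2+t^2)}.$$

Then integrate both sides over $x$ and use the orthogonality of the Hermite polynomials to get

$$\sum_{n=0}^{\infty}\frac{(st)^n}{(n!)^2}\int_{-\infty}^{\infty}e^{-x^2}H_n^2(x)\,dx = \sqrt{\pi}\,e^{2st}.$$

Deduce from this the normalization constant $h_n$ of $H_n(x)$.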

7.16. Using the recurrence relation of Equation (7.14) repeatedly, show that

$$\int_{-\infty}^{\infty}x^ke^{-x^2}H_m(x)H_{m+n}(x)\,dx = \begin{cases} 0 & \text{if } n > k,\\ \sqrt{\pi}\,2^m(m+k)! & \text{if } n = k. \end{cases}$$

7.17. Given that $P_0(x) = 1$ and $P_1(x) = x$,

(a) use (7.20) repeatedly to show that $P_n(1) = 1$.

(b) Using the same equation, find $P_2(x)$, $P_3(x)$, and $P_4(x)$.

7.18. Apply the general formalism of the recurrence relations given in the book to find the following two relations for Legendre polynomials:

(a) $nP_n - xP_n' + P_{n-1}' = 0$.

(b) $(1-x^2)P_n' - nP_{n-1} + nxP_n = 0$.

7.19. Show that $\int_{-1}^{1}x^nP_n(x)\,dx = 2^{n+1}(n!)^2/(2n+1)!$. Hint: Use the definition of $h_n$ and $k_n$ and the fact that $P_n$ is orthogonal to any polynomial of degree lower than $n$.

7.20. Differentiate the expansion of $g(x,t)$ for Legendre polynomials, and choose $a_n = 1$. For $P_n'$ you will substitute two different expressions to get two equations. First use Equation (7.11) with $n+1$ replaced by $n$, to obtain

$$(1-t^2)\frac{dg}{dx} + tg = 2\sum_{n=2}^{\infty}nt^nP_{n-1} + 2t.$$

As an alternative, use Equation (7.10) to substitute for $P_n'$ and get

$$(1-xt)\frac{dg}{dx} = \sum_{n=2}^{\infty}nt^nP_{n-1} + t.$$

Combine the last two equations to get $(t^2 - 2xt + 1)g' = tg$. Solve this differential equation and determine the constant of integration by evaluating $g(x,0)$.





7.21. Use the generating function for Legendre polynomials to show that $P_n(1) = 1$, $P_n(-1) = (-1)^n$, $P_n(0) = 0$ for odd $n$, and $P_n'(1) = n(n+1)/2$.

7.22. Both electrostatic and gravitational potential energies depend on the quantity $1/|\mathbf{r} - \mathbf{r}'|$, where $\mathbf{r}'$ is the position of the source (charge or mass) and $\mathbf{r}$ is the observation point.

(a) Let $\mathbf{r}$ lie along the $z$-axis, and use spherical coordinates and the definition of generating functions to show that

$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{r_>}\sum_{n=0}^{\infty}\left(\frac{r_<}{r_>}\right)^nP_n(\cos\theta),$$

where $r_<$ ($r_>$) is the smaller (larger) of $r$ and $r'$, and $\theta$ is the polar angle.

(b) The electrostatic or gravitational potential energy $\Phi(\mathbf{r})$ is given by

$$\Phi(\mathbf{r}) = k\iiint\frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\,d^3x',$$

where $k$ is a constant and $\rho(\mathbf{r}')$ is the (charge or mass) density function. Use the result of part (a) to show that if the density depends only on $r'$, and not on any angle (i.e., $\rho$ is spherically symmetric), then $\Phi(\mathbf{r})$ reduces to the potential energy of a point charge at the origin for $r > r'$.

(c) What is $\Phi(\mathbf{r})$, in the form of an integral, for $r < a$ for a spherically symmetric density that extends from the origin to $a$?

(d) Show that $\mathbf{E}$ (or $\mathbf{g}$) is given by $[kQ(r)/r^2]\,\hat{\mathbf{e}}_r$, where $Q(r)$ is the charge (or mass) enclosed in a sphere of radius $r$.

7.23. Use the generating function for Legendre polynomials and their orthogonality to derive the relation

$$\int_{-1}^{1}\frac{dx}{1 - 2tx + t^2} = \sum_{n=0}^{\infty}t^{2n}\int_{-1}^{1}P_n^2(x)\,dx.$$

Integrate the LHS, expand the result in powers of $t$, and compare these powers on both sides to obtain the normalization constant $h_n$.

7.24. Evaluate the following integrals using the expansion of the generating function for Legendre polynomials.

(a) $\displaystyle\int_0^{\pi}\frac{(a\cos\theta + b)\sin\theta\,d\theta}{\sqrt{a^2 + 2ab\cos\theta + b^2}}$

(b) $\displaystyle\int_0^{\pi}\frac{(a\cos^2\theta + b\sin^2\theta)\sin\theta\,d\theta}{\sqrt{a^2 + 2ab\cos\theta + b^2}}$

7.25. Differentiate the expansion of the Legendre polynomial generating function with respect to $x$ and manipulate the resulting expression to obtain

$$(1 - 2xt + t^2)\sum_{n=0}^{\infty}t^nP_n'(x) = t\sum_{n=0}^{\infty}t^nP_n(x).$$

Equate equal powers of $t$ on both sides to derive the recurrence relation

$$P_{n+1}' + P_{n-1}' - 2xP_n' - P_n = 0.$$

7.26. Show that

$$\int_0^1P_k(x)\,dx = \begin{cases} \delta_{k0} & \text{if } k \text{ is even},\\ \dfrac{(-1)^{(k-1)/2}(k-1)!}{2^k\left(\frac{k-1}{2}\right)!\left(\frac{k+1}{2}\right)!} & \text{if } k \text{ is odd}. \end{cases}$$

Hint: For even $k$, extend the region of integration to $(-1, 1)$ and use the orthogonality property. For odd $k$, note that

$$\left.\frac{d^{k-1}}{dx^{k-1}}(1-x^2)^k\right|_0^1$$

gives zero for the upper limit (by Lemma 7.1.3). For the lower limit, expand the expression using the binomial theorem, and carry out the differentiation, keeping in mind that only one term of the expansion contributes.

7.27. Show that $g(x,t) = g(-x,-t)$ for both Hermite and Legendre polynomials. Now expand $g(x,t)$ and $g(-x,-t)$ and compare the coefficients of $t^n$ to obtain the parity relations for these polynomials:

$$H_n(-x) = (-1)^nH_n(x) \quad\text{and}\quad P_n(-x) = (-1)^nP_n(x).$$

7.28. Derive the orthogonality of Legendre polynomials directly from the differential equation they satisfy.

7.29. Expand $|x|$ in the interval $(-1, +1)$ in terms of Legendre polynomials. Hint: Use the result of Problem 7.26.

7.30. Apply the general formalism of the recurrence relations given in the book to find the following two relations for Laguerre polynomials:

(a) $nL_n^\nu - (n+\nu)L_{n-1}^\nu - x\dfrac{dL_n^\nu}{dx} = 0$.

(b) $(n+1)L_{n+1}^\nu - (2n+\nu+1-x)L_n^\nu + (n+\nu)L_{n-1}^\nu = 0$.

7.31. From the generating function for Laguerre polynomials given in Table 7.2 deduce that $L_n^\nu(0) = \Gamma(n+\nu+1)/[n!\,\Gamma(\nu+1)]$.

7.32. Let $L_n \equiv L_n^0$. Now differentiate both sides of

$$g(x,t) = \frac{e^{-xt/(1-t)}}{1-t} = \sum_{n=0}^{\infty}t^nL_n(x)$$

with respect to $x$ and compare powers of $t$ to obtain $L_n'(0) = -n$ and $L_n''(0) = \tfrac12n(n-1)$. Hint: Differentiate $1/(1-t) = \sum_{n=0}^{\infty}t^n$ to get an expression for $(1-t)^{-2}$.

7.33. Expand $e^{-kx}$ as a series of Laguerre polynomials $L_n^\nu(x)$. Find the coefficients by using (a) the orthogonality of $L_n^\nu(x)$ and (b) the generating function.

7.34. Derive the recurrence relations given in the book for Jacobi, Gegenbauer, and Chebyshev polynomials.

7.35. Show that $T_n(-x) = (-1)^nT_n(x)$ and $U_n(-x) = (-1)^nU_n(x)$. Hint: Use $g(x,t) = g(-x,-t)$.

7.36. Show that $T_n(1) = 1$, $U_n(1) = n+1$, $T_n(-1) = (-1)^n$, $U_n(-1) = (-1)^n(n+1)$, $T_{2m}(0) = (-1)^m = U_{2m}(0)$, and $T_{2m+1}(0) = 0 = U_{2m+1}(0)$.


Additional Reading 

1. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row, 1967. Treats the classical orthogonal polynomials in the spirit of this chapter.

2. Tricomi, F. Vorlesungen über Orthogonalreihen, Springer, 1955. The original unified treatment of the classical orthogonal polynomials.



Fourier Analysis 



The single most recurring theme of mathematical physics is Fourier analysis. It shows up, for example, in classical mechanics and the analysis of normal modes, in electromagnetic theory and the frequency analysis of waves, in noise considerations and thermal physics, in quantum theory and the transformation between momentum and coordinate representations, and in relativistic quantum field theory and creation and annihilation operation formalism.


8.1 Fourier Series 


One way to begin the study of Fourier series and transforms is to invoke a generalization of the Stone-Weierstrass Approximation Theorem (Theorem 5.2.3), which established the completeness of monomials, $x^k$. The generalization of Theorem 5.2.3 permits us to find another set of orthogonal functions in terms of which we can expand an arbitrary function. This generalization involves polynomials in more than one variable. (For a proof of this theorem, see Simmons [Simm 83, pp 160-161].)

8.1.1. Theorem. (generalized Stone-Weierstrass theorem) Suppose that $f(x_1, x_2, \ldots, x_n)$ is continuous in the domain $\{a_i \le x_i \le b_i\}_{i=1}^{n}$. Then it can be expanded in terms of the monomials $x_1^{k_1}x_2^{k_2}\cdots x_n^{k_n}$, where the $k_i$ are nonnegative integers.


Now let us consider functions that are periodic and investigate their expansion in terms of elementary periodic functions. We use the generalized Stone-Weierstrass theorem with two variables, $x$ and $y$. A function $g(x,y)$ can be written as $g(x,y) = \sum_{k,m=0}^{\infty}a_{km}x^ky^m$. In this equation, $x$ and $y$ can be considered as coordinates in the $xy$-plane, which in turn can be written in terms of polar coordinates $r$ and $\theta$. In that case, we obtain

$$f(r,\theta) \equiv g(r\cos\theta, r\sin\theta) = \sum_{k,m=0}^{\infty}a_{km}r^{k+m}\cos^k\theta\,\sin^m\theta.$$

In particular, if we let $r = 1$, we obtain a function of $\theta$ alone, which upon substitution of complex exponentials for $\sin\theta$ and $\cos\theta$ becomes

$$f(\theta) = \sum_{k,m=0}^{\infty}a_{km}\frac{1}{2^k}\left(e^{i\theta} + e^{-i\theta}\right)^k\frac{1}{(2i)^m}\left(e^{i\theta} - e^{-i\theta}\right)^m = \sum_{n=-\infty}^{\infty}b_ne^{in\theta}, \tag{8.1}$$

where $b_n$ is a constant that depends on $a_{km}$. The RHS of (8.1) is periodic with period $2\pi$; thus, it is especially suitable for periodic functions $f(\theta)$ that satisfy the periodicity condition $f(\theta - \pi) = f(\theta + \pi)$.

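We can also write Equation (8.1) as

$$f(\theta) = b_0 + \sum_{n=1}^{\infty}\left(b_ne^{in\theta} + b_{-n}e^{-in\theta}\right) = b_0 + \sum_{n=1}^{\infty}\left[(b_n + b_{-n})\cos n\theta + i(b_n - b_{-n})\sin n\theta\right] = b_0 + \sum_{n=1}^{\infty}(A_n\cos n\theta + B_n\sin n\theta), \tag{8.2}$$

where $A_n \equiv b_n + b_{-n}$ and $B_n \equiv i(b_n - b_{-n})$. If $f(\theta)$ is real, then $b_0$, $A_n$, and $B_n$ are also real. Equation (8.1) or (8.2) is called the Fourier series expansion of $f(\theta)$.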

Let us now concentrate on the elementary periodic functions $e^{in\theta}$. We define the $\{|e_n\rangle\}_{n=-\infty}^{\infty}$ such that their “$\theta$th components” are given by

$$\langle\theta|e_n\rangle = \frac{1}{\sqrt{2\pi}}e^{in\theta}, \quad\text{where } \theta \in (-\pi, \pi).$$

These functions (or ket vectors), which belong to $\mathcal{L}^2(-\pi, \pi)$, are orthonormal, as can be easily verified. It can also be shown that they are complete. In fact, for functions that are continuous on $(-\pi, \pi)$, this is a result of the generalized Stone-Weierstrass theorem. It turns out, however, that $\{|e_n\rangle\}_{n=-\infty}^{\infty}$ is also a complete orthonormal sequence for piecewise continuous functions on $(-\pi, \pi)$.¹ Therefore, any periodic piecewise continuous function of $\theta$ can be expressed as a linear combination of these orthonormal vectors. Thus if $|f\rangle \in \mathcal{L}^2(-\pi, \pi)$, then

$$|f\rangle = \sum_{n=-\infty}^{\infty}f_n|e_n\rangle, \quad\text{where } f_n = \langle e_n|f\rangle. \tag{8.3}$$

¹ A piecewise continuous function on a finite interval is one that has a finite number of discontinuities in its interval of definition.





Fourier series 
expansion: angular 
expression 


fundamental cell of a 
periodic function 


“The profound study of nature is the most fruitful source of mathematical discoveries.”
Joseph Fourier


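We can write this as a functional relation if we take the $\theta$th component of both sides: $\langle\theta|f\rangle = \sum_{n=-\infty}^{\infty}f_n\langle\theta|e_n\rangle$, or

$$f(\theta) = \frac{1}{\sqrt{2\pi}}\sum_{n=-\infty}^{\infty}f_ne^{in\theta} \tag{8.4}$$

with $f_n$ given by

$$f_n = \langle e_n|\mathbf{1}|f\rangle = \langle e_n|\left(\int_{-\pi}^{\pi}|\theta\rangle\langle\theta|\,d\theta\right)|f\rangle = \int_{-\pi}^{\pi}\langle e_n|\theta\rangle\langle\theta|f\rangle\,d\theta = \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}e^{-in\theta}f(\theta)\,d\theta. \tag{8.5}$$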


It is important to note that even though $f(\theta)$ may be defined only for $-\pi < \theta < \pi$, Equation (8.4) extends the domain of definition of $f(\theta)$ to all the intervals $(2k-1)\pi < \theta < (2k+1)\pi$ for all $k \in \mathbb{Z}$. Thus, if a function is to be represented by Equation (8.4) without any specification of the interval of definition, it must be periodic in $\theta$. For such functions, the interval of their definition can be translated by a multiple of $2\pi$. Thus, $f(\theta)$ with $-\pi < \theta < \pi$ is equivalent to $f(\theta - 2m\pi)$ with $2m\pi - \pi < \theta < 2m\pi + \pi$; both will give the same Fourier series expansion. We shall define periodic functions in their fundamental cell such as $(-\pi, \pi)$.


Joseph Fourier (1768-1830) did very well as a young student 
of mathematics but had set his heart on becoming an army 
officer. Denied a commission because he was the son of a 
tailor, he went to a Benedictine school with the hope that he 
could continue studying mathematics at its seminary in Paris. 

The French Revolution changed those plans and set the stage 
for many of the personal circumstances of Fourier’s later years, 
due in part to his courageous defense of some of its victims, 
an action that led to his arrest in 1794. He was released later 
that year, and he enrolled as a student in the Ecole Normale, 
which opened and closed within a year. His performance there, 
however, was enough to earn him a position as assistant lecturer (under Lagrange and Monge) 
in the Ecole Polytechnique. He was an excellent mathematical physicist, was a friend of 
Napoleon (so far as such people have friends), and accompanied him in 1798 to Egypt, where 
Fourier held various diplomatic and administrative posts while also conducting research. 
Napoleon took note of his accomplishments and, on Fourier’s return to France in 1801, appointed him prefect of the district of Isère, in southeastern France, and in this capacity he built the first real road from Grenoble to Turin. He also befriended the boy Champollion, who later deciphered the Rosetta stone as the first long step toward understanding the hieroglyphic writing of the ancient Egyptians.

Like other scientists of his time, Fourier took up the flow of heat. The flow was of interest as a practical problem in the handling of metals in industry and as a scientific problem in attempts to determine the temperature in the interior of the earth, the variation of that temperature with time, and other such questions. He submitted a basic paper on heat conduction to the Academy of Sciences of Paris in 1807. The paper was judged by Lagrange, Laplace, and Legendre. The paper was not published, mainly due to the objections of Lagrange, who had earlier rejected the use of trigonometric series. But the Academy did wish to encourage Fourier to develop his ideas, and so made the problem of the propagation of heat the subject of a grand prize to be awarded in 1812. Fourier submitted a revised paper in 1811, which was judged by the men already mentioned and others. It won the prize but was criticized for its lack of rigor and so was not published at that time in the Mémoires of the Academy.

He developed a mastery of clear notation, some of which is still in use today. (The modern integral sign and the placement of the limits of integration near its top and bottom were introduced by Fourier.) It was also his habit to maintain close association between mathematical relations and physically measurable quantities, especially in limiting or asymptotic cases, even performing some of the experiments himself. He was one of the first to begin full incorporation of physical constants into his equations, and made considerable strides toward the modern ideas of units and dimensional analysis.

Fourier continued to work on the subject of heat and, in 1822, published one of the classics of mathematics, Théorie Analytique de la Chaleur, in which he made extensive use of the series that now bear his name and incorporated the first part of his 1811 paper practically without change. Two years later he became secretary of the Academy and was able to have his 1811 paper published in its original form in the Mémoires.

Fourier series were of profound significance in connection with the evolution of the 
concept of a function, the rigorous theory of definite integrals, and the development of 
Hilbert spaces. Fourier claimed that “arbitrary” graphs can be represented by trigonometric 
series and should therefore be treated as legitimate functions, and it came as a shock to many 
that he turned out to be right. The classical definition of the definite integral due to Riemann 
was first given in his fundamental paper of 1854 on the subject of Fourier series. Hilbert 
thought of a function as represented by an infinite sequence, the Fourier coefficients of the 
function. 

Fourier himself is one of the fortunate few: his name has become rooted in all civilized 
languages as an adjective that is well-known to physical scientists and mathematicians in 
every part of the world. 


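Functions are not always defined on $(-\pi, \pi)$. Let us consider a function $F(x)$ that is defined on $(a, b)$ and is periodic with period $L = b - a$. We define a new variable,

$$\theta \equiv \frac{2\pi}{L}\left(x - a - \frac{L}{2}\right) \quad\Rightarrow\quad x = \frac{L}{2\pi}\theta + a + \frac{L}{2},$$

and note that $f(\theta) \equiv F((L/2\pi)\theta + a + L/2)$ is periodic with fundamental cell $(-\pi, \pi)$ because

$$f(\theta \pm \pi) = F\left(\frac{L}{2\pi}(\theta \pm \pi) + a + \frac{L}{2}\right) = F\left(x \pm \frac{L}{2}\right)$$

and $F(x + L/2) = F(x - L/2)$. It follows that we can expand the latter as in Equation (8.4). Using that equation, but writing $\theta$ in terms of $x$, we obtain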






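$$F(x) = F\left(\frac{L}{2\pi}\theta + a + \frac{L}{2}\right) = \frac{1}{\sqrt{2\pi}}\sum_{n=-\infty}^{\infty}f_n\exp\left[in\frac{2\pi}{L}\left(x - a - \frac{L}{2}\right)\right] = \frac{1}{\sqrt{L}}\sum_{n=-\infty}^{\infty}F_ne^{2n\pi ix/L}, \tag{8.6}$$

where we have introduced² $F_n \equiv \sqrt{L/2\pi}\,f_n\,e^{-i(2\pi n/L)(a+L/2)}$. Using Equation (8.5), we can write

$$F_n = \sqrt{\frac{L}{2\pi}}\,e^{-i(2\pi n/L)(a+L/2)}\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}e^{-in\theta}f(\theta)\,d\theta = \frac{\sqrt{L}}{2\pi}\,e^{-i(2\pi n/L)(a+L/2)}\int_a^{a+L}e^{-i(2\pi n/L)(x-a-L/2)}F(x)\frac{2\pi}{L}\,dx = \frac{1}{\sqrt{L}}\int_a^be^{-i(2\pi n/L)x}F(x)\,dx. \tag{8.7}$$

The functions $\exp(2\pi inx/L)/\sqrt{L}$ are easily seen to be orthonormal as members of $\mathcal{L}^2(a, b)$. We can introduce $\{|e_n\rangle\}_{n=-\infty}^{\infty}$ with the “$x$th component” given by $\langle x|e_n\rangle = (1/\sqrt{L})\,e^{2\pi inx/L}$. Then the reader may check that Equations (8.6) and (8.7) can be written as $|F\rangle = \sum_{n=-\infty}^{\infty}F_n|e_n\rangle$ with $F_n = \langle e_n|F\rangle$.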

8.1.2. Example. In the study of electrical circuits, periodic voltage signals of different shapes are encountered. An example is a square wave voltage of height $U_0$, “duration” $T$, and “rest duration” $T$ [see Figure 8.1(a)]. The potential as a function of time $V(t)$ can be expanded as a Fourier series. The interval is $(0, 2T)$ because that is one whole cycle of the potential variation. We therefore use Equation (8.6) and write

$$V(t) = \frac{1}{\sqrt{2T}}\sum_{n=-\infty}^{\infty}V_ne^{2n\pi it/2T}, \quad\text{where}\quad V_n = \frac{1}{\sqrt{2T}}\int_0^{2T}e^{-2\pi int/2T}V(t)\,dt.$$

The problem is to find $V_n$. This is easily done by substituting

$$V(t) = \begin{cases} U_0 & \text{if } 0 < t < T,\\ 0 & \text{if } T < t < 2T \end{cases}$$

in the last integral:

$$V_n = \frac{U_0}{\sqrt{2T}}\int_0^Te^{-in\pi t/T}\,dt = \frac{U_0}{\sqrt{2T}}\left(\frac{T}{-in\pi}\right)\left[(-1)^n - 1\right] \quad\text{where } n \neq 0$$

$$\phantom{V_n} = \begin{cases} 0 & \text{if } n \text{ is even and } n \neq 0,\\ \dfrac{\sqrt{2T}\,U_0}{in\pi} & \text{if } n \text{ is odd}. \end{cases}$$


² The $F_n$ are defined such that what they multiply in the expansion are orthonormal in the interval $(a, b)$.



Figure 8.1 (a) The periodic square wave potential. (b) Various approximations to the Fourier series of the square-wave potential. The dashed plot is that of the first term of the series, the thick grey plot keeps 3 terms, and the solid plot 15 terms.



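For $n = 0$, we obtain $V_0 = \frac{1}{\sqrt{2T}}\int_0^{2T}V(t)\,dt = \frac{1}{\sqrt{2T}}\int_0^TU_0\,dt = U_0\sqrt{T/2}$. Therefore, we can write

$$V(t) = \frac{1}{\sqrt{2T}}\left[U_0\sqrt{\frac{T}{2}} + \frac{\sqrt{2T}\,U_0}{i\pi}\sum_{\substack{n=-\infty\\ n\ \text{odd}}}^{\infty}\frac{1}{n}e^{in\pi t/T}\right]$$

$$\phantom{V(t)} = U_0\left\{\frac12 + \frac{1}{i\pi}\left[\sum_{\substack{n=1\\ n\ \text{odd}}}^{\infty}\frac{1}{-n}e^{-in\pi t/T} + \sum_{\substack{n=1\\ n\ \text{odd}}}^{\infty}\frac{1}{n}e^{in\pi t/T}\right]\right\}$$

$$\phantom{V(t)} = U_0\left\{\frac12 + \frac{2}{\pi}\sum_{k=0}^{\infty}\frac{1}{2k+1}\sin\left[\frac{(2k+1)\pi t}{T}\right]\right\}.$$

Figure 8.1(b) shows the graphical representation of the above sum when only a finite number of terms are present. ■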
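The partial sums plotted in Figure 8.1(b) can be reproduced with a few lines of code. The following is a minimal sketch of the sine form of the square-wave series from Example 8.1.2; the function name and the default values `T = 1`, `U0 = 1` are illustrative assumptions rather than anything fixed by the text.

```python
import math


def square_wave_partial(t, T=1.0, U0=1.0, terms=15):
    """U0 * (1/2 + (2/pi) * sum_{k=0}^{terms-1} sin((2k+1) pi t / T) / (2k+1))."""
    s = sum(math.sin((2 * k + 1) * math.pi * t / T) / (2 * k + 1)
            for k in range(terms))
    return U0 * (0.5 + 2.0 / math.pi * s)


if __name__ == "__main__":
    # Sample one full cycle (0, 2T) with 1, 3, and 15 terms, as in the figure.
    for terms in (1, 3, 15):
        row = [round(square_wave_partial(0.25 * j, terms=terms), 3)
               for j in range(9)]
        print(terms, row)
```

Away from the jumps the sum approaches $U_0$ on $(0, T)$ and $0$ on $(T, 2T)$ as more terms are kept; exactly at $t = T$ every sine term vanishes and the sum equals $U_0/2$.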


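8.1.3. Example. Another frequently used voltage is the sawtooth voltage [see Figure 8.2(a)]. The equation for $V(t)$ with period $T$ is $V(t) = U_0t/T$ for $0 < t < T$, and its Fourier representation is

$$V(t) = \frac{1}{\sqrt{T}}\sum_{n=-\infty}^{\infty}V_ne^{2n\pi it/T}, \quad\text{where}\quad V_n = \frac{1}{\sqrt{T}}\int_0^Te^{-2\pi int/T}V(t)\,dt.$$

Substituting for $V(t)$ in the integral above yields

$$V_n = \frac{1}{\sqrt{T}}\int_0^Te^{-2\pi int/T}U_0\frac{t}{T}\,dt = U_0T^{-3/2}\int_0^Te^{-2\pi int/T}\,t\,dt$$

$$\phantom{V_n} = U_0T^{-3/2}\left(\left.\frac{T}{-i2n\pi}\,t\,e^{-2\pi int/T}\right|_0^T + \frac{T}{i2n\pi}\underbrace{\int_0^Te^{-2\pi int/T}\,dt}_{=0}\right) = U_0T^{-3/2}\left(\frac{T^2}{-i2n\pi}\right) = -\frac{U_0\sqrt{T}}{i2n\pi} \quad\text{where } n \neq 0,$$

$$V_0 = \frac{1}{\sqrt{T}}\int_0^TV(t)\,dt = \frac{1}{\sqrt{T}}\int_0^TU_0\frac{t}{T}\,dt = \tfrac12U_0\sqrt{T}.$$

Thus,

$$V(t) = \frac{1}{\sqrt{T}}\left[\frac12U_0\sqrt{T} - \frac{U_0\sqrt{T}}{i2\pi}\left(\sum_{n=-\infty}^{-1}\frac{1}{n}e^{i2n\pi t/T} + \sum_{n=1}^{\infty}\frac{1}{n}e^{i2n\pi t/T}\right)\right] = U_0\left\{\frac12 - \frac{1}{\pi}\sum_{n=1}^{\infty}\frac{1}{n}\sin\left(\frac{2n\pi t}{T}\right)\right\}.$$

Figure 8.2(b) shows the graphical representation of the above series keeping the first few terms. ■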
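The coefficient integral of Example 8.1.3 can be verified numerically. The sketch below (illustrative names; a plain midpoint rule stands in for any quadrature one might prefer) evaluates $V_n = T^{-1/2}\int_0^Te^{-2\pi int/T}\,U_0t/T\,dt$ directly and compares it with the closed forms $V_n = -U_0\sqrt{T}/(i2n\pi)$ for $n \neq 0$ and $V_0 = \tfrac12U_0\sqrt{T}$.

```python
import cmath
import math


def sawtooth_coeff_numeric(n, T=1.0, U0=1.0, steps=20000):
    """Midpoint-rule estimate of V_n for the sawtooth V(t) = U0 * t / T on (0, T)."""
    h = T / steps
    total = 0j
    for j in range(steps):
        t = (j + 0.5) * h
        total += cmath.exp(-2j * math.pi * n * t / T) * U0 * t / T * h
    return total / math.sqrt(T)


def sawtooth_coeff_closed(n, T=1.0, U0=1.0):
    """Closed-form V_n from the integration by parts in the example."""
    if n == 0:
        return 0.5 * U0 * math.sqrt(T)
    return -U0 * math.sqrt(T) / (2j * math.pi * n)


if __name__ == "__main__":
    for n in (-2, -1, 0, 1, 2, 3):
        print(n, sawtooth_coeff_numeric(n), sawtooth_coeff_closed(n))
```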

The foregoing examples indicate an important fact about Fourier series. At points of discontinuity (for example, $t = T$ in the preceding two examples), the value of the function is not defined, but the Fourier series expansion assigns it a value: the average of the two values on the right and left of the discontinuity. For instance, when we substitute $t = T$ in the series of Example 8.1.3, all the sine terms vanish and we obtain $V(T) = U_0/2$, the average of $U_0$ (on the left) and 0 (on the right). We express this as

$$V(T) = \tfrac12[V(T - 0) + V(T + 0)] \equiv \tfrac12\lim_{\epsilon\to 0}[V(T - \epsilon) + V(T + \epsilon)].$$

This is a general property of Fourier series. In fact, the main theorem of Fourier series, which follows, incorporates this property. (For a proof of this theorem, see [Cour 62].)

8.1.4. Theorem. The Fourier series of a function $f(\theta)$ that is piecewise continuous in the interval $(-\pi, \pi)$ converges to

$$\begin{cases} \tfrac12[f(\theta + 0) + f(\theta - 0)] & \text{for } -\pi < \theta < \pi,\\ \tfrac12[f(\pi) + f(-\pi)] & \text{for } \theta = \pm\pi. \end{cases}$$

Figure 8.2 (a) The periodic sawtooth potential. (b) Various approximations to the Fourier series of the sawtooth potential. The dashed plot is that of the first term of the series, the thick grey plot keeps 3 terms, and the solid plot 15 terms.

Although we used exponential functions to find the Fourier expansion of the two examples above, it is more convenient to start with the trigonometric series when the expansion of a real function is sought. Equation (8.2) already gives such an expansion. All we need to do now is find expressions for $A_n$ and $B_n$. From the definitions of $A_n$ and the relation between $b_n$ and $f_n$ we get

$$A_n = b_n + b_{-n} = \frac{1}{\sqrt{2\pi}}(f_n + f_{-n}) = \frac{1}{2\pi}\left(\int_{-\pi}^{\pi}e^{-in\theta}f(\theta)\,d\theta + \int_{-\pi}^{\pi}e^{in\theta}f(\theta)\,d\theta\right) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[e^{-in\theta} + e^{in\theta}\right]f(\theta)\,d\theta = \frac{1}{\pi}\int_{-\pi}^{\pi}\cos n\theta\,f(\theta)\,d\theta. \tag{8.8}$$


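Similarly,

$$B_n = \frac{1}{\pi}\int_{-\pi}^{\pi}\sin n\theta\,f(\theta)\,d\theta, \qquad b_0 = \frac{1}{\sqrt{2\pi}}f_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi}f(\theta)\,d\theta \equiv \tfrac12A_0. \tag{8.9}$$

So, for a function $f(\theta)$ defined in $(-\pi, \pi)$, the Fourier trigonometric series is as in Equation (8.2) with the coefficients given by Equations (8.8) and (8.9). For a function $F(x)$, defined on $(a, b)$, the trigonometric series becomes

$$F(x) = \tfrac12A_0 + \sum_{n=1}^{\infty}\left(A_n\cos\frac{2n\pi x}{L} + B_n\sin\frac{2n\pi x}{L}\right), \tag{8.10}$$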


































































where

$$A_n = \frac{2}{L}\int_a^b\cos\left(\frac{2n\pi x}{L}\right)F(x)\,dx, \qquad B_n = \frac{2}{L}\int_a^b\sin\left(\frac{2n\pi x}{L}\right)F(x)\,dx. \tag{8.11}$$


A convenient rule to remember is that for even (odd) functions, which are necessarily defined on a symmetric interval around the origin, only cosine (sine) terms appear in the Fourier expansion.

8.1.5. Example. An alternating current is turned into a direct current by starting with a signal of the form $V(t) \propto |\sin\omega t|$, i.e., a harmonic function that is never negative, as shown in Figure 8.3(a). Then by proper electronics, one smooths out the “bumps” so that the output signal is very nearly a direct voltage. Let us Fourier-analyze the above signal. Since $V(t)$ is even for $-\pi < \omega t < \pi$, we expect only cosine terms to be present. If for the time being we use $\theta$ instead of $\omega t$, we can write $|\sin\theta| = \tfrac12A_0 + \sum_{n=1}^{\infty}A_n\cos n\theta$, where

$$A_n = \frac{1}{\pi}\int_{-\pi}^{\pi}|\sin\theta|\cos n\theta\,d\theta = \frac{2}{\pi}\int_0^{\pi}\sin\theta\cos n\theta\,d\theta = \frac{2}{\pi}\int_0^{\pi}\tfrac12[\sin(n+1)\theta - \sin(n-1)\theta]\,d\theta = -\frac{2}{\pi}\left[\frac{(-1)^n + 1}{n^2 - 1}\right] \quad\text{for } n \neq 1$$

$$\phantom{A_n} = \begin{cases} -\dfrac{4}{\pi}\left(\dfrac{1}{n^2-1}\right) & \text{for } n \text{ even and } n \neq 0,\\ 0 & \text{for } n \text{ odd}, \end{cases}$$

and $A_0 = (1/\pi)\int_{-\pi}^{\pi}|\sin\theta|\,d\theta = 4/\pi$. The expansion then yields

$$|\sin\omega t| = \frac{2}{\pi} - \frac{4}{\pi}\sum_{k=1}^{\infty}\frac{\cos 2k\omega t}{4k^2 - 1},$$

where in the sum we substituted $2k$ for $n$, and $\omega t$ for $\theta$. Figure 8.3(b) shows the graph of the series above when only the first few terms are kept. ■
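The cosine coefficients of Example 8.1.5 can be checked against Equation (8.8) by direct quadrature. The sketch below uses illustrative function names and a midpoint rule; it computes $A_n = \frac{1}{\pi}\int_{-\pi}^{\pi}f(\theta)\cos n\theta\,d\theta$ for $f(\theta) = |\sin\theta|$, compares it with the closed form, and sums the resulting series.

```python
import math


def cosine_coeff(f, n, steps=20000):
    """A_n = (1/pi) * integral of f(theta) cos(n theta) over (-pi, pi), midpoint rule."""
    h = 2 * math.pi / steps
    total = 0.0
    for j in range(steps):
        theta = -math.pi + (j + 0.5) * h
        total += f(theta) * math.cos(n * theta) * h
    return total / math.pi


def abs_sin_coeff_closed(n):
    """Closed-form A_n for |sin(theta)|: zero for odd n, -4 / (pi (n^2 - 1)) for even n."""
    if n % 2 == 1:
        return 0.0
    return -4.0 / (math.pi * (n * n - 1))


def abs_sin_partial(theta, terms=50):
    """2/pi - (4/pi) * sum_{k=1}^{terms} cos(2 k theta) / (4 k^2 - 1)."""
    s = sum(math.cos(2 * k * theta) / (4 * k * k - 1) for k in range(1, terms + 1))
    return 2.0 / math.pi - 4.0 / math.pi * s


if __name__ == "__main__":
    for n in range(7):
        print(n, cosine_coeff(lambda th: abs(math.sin(th)), n), abs_sin_coeff_closed(n))
    print(abs_sin_partial(1.0), abs(math.sin(1.0)))
```

Because the coefficients fall off like $1/n^2$, this series converges much faster than the square-wave and sawtooth series, whose coefficients fall off like $1/n$.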

It is useful to have a representation of the Dirac delta function in terms of the present orthonormal basis of Fourier expansion. First we note that we can represent the delta function in terms of a series in any set of orthonormal functions (see Problem 8.23):

$$\delta(x - x') = \sum_nf_n(x)f_n^*(x')w(x). \tag{8.12}$$

Next we use the basis of the Fourier expansion for which $w(x) = 1$. We then obtain

$$\delta(x - x') = \sum_{n=-\infty}^{\infty}\frac{e^{2\pi inx/L}}{\sqrt{L}}\,\frac{e^{-2\pi inx'/L}}{\sqrt{L}} = \frac{1}{L}\sum_{n=-\infty}^{\infty}e^{2\pi in(x-x')/L}.$$



Figure 8.3 (a) The periodic “monopolar” sine potential. (b) Various approximations to the Fourier series of the “monopolar” sine potential. The dashed plot is that of the first term of the series, the thick grey plot keeps 3 terms, and the solid plot 15 terms.

8.1.1 The Gibbs Phenomenon


The plots of the Fourier series expansions in Figures 8.1(b) and 8.2(b) exhibit a feature that is common to all such expansions: At the discontinuity of the periodic function, the truncated Fourier series overestimates the actual function. This is called the Gibbs phenomenon, and is the subject of this subsection.
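The overshoot can be seen numerically before it is derived. The sketch below is an illustration under stated assumptions: it uses the unit square wave of Example 8.1.2 written in the variable $x = \pi t/T$ (so the jump of size 1 sits at $x = 0$), and the function names and sampling grid are arbitrary choices. It scans the partial sums just to the right of the jump and reports the largest value reached.

```python
import math


def square_partial(x, terms):
    """1/2 + (2/pi) * sum_{k=0}^{terms-1} sin((2k+1) x) / (2k+1)."""
    s = sum(math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(terms))
    return 0.5 + 2.0 / math.pi * s


def peak_near_jump(terms, samples=2000):
    """Largest value of the partial sum on the grid x = (pi/2) * j / samples, j >= 1."""
    return max(square_partial(0.5 * math.pi * j / samples, terms)
               for j in range(1, samples + 1))


if __name__ == "__main__":
    for terms in (10, 50, 200):
        print(terms, peak_near_jump(terms))
```

Adding more terms moves the peak closer to the discontinuity but does not shrink it: the maximum stays near 1.09, that is, roughly 9% of the jump above the true value.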

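Let us approximate the infinite series with a finite sum. Then

$$f_N(\theta) = \frac{1}{\sqrt{2\pi}}\sum_{n=-N}^{N}f_ne^{in\theta} = \frac{1}{\sqrt{2\pi}}\sum_{n=-N}^{N}e^{in\theta}\frac{1}{\sqrt{2\pi}}\int_0^{2\pi}e^{-in\theta'}f(\theta')\,d\theta' = \frac{1}{2\pi}\int_0^{2\pi}d\theta'\,f(\theta')\sum_{n=-N}^{N}e^{in(\theta-\theta')},$$

where we substituted Equation (8.5) in the sum and, without loss of generality, changed the interval of integration from $(-\pi, \pi)$ to $(0, 2\pi)$. Problem 8.2 shows that the sum in the last equation is

$$\sum_{n=-N}^{N}e^{in(\theta-\theta')} = \frac{\sin[(N+\frac12)(\theta-\theta')]}{\sin[\frac12(\theta-\theta')]}.$$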
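The closed form of the exponential sum is easy to confirm numerically. This short sketch (illustrative names only) adds up the $2N+1$ exponentials directly and compares the result with $\sin[(N+\frac12)\phi]/\sin(\frac12\phi)$, where $\phi$ stands for $\theta - \theta'$.

```python
import cmath
import math


def kernel_sum(phi, N):
    """Direct sum of exp(i n phi) for n = -N, ..., N."""
    return sum(cmath.exp(1j * n * phi) for n in range(-N, N + 1))


def kernel_closed(phi, N):
    """sin((N + 1/2) phi) / sin(phi / 2); phi must not be a multiple of 2 pi."""
    return math.sin((N + 0.5) * phi) / math.sin(0.5 * phi)


if __name__ == "__main__":
    for phi, N in [(0.7, 5), (2.0, 12), (-1.3, 30)]:
        print(phi, N, kernel_sum(phi, N), kernel_closed(phi, N))
```

The imaginary parts cancel pairwise, so the direct sum is real up to rounding. The first positive zero of the closed form lies at $\phi = \pi/(N+\frac12)$.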


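It follows that

$$f_N(\theta) = \frac{1}{2\pi}\int_0^{2\pi}d\theta'\,f(\theta')\frac{\sin[(N+\frac12)(\theta-\theta')]}{\sin[\frac12(\theta-\theta')]} = \frac{1}{2\pi}\int_{-\theta}^{2\pi-\theta}d\phi\,f(\phi+\theta)\underbrace{\frac{\sin[(N+\frac12)\phi]}{\sin(\frac12\phi)}}_{\equiv S(\phi)} \equiv \frac{1}{2\pi}\int_{-\theta}^{2\pi-\theta}d\phi\,f(\phi+\theta)S(\phi). \tag{8.13}$$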

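We want to investigate the behavior of $f_N$ at a discontinuity of $f$. By translating the limits of integration if necessary, we can assume that the discontinuity of $f$ occurs at a point $\alpha$ such that $0 \neq \alpha \neq 2\pi$. Let us denote the jump at this discontinuity for the function itself by $\Delta f$, and for its finite Fourier sum by $\Delta f_N$:

$$\Delta f \equiv f(\alpha + \epsilon) - f(\alpha - \epsilon), \qquad \Delta f_N \equiv f_N(\alpha + \epsilon) - f_N(\alpha - \epsilon).$$

Then, we have

$$\Delta f_N = \frac{1}{2\pi}\int_{-\alpha-\epsilon}^{2\pi-\alpha-\epsilon}d\phi\,f(\phi+\alpha+\epsilon)S(\phi) - \frac{1}{2\pi}\int_{-\alpha+\epsilon}^{2\pi-\alpha+\epsilon}d\phi\,f(\phi+\alpha-\epsilon)S(\phi)$$

$$= \frac{1}{2\pi}\left\{\int_{-\alpha-\epsilon}^{-\alpha+\epsilon}d\phi\,f(\phi+\alpha+\epsilon)S(\phi) + \int_{-\alpha+\epsilon}^{2\pi-\alpha-\epsilon}d\phi\,f(\phi+\alpha+\epsilon)S(\phi)\right\} - \frac{1}{2\pi}\left\{\int_{-\alpha+\epsilon}^{2\pi-\alpha-\epsilon}d\phi\,f(\phi+\alpha-\epsilon)S(\phi) + \int_{2\pi-\alpha-\epsilon}^{2\pi-\alpha+\epsilon}d\phi\,f(\phi+\alpha-\epsilon)S(\phi)\right\}$$

$$= \frac{1}{2\pi}\left\{\int_{-\alpha-\epsilon}^{-\alpha+\epsilon}d\phi\,f(\phi+\alpha+\epsilon)S(\phi) - \int_{2\pi-\alpha-\epsilon}^{2\pi-\alpha+\epsilon}d\phi\,f(\phi+\alpha-\epsilon)S(\phi) + \int_{-\alpha+\epsilon}^{2\pi-\alpha-\epsilon}d\phi\,[f(\phi+\alpha+\epsilon) - f(\phi+\alpha-\epsilon)]S(\phi)\right\}.$$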


The first two integrals give zero because of the small ranges of integration and the
continuity of the integrands in those intervals. The integrand of the third integral is
almost zero for all values of the range of integration except when $\phi \approx 0$. Hence, we
can confine the integration to the small interval $(-\delta, +\delta)$ for which the difference
in the square brackets is simply $\Delta f$. It now follows that

$$
\Delta f_N(\delta) \approx \frac{\Delta f}{2\pi} \int_{-\delta}^{\delta} \frac{\sin[(N + \tfrac12)\phi]}{\sin(\tfrac12 \phi)}\, d\phi
\approx \frac{\Delta f}{\pi} \int_{-\delta}^{\delta} \frac{\sin[(N + \tfrac12)\phi]}{\phi}\, d\phi,
$$


where we have emphasized the dependence of $\Delta f_N$ on $\delta$ and approximated the sine
in the denominator by its argument, a good approximation due to the smallness
of $\phi$. The reader may find the plot of the integrand in Figure 6.2, where it is
shown clearly that the major contribution to the integral comes from the interval
$[0, \pi/(N + \tfrac12)]$, where $\pi/(N + \tfrac12)$ is the first zero of the integrand. Furthermore, it
is clear that if the upper limit is larger than $\pi/(N + \tfrac12)$, the result of the integral will
decrease, because in each interval of length $2\pi$, the area below the horizontal axis
is larger than that above. Therefore, if we are interested in the maximum overshoot
of the finite sum, we must set the upper limit equal to $\pi/(N + \tfrac12)$. It follows firstly
that the maximum overshoot of the finite sum occurs at $\pi/(N + \tfrac12) \approx \pi/N$ to the
right of the discontinuity. Secondly, the amount of the maximum overshoot is

$$
(\Delta f_N)_{\max} \approx \frac{2\,\Delta f}{\pi} \int_0^{\pi/(N + \frac12)} \frac{\sin[(N + \tfrac12)\phi]}{\phi}\, d\phi
= \frac{2}{\pi}\, \Delta f \int_0^{\pi} \frac{\sin x}{x}\, dx \approx 1.179\, \Delta f. \tag{8.14}
$$
Thus 


8.1.6. Box. (Gibbs phenomenon) The finite (large-N) sum approximation 
of the discontinuous function overshoots the function itself at a discontinuity 
by about 18 percent. 
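
The factor 1.179 in Equation (8.14) can be checked numerically. The following is a minimal sketch, not part of the original text: it estimates $(2/\pi)\int_0^\pi (\sin x/x)\,dx$ with Simpson's rule and compares it with the peak of a truncated series. The square wave used for the comparison (jumping from $-1$ to $+1$ at $\theta = 0$) and all function names are assumptions chosen for illustration.

```python
import math


def sinc_integral(upper, steps=2000):
    """Composite Simpson estimate of the integral of sin(x)/x from 0 to `upper`."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = upper / steps

    def integrand(x):
        return 1.0 if x == 0.0 else math.sin(x) / x  # limiting value at x = 0

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def gibbs_ratio():
    """Ratio (Delta f_N)_max / Delta f as given by Eq. (8.14)."""
    return 2.0 / math.pi * sinc_integral(math.pi)


def partial_sum_square_wave(theta, n_terms):
    """Truncated Fourier series of an (assumed) square wave jumping from -1 to +1 at 0."""
    return 4.0 / math.pi * sum(
        math.sin((2 * m + 1) * theta) / (2 * m + 1) for m in range(n_terms)
    )


if __name__ == "__main__":
    n_terms = 200
    print("ratio from Eq. (8.14):", gibbs_ratio())
    # The first maximum of this partial sum sits at theta = pi / (2 * n_terms).
    print("peak of partial sum:  ", partial_sum_square_wave(math.pi / (2 * n_terms), n_terms))
```

For this square wave the function value to the right of the jump is 1 and the jump is 2, so a peak near 1.179 corresponds to the ratio $(\Delta f_N)_{\max}/\Delta f \approx 1.179$ of Equation (8.14).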


8.1.2 Fourier Series in Higher Dimensions 

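It is instructive to generalize the Fourier series to more than one dimension. This
generalization is especially useful in crystallography and solid-state physics, which
deal with three-dimensional periodic structures. To generalize to $N$ dimensions,
we first consider a special case in which an $N$-dimensional periodic function is a
product of $N$ one-dimensional periodic functions. That is, we take the $N$ functions

$$f^{(j)}(x) = \frac{1}{\sqrt{L_j}} \sum_{k=-\infty}^{\infty} f_k^{(j)} e^{2i\pi kx/L_j}, \qquad j = 1, 2, \dots, N,$$

and multiply them on both sides to obtain

$$F(\mathbf{r}) = f^{(1)}(x_1) f^{(2)}(x_2) \cdots f^{(N)}(x_N) = \frac{1}{\sqrt{V}} \sum_{\mathbf{k}} F_{\mathbf{k}}\, e^{i \mathbf{g}_{\mathbf{k}} \cdot \mathbf{r}}, \tag{8.15}$$

where we have used the following new notations:

$$
\begin{aligned}
F(\mathbf{r}) &\equiv f^{(1)}(x_1) f^{(2)}(x_2) \cdots f^{(N)}(x_N), & V &= L_1 L_2 \cdots L_N, \\
\mathbf{k} &\equiv (k_1, k_2, \dots, k_N), & F_{\mathbf{k}} &\equiv f^{(1)}_{k_1} \cdots f^{(N)}_{k_N}, \\
\mathbf{g}_{\mathbf{k}} &= 2\pi (k_1/L_1, \dots, k_N/L_N), & \mathbf{r} &= (x_1, x_2, \dots, x_N).
\end{aligned}
$$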

We take Equation (8.15) as the definition of the Fourier series for any periodic
function of $N$ variables (not just the product of $N$ functions of a single variable).
However, application of (8.15) requires some clarification. In one dimension, the
shape of the smallest region of periodicity is unique. It is simply a line segment of
length $L$, for example. In two and more dimensions, however, such regions may
have a variety of shapes. For instance, in two dimensions, they can be rectangles,
pentagons, hexagons, and so forth. Thus, we let $V$ in Equation (8.15) stand for
a primitive cell of the $N$-dimensional lattice. This cell is important in solid-state
physics, and (in three dimensions) is called the Wigner-Seitz cell.

It is customary to absorb the factor $1/\sqrt{V}$ into $F_{\mathbf{k}}$, and write

$$F(\mathbf{r}) = \sum_{\mathbf{k}} F_{\mathbf{k}}\, e^{i \mathbf{g}_{\mathbf{k}} \cdot \mathbf{r}} \iff F_{\mathbf{k}} = \frac{1}{V} \int_V F(\mathbf{r})\, e^{-i \mathbf{g}_{\mathbf{k}} \cdot \mathbf{r}}\, d^N x, \tag{8.16}$$

where the integral is over a single Wigner-Seitz cell.

Recall that $F(\mathbf{r})$ is a periodic function of $\mathbf{r}$. This means that when $\mathbf{r}$ is changed
by $\mathbf{R}$, where $\mathbf{R}$ is a vector describing the boundaries of a cell, then we should
get the same function: $F(\mathbf{r} + \mathbf{R}) = F(\mathbf{r})$. When substituted in (8.16), this yields
$F(\mathbf{r} + \mathbf{R}) = \sum_{\mathbf{k}} F_{\mathbf{k}}\, e^{i \mathbf{g}_{\mathbf{k}} \cdot (\mathbf{r} + \mathbf{R})} = \sum_{\mathbf{k}} e^{i \mathbf{g}_{\mathbf{k}} \cdot \mathbf{R}} F_{\mathbf{k}}\, e^{i \mathbf{g}_{\mathbf{k}} \cdot \mathbf{r}}$, which is equal to $F(\mathbf{r})$ if

$$e^{i \mathbf{g}_{\mathbf{k}} \cdot \mathbf{R}} = 1. \tag{8.17}$$


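In three dimensions $\mathbf{R} = m_1 \mathbf{a}_1 + m_2 \mathbf{a}_2 + m_3 \mathbf{a}_3$, where $m_1$, $m_2$, and $m_3$ are
integers and $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$ are crystal axes, which are not generally orthogonal.
On the other hand, $\mathbf{g}_{\mathbf{k}} = n_1 \mathbf{b}_1 + n_2 \mathbf{b}_2 + n_3 \mathbf{b}_3$, where $n_1$, $n_2$, and $n_3$ are integers,
and $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_3$ are the reciprocal lattice vectors defined by

$$
\mathbf{b}_1 = \frac{2\pi (\mathbf{a}_2 \times \mathbf{a}_3)}{\mathbf{a}_1 \cdot (\mathbf{a}_2 \times \mathbf{a}_3)}, \qquad
\mathbf{b}_2 = \frac{2\pi (\mathbf{a}_3 \times \mathbf{a}_1)}{\mathbf{a}_1 \cdot (\mathbf{a}_2 \times \mathbf{a}_3)}, \qquad
\mathbf{b}_3 = \frac{2\pi (\mathbf{a}_1 \times \mathbf{a}_2)}{\mathbf{a}_1 \cdot (\mathbf{a}_2 \times \mathbf{a}_3)}.
$$

The reader may verify that $\mathbf{b}_i \cdot \mathbf{a}_j = 2\pi \delta_{ij}$. Thus

$$
\mathbf{g}_{\mathbf{k}} \cdot \mathbf{R} = \left( \sum_{i=1}^{3} n_i \mathbf{b}_i \right) \cdot \left( \sum_{j=1}^{3} m_j \mathbf{a}_j \right)
= \sum_{i,j} n_i m_j\, \mathbf{b}_i \cdot \mathbf{a}_j = 2\pi \sum_{j=1}^{3} m_j n_j = 2\pi\, (\text{integer}),
$$

and Equation (8.17) is satisfied.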
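
The relation $\mathbf{b}_i \cdot \mathbf{a}_j = 2\pi\delta_{ij}$ and the condition (8.17) can be verified for a concrete lattice. The sketch below is only an illustration: the three crystal axes in `A` are hypothetical, non-orthogonal vectors, and the helper names are not taken from the text.

```python
import cmath
import math


def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))


def reciprocal_vectors(a1, a2, a3):
    """Reciprocal lattice vectors b1, b2, b3 for crystal axes a1, a2, a3."""
    scale = 2.0 * math.pi / dot(a1, cross(a2, a3))
    b1 = tuple(scale * c for c in cross(a2, a3))
    b2 = tuple(scale * c for c in cross(a3, a1))
    b3 = tuple(scale * c for c in cross(a1, a2))
    return b1, b2, b3


def lattice_phase(n, m, axes, recip):
    """exp(i g.R) for g = sum n_i b_i and R = sum m_j a_j (integer triples n, m)."""
    g = tuple(sum(n[i] * recip[i][c] for i in range(3)) for c in range(3))
    r = tuple(sum(m[j] * axes[j][c] for j in range(3)) for c in range(3))
    return cmath.exp(1j * dot(g, r))


# Hypothetical non-orthogonal crystal axes, chosen only for illustration.
A = ((1.0, 0.0, 0.0), (0.5, math.sqrt(3.0) / 2.0, 0.0), (0.2, 0.3, 1.5))
B = reciprocal_vectors(*A)

if __name__ == "__main__":
    for i in range(3):
        print([round(dot(B[i], A[j]) / (2.0 * math.pi), 12) for j in range(3)])
    print(lattice_phase((1, -2, 3), (4, 0, -1), A, B))
```

The printed matrix should be the identity up to rounding, and the phase should be 1 for any integer triples, as Equation (8.17) requires.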


8.2 The Fourier Transform 

The Fourier series representation of $F(x)$ is valid for the entire real line as long as
$F(x)$ is periodic. However, most functions encountered in physical applications
are defined in some interval $(a, b)$ without repetition beyond that interval. It would
be useful if we could also expand such functions in some form of Fourier “series.”

One way to do this is to start with the periodic series and then let the period
go to infinity while extending the domain of the definition of the function. As a
specific case, suppose we are interested in representing a function $f(x)$ that is
defined only for the interval $(a, b)$ and is assigned the value zero everywhere else
[see Figure 8.4(a)]. To begin with, we might try the Fourier series representation,
but this will produce a repetition of our function. This situation is depicted in
Figure 8.4(b).

Figure 8.4 (a) The function we want to represent. (b) The Fourier series representation
of the function.

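Next we may try a function $g_\Lambda(x)$ defined in the interval $(a - \Lambda/2,\, b + \Lambda/2)$,
where $\Lambda$ is an arbitrary positive number:

$$
g_\Lambda(x) = \begin{cases}
0 & \text{if } a - \Lambda/2 < x < a, \\
f(x) & \text{if } a < x < b, \\
0 & \text{if } b < x < b + \Lambda/2.
\end{cases}
$$

This function, which is depicted in Figure 8.5, has the Fourier series representation

$$g_\Lambda(x) = \frac{1}{\sqrt{L + \Lambda}} \sum_{n=-\infty}^{\infty} g_{\Lambda,n}\, e^{2i\pi nx/(L+\Lambda)}, \tag{8.18}$$

where

$$g_{\Lambda,n} = \frac{1}{\sqrt{L + \Lambda}} \int_{a - \Lambda/2}^{b + \Lambda/2} e^{-2i\pi nx/(L+\Lambda)}\, g_\Lambda(x)\, dx. \tag{8.19}$$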


We have managed to separate various copies of the original periodic function
by $\Lambda$. It should be clear that if $\Lambda \to \infty$, we can completely isolate the function
and stop the repetition. Let us investigate the behavior of Equations (8.18) and
(8.19) as $\Lambda$ grows without bound. First, we notice that the quantity $k_n$ defined by
$k_n \equiv 2n\pi/(L + \Lambda)$ and appearing in the exponent becomes almost continuous.
In other words, as $n$ changes by one unit, $k_n$ changes only slightly. This suggests
that the terms in the sum in Equation (8.18) can be lumped together in $j$ intervals
of width $\Delta n_j$, giving

$$g_\Lambda(x) \approx \sum_{j=-\infty}^{\infty} \frac{g_\Lambda(k_j)}{\sqrt{L + \Lambda}}\, e^{ik_j x}\, \Delta n_j,$$

where $k_j \equiv 2n_j\pi/(L + \Lambda)$, and $g_\Lambda(k_j) \equiv g_{\Lambda,n_j}$. Substituting $\Delta n_j = [(L + \Lambda)/2\pi]\,\Delta k_j$
in the above sum, we obtain

$$
g_\Lambda(x) \approx \sum_{j=-\infty}^{\infty} \frac{g_\Lambda(k_j)}{\sqrt{L + \Lambda}}\, e^{ik_j x}\, \frac{L + \Lambda}{2\pi}\, \Delta k_j
= \frac{1}{\sqrt{2\pi}} \sum_{j=-\infty}^{\infty} \tilde g_\Lambda(k_j)\, e^{ik_j x}\, \Delta k_j,
$$

where we introduced $\tilde g_\Lambda(k_j)$ defined by $\tilde g_\Lambda(k_j) \equiv \sqrt{(L + \Lambda)/2\pi}\; g_\Lambda(k_j)$. It is
now clear that the preceding sum approaches an integral in the limit that $\Lambda \to \infty$.
In the same limit, $g_\Lambda(x) \to f(x)$, and we have

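$$f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde f(k)\, e^{ikx}\, dk, \tag{8.20}$$

where

$$
\begin{aligned}
\tilde f(k) \equiv \lim_{\Lambda \to \infty} \tilde g_\Lambda(k_j)
&= \lim_{\Lambda \to \infty} \sqrt{\frac{L + \Lambda}{2\pi}}\; g_\Lambda(k_j) \\
&= \lim_{\Lambda \to \infty} \sqrt{\frac{L + \Lambda}{2\pi}}\, \frac{1}{\sqrt{L + \Lambda}} \int_{a - \Lambda/2}^{b + \Lambda/2} e^{-ik_j x}\, g_\Lambda(x)\, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx.
\end{aligned} \tag{8.21}
$$

Equations (8.20) and (8.21) are called the Fourier integral transforms of $\tilde f(k)$
and $f(x)$, respectively.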

8.2.1. Example. Let us evaluate the Fourier transform of the function defined by

$$
f(x) = \begin{cases}
b & \text{if } |x| < a, \\
0 & \text{if } |x| > a
\end{cases}
$$

(see Figure 8.6). From (8.21) we have

$$
\tilde f(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx
= \frac{b}{\sqrt{2\pi}} \int_{-a}^{a} e^{-ikx}\, dx
= \frac{2ab}{\sqrt{2\pi}} \left( \frac{\sin ka}{ka} \right),
$$

which is the function encountered (and depicted) in Example 6.1.2.

Figure 8.5 By introducing the parameter $\Lambda$, we have managed to separate the copies of the function.


Let us discuss this result in detail. First, note that if $a \to \infty$, the function $f(x)$ becomes
a constant function over the entire real line, and we get

$$\tilde f(k) = \frac{2b}{\sqrt{2\pi}} \lim_{a \to \infty} \frac{\sin ka}{k} = \frac{2b}{\sqrt{2\pi}}\, \pi \delta(k)$$

by the result of Example 6.1.2. This is the Fourier transform of an everywhere-constant
function (see Problem 8.12). Next, let $b \to \infty$ and $a \to 0$ in such a way that $2ab$, which is
the area under $f(x)$, is 1. Then $f(x)$ will approach the delta function, and $\tilde f(k)$ becomes

$$
\tilde f(k) = \lim_{\substack{b \to \infty \\ a \to 0}} \frac{2ab}{\sqrt{2\pi}}\, \frac{\sin ka}{ka}
= \frac{1}{\sqrt{2\pi}} \lim_{a \to 0} \frac{\sin ka}{ka} = \frac{1}{\sqrt{2\pi}}.
$$

So the Fourier transform of the delta function is the constant $1/\sqrt{2\pi}$.

Finally, we note that the width of $f(x)$ is $\Delta x = 2a$, and the width of $\tilde f(k)$ is roughly
the distance, on the $k$-axis, between its first two roots, $k_+$ and $k_-$, on either side of $k = 0$:
$\Delta k = k_+ - k_- = 2\pi/a$. Thus increasing the width of $f(x)$ results in a decrease in the width
of $\tilde f(k)$. In other words, when the function is wide, its Fourier transform is narrow. In the
limit of infinite width (a constant function), we get infinite sharpness (the delta function).
The last two statements are very general. In fact, it can be shown that $\Delta x\, \Delta k \geq 1$ for any
function $f(x)$. When both sides of this inequality are multiplied by the (reduced) Planck
constant $\hbar \equiv h/(2\pi)$, the result is the celebrated Heisenberg uncertainty relation:$^3$

$$\Delta x\, \Delta p \geq \hbar,$$

where $p = \hbar k$ is the momentum of the particle. Having obtained the transform of $f(x)$, we
can write

$$
f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{2b}{\sqrt{2\pi}}\, \frac{\sin ka}{k}\, e^{ikx}\, dk
= \frac{b}{\pi} \int_{-\infty}^{\infty} \frac{\sin ka}{k}\, e^{ikx}\, dk. \qquad \blacksquare
$$
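
As a hedged numerical check of Example 8.2.1, the sketch below evaluates the integral in (8.21) for the square bump with a simple midpoint rule and compares it with the closed form $(2ab/\sqrt{2\pi})\,\sin ka/(ka)$. The values of `a` and `b` and the function names are arbitrary choices made for this illustration.

```python
import cmath
import math


def fourier_transform(f, k, lower, upper, steps=4000):
    """Midpoint-rule estimate of (1/sqrt(2 pi)) * integral of f(x) exp(-ikx) dx."""
    h = (upper - lower) / steps
    total = 0j
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += f(x) * cmath.exp(-1j * k * x)
    return total * h / math.sqrt(2.0 * math.pi)


def bump_transform(k, a, b):
    """Closed form of Example 8.2.1: (2ab / sqrt(2 pi)) * sin(ka) / (ka)."""
    if k == 0.0:
        return 2.0 * a * b / math.sqrt(2.0 * math.pi)  # limit sin(ka)/(ka) -> 1
    return 2.0 * b * math.sin(k * a) / (k * math.sqrt(2.0 * math.pi))


a, b = 1.5, 2.0  # hypothetical half-width and height of the bump

if __name__ == "__main__":
    for k in (0.0, 0.7, 2.0, 5.0):
        numeric = fourier_transform(lambda x: b, k, -a, a)
        print(k, numeric.real, bump_transform(k, a, b))
    # The first roots on either side of k = 0 are at k = +/- pi/a, so Delta k = 2 pi / a.
    print("Delta k =", 2.0 * math.pi / a)
```

Since the bump vanishes outside $(-a, a)$, integrating over that interval alone is equivalent to integrating over the whole real line.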


$^3$ In the context of the uncertainty relation, the width of the function (the so-called wave packet) measures the uncertainty
in the position $x$ of a quantum mechanical particle. Similarly, the width of the Fourier transform measures the uncertainty in $k$,
which is related to momentum $p$ via $p = \hbar k$.


Figure 8.6 The square “bump” function. 


8.2.2. Example. Let us evaluate the Fourier transform of a Gaussian $g(x) = a e^{-bx^2}$
with $a, b > 0$:

$$
\tilde g(k) = \frac{a}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-b(x^2 + ikx/b)}\, dx
= \frac{a\, e^{-k^2/4b}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-b(x + ik/2b)^2}\, dx.
$$

To evaluate this integral rigorously, we would have to use techniques developed in complex
analysis, which are not introduced until Chapter 10 (see Example 10.3.8). However, we can
ignore the fact that the exponent is complex, substitute $y = x + ik/(2b)$, and write

$$\int_{-\infty}^{\infty} e^{-b[x + ik/(2b)]^2}\, dx = \int_{-\infty}^{\infty} e^{-by^2}\, dy = \sqrt{\frac{\pi}{b}}.$$

Thus, we have $\tilde g(k) = \dfrac{a}{\sqrt{2b}}\, e^{-k^2/(4b)}$, which is also a Gaussian.

We note again that the width of $g(x)$, which is proportional to $1/\sqrt{b}$, is in inverse
relation to the width of $\tilde g(k)$, which is proportional to $\sqrt{b}$. We thus have $\Delta x\, \Delta k \sim 1$. $\blacksquare$
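
The Gaussian result can be checked numerically without any complex analysis. The following sketch, with illustrative parameter values and function names that are not from the text, truncates the integral in (8.21) to a finite window (an assumption justified by the rapid decay of $e^{-bx^2}$) and compares it with $(a/\sqrt{2b})\,e^{-k^2/(4b)}$.

```python
import cmath
import math


def gaussian_transform_numeric(k, a, b, steps=4000):
    """Midpoint-rule estimate of (1/sqrt(2 pi)) * integral of a exp(-b x^2) exp(-ikx) dx."""
    half_width = 10.0 / math.sqrt(b)  # exp(-b x^2) is about exp(-100) at the cutoff
    h = 2.0 * half_width / steps
    total = 0j
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += a * math.exp(-b * x * x) * cmath.exp(-1j * k * x)
    return total * h / math.sqrt(2.0 * math.pi)


def gaussian_transform_exact(k, a, b):
    """Closed form of Example 8.2.2: (a / sqrt(2b)) * exp(-k^2 / (4b))."""
    return a / math.sqrt(2.0 * b) * math.exp(-k * k / (4.0 * b))


if __name__ == "__main__":
    a, b = 1.3, 0.8  # hypothetical amplitude and decay constant
    for k in (0.0, 0.5, 2.0):
        print(k, gaussian_transform_numeric(k, a, b), gaussian_transform_exact(k, a, b))
```

If the widths are measured where each Gaussian falls to $1/e$ of its peak, $g(x)$ has half-width $1/\sqrt{b}$ and $\tilde g(k)$ has half-width $2\sqrt{b}$, so their product is independent of $b$, in line with the inverse relation noted in the example.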

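Equations (8.20) and (8.21) are reciprocals of one another. However, it is not
obvious that they are consistent. In other words, if we substitute (8.20) in the RHS
of (8.21), do we get an identity? Let’s try this:

$$
\tilde f(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, e^{-ikx} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde f(k')\, e^{ik'x}\, dk' \right]
= \frac{1}{2\pi} \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} \tilde f(k')\, e^{i(k'-k)x}\, dk'.
$$

We now change the order of the two integrations:

$$\tilde f(k) = \int_{-\infty}^{\infty} dk'\, \tilde f(k') \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dx\, e^{i(k'-k)x} \right].$$

But the expression in the square brackets is the delta function (see Example 6.1.2).
Thus, we have $\tilde f(k) = \int_{-\infty}^{\infty} dk'\, \tilde f(k')\, \delta(k' - k)$, which is an identity.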


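As in the case of Fourier series, Equations (8.20) and (8.21) are valid even if
$f$ and $\tilde f$ are piecewise continuous. In that case the Fourier transforms are written
as

$$
\begin{aligned}
\tfrac12 [f(x + 0) + f(x - 0)] &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde f(k)\, e^{ikx}\, dk, \\
\tfrac12 [\tilde f(k + 0) + \tilde f(k - 0)] &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx,
\end{aligned} \tag{8.22}
$$

where each zero on the LHS is an $\epsilon$ that has gone to its limit.

It is useful to generalize Fourier transform equations to more than one dimension.
The generalization is straightforward:

$$
\begin{aligned}
f(\mathbf{r}) &= \frac{1}{(2\pi)^{n/2}} \int d^n k\, \tilde f(\mathbf{k})\, e^{i \mathbf{k} \cdot \mathbf{r}}, \\
\tilde f(\mathbf{k}) &= \frac{1}{(2\pi)^{n/2}} \int d^n x\, f(\mathbf{r})\, e^{-i \mathbf{k} \cdot \mathbf{r}}.
\end{aligned} \tag{8.23}
$$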


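Let us now use the abstract notation of Chapter 6 to get more insight into the
preceding results. In the language of Chapter 6, Equation (8.20) can be written as

$$
\langle x | f \rangle = \int_{-\infty}^{\infty} \langle k | \tilde f \rangle \langle x | k \rangle\, dk
= \langle x | \left( \int_{-\infty}^{\infty} | k \rangle \langle k |\, dk \right) | \tilde f \rangle, \tag{8.24}
$$

where we have defined

$$\langle x | k \rangle = \frac{1}{\sqrt{2\pi}}\, e^{ikx}. \tag{8.25}$$

Equation (8.24) suggests the identification $|\tilde f\rangle \equiv |f\rangle$ as well as the identity

$$\mathbf{1} = \int_{-\infty}^{\infty} | k \rangle \langle k |\, dk, \tag{8.26}$$

which is the same as (6.1). Equation (6.3) yields

$$\langle k | k' \rangle = \delta(k - k'), \tag{8.27}$$

which upon the insertion of a unit operator gives an integral representation of the
delta function:

$$
\delta(k - k') = \langle k | \mathbf{1} | k' \rangle
= \langle k | \left( \int_{-\infty}^{\infty} | x \rangle \langle x |\, dx \right) | k' \rangle
= \int_{-\infty}^{\infty} \langle k | x \rangle \langle x | k' \rangle\, dx
= \frac{1}{2\pi} \int_{-\infty}^{\infty} dx\, e^{i(k'-k)x}.
$$

Obviously, we can also write $\delta(x - x') = [1/(2\pi)] \int_{-\infty}^{\infty} dk\, e^{i(x-x')k}$.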


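If more than one dimension is involved, we use

$$
\begin{aligned}
\delta(\mathbf{k} - \mathbf{k}') &= \frac{1}{(2\pi)^n} \int d^n x\, e^{i(\mathbf{k} - \mathbf{k}') \cdot \mathbf{r}}, \\
\delta(\mathbf{r} - \mathbf{r}') &= \frac{1}{(2\pi)^n} \int d^n k\, e^{i(\mathbf{r} - \mathbf{r}') \cdot \mathbf{k}},
\end{aligned} \tag{8.28}
$$

with the inner product relations

$$
\langle \mathbf{r} | \mathbf{k} \rangle = \frac{1}{(2\pi)^{n/2}}\, e^{i \mathbf{k} \cdot \mathbf{r}}, \qquad
\langle \mathbf{k} | \mathbf{r} \rangle = \frac{1}{(2\pi)^{n/2}}\, e^{-i \mathbf{k} \cdot \mathbf{r}}. \tag{8.29}
$$

Equations (8.28) and (8.29) and the identification $|\tilde f\rangle \equiv |f\rangle$ exhibit a striking
resemblance between $|\mathbf{r}\rangle$ and $|\mathbf{k}\rangle$. In fact, any given abstract vector $|f\rangle$ can be
expressed either in terms of its $\mathbf{r}$ representation, $\langle \mathbf{r} | f \rangle = f(\mathbf{r})$, or in terms of
its $\mathbf{k}$ representation, $\langle \mathbf{k} | f \rangle \equiv \tilde f(\mathbf{k})$. These two representations are completely
equivalent, and there is a one-to-one correspondence between the two, given by
Equation (8.23). The representation that is used in practice is dictated by the
physical application. In quantum mechanics, for instance, most of the time the $\mathbf{r}$
representation, corresponding to the position, is used, because then the operator
equations turn into differential equations that are normally linear and easier to
solve than the corresponding equations in the $\mathbf{k}$ representation, which is related to
the momentum.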

8.2.3. Example. In this example we evaluate the Fourier transform of the Coulomb potential
$V(r)$ of a point charge $q$: $V(r) = q/r$. The Fourier transform is important in scattering
experiments with atoms, molecules, and solids. As we shall see in the following, the Fourier
transform of $V(r)$ is not defined. However, if we work with the Yukawa potential,

$$V_\alpha(r) = \frac{q\, e^{-\alpha r}}{r}, \qquad \alpha > 0,$$

the Fourier transform will be well-defined, and we can take the limit $\alpha \to 0$ to recover the
Coulomb potential. Thus, we seek the Fourier transform of $V_\alpha(r)$.

We are working in three dimensions and therefore may write

$$\tilde V_\alpha(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}} \iiint d^3 x\, e^{-i \mathbf{k} \cdot \mathbf{r}}\, \frac{q\, e^{-\alpha r}}{r}.$$

It is clear from the presence of $r$ that spherical coordinates are appropriate. We are free
to pick any direction as the $z$-axis. A simplifying choice in this case is the direction of $\mathbf{k}$.
So, we let $\mathbf{k} = |\mathbf{k}|\, \hat{\mathbf{e}}_z = k\, \hat{\mathbf{e}}_z$, or $\mathbf{k} \cdot \mathbf{r} = kr \cos\theta$, where $\theta$ is the polar angle in spherical
coordinates. Now we have

$$
\tilde V_\alpha(\mathbf{k}) = \frac{q}{(2\pi)^{3/2}} \int_0^{\infty} r^2\, dr \int_0^{\pi} \sin\theta\, d\theta \int_0^{2\pi} d\varphi\, e^{-ikr\cos\theta}\, \frac{e^{-\alpha r}}{r}.
$$

The $\varphi$ integration is trivial and gives $2\pi$. The $\theta$ integration is done next:

$$
\int_0^{\pi} \sin\theta\, e^{-ikr\cos\theta}\, d\theta = \int_{-1}^{1} e^{-ikru}\, du = \frac{1}{ikr} \left( e^{ikr} - e^{-ikr} \right).
$$
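We thus have

$$
\begin{aligned}
\tilde V_\alpha(\mathbf{k}) &= \frac{q\,(2\pi)}{(2\pi)^{3/2}} \int_0^{\infty} dr\, r^2\, \frac{e^{-\alpha r}}{r}\, \frac{1}{ikr} \left( e^{ikr} - e^{-ikr} \right) \\
&= \frac{q}{(2\pi)^{1/2}}\, \frac{1}{ik} \int_0^{\infty} dr \left[ e^{(-\alpha + ik)r} - e^{-(\alpha + ik)r} \right] \\
&= \frac{q}{(2\pi)^{1/2}}\, \frac{1}{ik} \left( \left. \frac{e^{(-\alpha + ik)r}}{-\alpha + ik} \right|_0^{\infty} + \left. \frac{e^{-(\alpha + ik)r}}{\alpha + ik} \right|_0^{\infty} \right).
\end{aligned}
$$

Note how the factor $e^{-\alpha r}$ has tamed the divergent behavior of the exponential at $r \to \infty$.
This was the reason for introducing it in the first place. Simplifying the last expression
yields $\tilde V_\alpha(\mathbf{k}) = (2q/\sqrt{2\pi})(k^2 + \alpha^2)^{-1}$. The parameter $\alpha$ is a measure of the range of the
potential. It is clear that the larger $\alpha$ is, the smaller the range. In fact, it was in response to
the short range of nuclear forces that Yukawa introduced $\alpha$. For electromagnetism, where
the range is infinite, $\alpha$ becomes zero and $V_\alpha(r)$ reduces to $V(r)$. Thus, the Fourier transform
of the Coulomb potential is

$$\tilde V_{\text{Coul}}(\mathbf{k}) = \frac{2q}{\sqrt{2\pi}}\, \frac{1}{k^2}.$$

If a charge distribution is involved, the Fourier transform will be different. $\blacksquare$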
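
The closed form obtained in this example can be cross-checked numerically. After the angular integrations, the transform reduces to $\tilde V_\alpha(k) = \dfrac{2q}{\sqrt{2\pi}\,k}\int_0^\infty e^{-\alpha r}\sin kr\,dr$, which follows from the second line of the derivation because $e^{ikr} - e^{-ikr} = 2i\sin kr$. The sketch below evaluates that radial integral with Simpson's rule on a truncated range; the cutoff at $40/\alpha$, the step count, and the function names are assumptions made for illustration only.

```python
import math


def radial_integral(k, alpha, steps=40000):
    """Simpson estimate of the integral of exp(-alpha r) sin(k r) for r in [0, 40/alpha]."""
    r_max = 40.0 / alpha  # exp(-alpha r) is about exp(-40) at the cutoff
    h = r_max / steps     # `steps` is assumed to be even
    # The integrand is zero at r = 0 and negligible at r_max, so the two
    # endpoint terms of Simpson's rule are omitted.
    total = 0.0
    for i in range(1, steps):
        r = i * h
        total += (4 if i % 2 else 2) * math.exp(-alpha * r) * math.sin(k * r)
    return total * h / 3.0


def yukawa_transform_numeric(k, alpha, q=1.0):
    """(2q / (sqrt(2 pi) k)) times the radial integral."""
    return 2.0 * q / (math.sqrt(2.0 * math.pi) * k) * radial_integral(k, alpha)


def yukawa_transform_exact(k, alpha, q=1.0):
    """Closed form (2q / sqrt(2 pi)) / (k^2 + alpha^2) from Example 8.2.3."""
    return 2.0 * q / math.sqrt(2.0 * math.pi) / (k * k + alpha * alpha)


if __name__ == "__main__":
    for alpha in (0.5, 0.1):
        print(alpha, yukawa_transform_numeric(2.0, alpha), yukawa_transform_exact(2.0, alpha))
    # As alpha -> 0 the closed form approaches the Coulomb result 2q / (sqrt(2 pi) k^2).
    print(yukawa_transform_exact(2.0, 1e-6), 2.0 / (math.sqrt(2.0 * math.pi) * 4.0))
```

Decreasing `alpha` makes the numerical integral harder (the integrand decays more slowly), which mirrors the statement that the Coulomb transform is only defined as a limit.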

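8.2.4. Example. The example above deals with the electrostatic potential of a point
charge. Let us now consider the case where the charge is distributed over a finite volume.
Then the potential is

$$V(\mathbf{r}) = \int \frac{q \rho(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|}\, d^3 x',$$

where $q\rho(\mathbf{r}')$ is the charge density at $\mathbf{r}'$ and we have used a single integral because $d^3 x'$
already indicates the number of integrations to be performed. Note that we have normalized
$\rho(\mathbf{r}')$ so that its integral over the volume is 1. Figure 8.7 shows the geometry of the situation.

Making a change of variables, $\mathbf{R} \equiv \mathbf{r}' - \mathbf{r}$, or $\mathbf{r}' = \mathbf{R} + \mathbf{r}$, and $d^3 x' = d^3 X$, with
$\mathbf{R} \equiv (X, Y, Z)$, we get

$$\tilde V(\mathbf{k}) = \frac{q}{(2\pi)^{3/2}} \int d^3 x\, e^{-i \mathbf{k} \cdot \mathbf{r}} \int \frac{\rho(\mathbf{R} + \mathbf{r})}{R}\, d^3 X. \tag{8.30}$$

To evaluate Equation (8.30), we substitute for $\rho(\mathbf{R} + \mathbf{r})$ in terms of its Fourier transform,

$$\rho(\mathbf{R} + \mathbf{r}) = \frac{1}{(2\pi)^{3/2}} \int d^3 k'\, \tilde\rho(\mathbf{k}')\, e^{i \mathbf{k}' \cdot (\mathbf{R} + \mathbf{r})}. \tag{8.31}$$

Combining (8.30) and (8.31), we obtain

$$
\begin{aligned}
\tilde V(\mathbf{k}) &= \frac{q}{(2\pi)^3} \int d^3 x\, d^3 X\, d^3 k'\, \frac{e^{i \mathbf{k}' \cdot \mathbf{R}}}{R}\, \tilde\rho(\mathbf{k}')\, e^{i \mathbf{r} \cdot (\mathbf{k}' - \mathbf{k})} \\
&= q \int d^3 X\, d^3 k'\, \frac{e^{i \mathbf{k}' \cdot \mathbf{R}}}{R}\, \tilde\rho(\mathbf{k}') \underbrace{\left( \frac{1}{(2\pi)^3} \int d^3 x\, e^{i \mathbf{r} \cdot (\mathbf{k}' - \mathbf{k})} \right)}_{\delta(\mathbf{k}' - \mathbf{k})} \\
&= q\, \tilde\rho(\mathbf{k}) \int d^3 X\, \frac{e^{i \mathbf{k} \cdot \mathbf{R}}}{R}.
\end{aligned} \tag{8.32}
$$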

Figure 8.7 The Fourier transform of the potential of a continuous charge distribution at
P is calculated using this geometry.


What is nice about this result is that the contribution of the charge distribution, $\tilde\rho(\mathbf{k})$, has
been completely factored out. The integral, aside from a constant and a change in the sign
of $\mathbf{k}$, is simply the Fourier transform of the Coulomb potential of a point charge obtained
in the previous example. We can therefore write Equation (8.32) as

$$\tilde V(\mathbf{k}) = (2\pi)^{3/2}\, \tilde\rho(\mathbf{k})\, \tilde V_{\text{Coul}}(-\mathbf{k}) = \frac{4\pi q\, \tilde\rho(\mathbf{k})}{|\mathbf{k}|^2}.$$

This equation is important in analyzing the structure of atomic particles. The Fourier
transform $\tilde V(\mathbf{k})$ is directly measurable in scattering experiments. In a typical experiment
a (charged) target is probed with a charged point particle (electron). If the analysis of the
scattering data shows a deviation from $1/k^2$ in the behavior of $\tilde V(\mathbf{k})$, then it can be concluded
that the target particle has a charge distribution. More specifically, a plot of $k^2 \tilde V(\mathbf{k})$ versus $k$
gives the variation of $\tilde\rho(\mathbf{k})$, the form factor, with $k$. If the resulting graph is a constant, then
$\tilde\rho(\mathbf{k})$ is a constant, and the target is a point particle [$\tilde\rho(\mathbf{k})$ is a constant for point particles,
where $\rho(\mathbf{r}') \propto \delta(\mathbf{r} - \mathbf{r}')$]. If there is any deviation from a constant function, $\tilde\rho(\mathbf{k})$ must have
a dependence on $k$, and correspondingly, the target particle must have a charge distribution.

The above discussion, when generalized to four-dimensional relativistic space-time,
was the basis for a strong argument in favor of the existence of point-like particles
(quarks) inside a proton in 1968, when the results of the scattering of high-energy electrons
off protons at the Stanford Linear Accelerator Center revealed deviation from a constant
for the proton form factor. $\blacksquare$

8.2.1 Fourier Transforms and Derivatives 

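The Fourier transform is very useful for solving differential equations. This is
because the derivative operator in $\mathbf{r}$ space turns into ordinary multiplication in $\mathbf{k}$
space. For example, if we differentiate $f(\mathbf{r})$ in Equation (8.23) with respect to $x_j$,
we obtain

$$
\begin{aligned}
\frac{\partial}{\partial x_j} f(\mathbf{r}) &= \frac{1}{(2\pi)^{n/2}} \int d^n k\, \frac{\partial}{\partial x_j}\, e^{i(k_1 x_1 + \cdots + k_j x_j + \cdots + k_n x_n)}\, \tilde f(\mathbf{k}) \\
&= \frac{1}{(2\pi)^{n/2}} \int d^n k\, (i k_j)\, e^{i \mathbf{k} \cdot \mathbf{r}}\, \tilde f(\mathbf{k}).
\end{aligned}
$$

That is, every time we differentiate with respect to any component of $\mathbf{r}$, the corresponding
component of $\mathbf{k}$ “comes down.” Thus, the $n$-dimensional gradient
is $\nabla f(\mathbf{r}) = (2\pi)^{-n/2} \int d^n k\, (i\mathbf{k})\, e^{i \mathbf{k} \cdot \mathbf{r}}\, \tilde f(\mathbf{k})$, and the $n$-dimensional Laplacian is
$\nabla^2 f(\mathbf{r}) = (2\pi)^{-n/2} \int d^n k\, (-k^2)\, e^{i \mathbf{k} \cdot \mathbf{r}}\, \tilde f(\mathbf{k})$.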

We shall use Fourier transforms extensively in solving differential equations
later in the book. Here, we can illustrate the above points with a simple example.
Consider the ordinary second-order differential equation

$$C_2 \frac{d^2 y}{dx^2} + C_1 \frac{dy}{dx} + C_0\, y = f(x),$$

where $C_0$, $C_1$, and $C_2$ are constants. We can “solve” this equation by simply
substituting the following in it:

$$
\begin{aligned}
y(x) &= \frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\, e^{ikx}, &
\frac{dy}{dx} &= \frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\, (ik)\, e^{ikx}, \\
\frac{d^2 y}{dx^2} &= -\frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\, k^2\, e^{ikx}, &
f(x) &= \frac{1}{\sqrt{2\pi}} \int dk\, \tilde f(k)\, e^{ikx}.
\end{aligned}
$$

This gives

$$
\frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\, (-C_2 k^2 + i C_1 k + C_0)\, e^{ikx} = \frac{1}{\sqrt{2\pi}} \int dk\, \tilde f(k)\, e^{ikx}.
$$

Equating the coefficients of $e^{ikx}$ on both sides, we obtain

$$\tilde y(k) = \frac{\tilde f(k)}{-C_2 k^2 + i C_1 k + C_0}.$$

If we know $\tilde f(k)$ [which can be obtained from $f(x)$], we can calculate $y(x)$
by Fourier-transforming $\tilde y(k)$. The resulting integrals are not generally easy to
evaluate. In some cases the methods of complex analysis may be helpful; in others
numerical integration may be the last resort. However, the real power of the Fourier
transform lies in the formal analysis of differential equations.

8.2.2 The Discrete Fourier Transform 

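The preceding remarks alluded to the power of the Fourier transform in solving
certain differential equations. If such a solution is combined with numerical techniques,
the integrals must be replaced by sums. This is particularly true if our
function is given by a table rather than a mathematical relation, a common feature
of numerical analysis. So suppose that we are given a set of $N$ measurements
performed in equal time intervals of $\Delta t$. Suppose that the overall period in which
these measurements are done is $T$. We are seeking a Fourier transform of this finite
set of data. First we write

$$
\tilde f(\omega) = \frac{1}{\sqrt{2\pi}} \int_0^{T} f(t)\, e^{-i\omega t}\, dt \approx \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{N-1} f(t_n)\, e^{-i\omega t_n}\, \Delta t,
$$

or, discretizing the frequency as well and writing $\omega_m = m\Delta\omega$, with $\Delta\omega$ to be
determined later, we have

$$
\tilde f(m\Delta\omega) = \frac{1}{\sqrt{2\pi}}\, \frac{T}{N} \sum_{n=0}^{N-1} f(n\Delta t)\, e^{-i(m\Delta\omega) n\Delta t}. \tag{8.33}
$$

Since the Fourier transform is given in terms of a finite sum, let us explore the
idea of writing the inverse transform also as a sum. So, multiply both sides of the
above equation by $[e^{i(m\Delta\omega) k\Delta t}/\sqrt{2\pi}]\, \Delta\omega$ and sum over $m$:

$$
\begin{aligned}
\frac{1}{\sqrt{2\pi}} \sum_{m=0}^{N-1} \tilde f(m\Delta\omega)\, e^{i m\Delta\omega\, k\Delta t}\, \Delta\omega
&= \frac{T\Delta\omega}{2\pi N} \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f(n\Delta t)\, e^{i m\Delta\omega\Delta t (k - n)} \\
&= \frac{T\Delta\omega}{2\pi N} \sum_{n=0}^{N-1} f(n\Delta t) \sum_{m=0}^{N-1} e^{i m\Delta\omega\Delta t (k - n)}.
\end{aligned}
$$

Problem 8.2 shows that

$$
\sum_{m=0}^{N-1} e^{i m\Delta\omega\Delta t (k - n)} = \begin{cases}
N & \text{if } k = n, \\[4pt]
\dfrac{e^{i N\Delta\omega\Delta t (k - n)} - 1}{e^{i \Delta\omega\Delta t (k - n)} - 1} & \text{if } k \neq n.
\end{cases}
$$

We want the sum to vanish when $k \neq n$. This suggests demanding that $N\Delta\omega\Delta t (k - n)$
be an integer multiple of $2\pi$. Since $\Delta\omega$ and $\Delta t$ are to be independent of this
(arbitrary) integer (as well as $k$ and $n$), we must write

$$
N \Delta\omega\, \Delta t\, (k - n) = 2\pi (k - n) \;\Rightarrow\; N \Delta\omega\, \frac{T}{N} = 2\pi \;\Rightarrow\; \Delta\omega = \frac{2\pi}{T}.
$$

With this choice, we have the following discrete Fourier transforms:

$$
\begin{aligned}
\tilde f(\omega_j) &= \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} f(t_n)\, e^{-i\omega_j t_n}, \\
f(t_n) &= \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \tilde f(\omega_j)\, e^{i\omega_j t_n},
\end{aligned}
\qquad \omega_j = \frac{2\pi j}{T}, \quad t_n = n\Delta t, \tag{8.34}
$$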

where we have redefined the new $\tilde f$ to be $\sqrt{2\pi N}/T$ times the old $\tilde f$.

Discrete Fourier transforms are used extensively in numerical calculation of
problems in which ordinary Fourier transforms are used. For instance, if a differential
equation lends itself to a solution via the Fourier transform as discussed
before, then discrete Fourier transforms will give a procedure for finding the solution
numerically. Similarly, the frequency analysis of signals is nicely handled
by discrete Fourier transforms.

It turns out that discrete Fourier analysis is very intensive computationally. Its
present status as a popular tool in computational physics is due primarily to a very
efficient method of calculation known as the fast Fourier transform. In a typical
Fourier transform, one has to perform a sum of $N$ terms for every point. Since there
are $N$ points to transform, the total computational time will be of order $N^2$. In the
fast Fourier transform, one takes $N$ to be even and divides the sum into two other
sums, one over the even terms and one over the odd terms. Then the computation
time will be of order $2 \times (N/2)^2$, or half the original calculation. Similarly, if
$N/2$ is even, one can further divide the odd and even sums by two and obtain a
computation time of $4 \times (N/4)^2$, or a quarter of the original calculation. In general,
if $N = 2^k$, then by dividing the sums consecutively, we end up with $N$ transforms
to be performed after $k$ steps. So, the computation time will be $kN = N \log_2 N$.
For $N = 128$, the computation time will be $128 \log_2 128 = 896$ as opposed to
$128^2 \approx 16{,}400$, a reduction by a factor of about 18. The fast Fourier transform is
indeed fast!
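
The even/odd splitting described above can be sketched as a short recursive routine. This is one common way to organize the idea (a recursive radix-2 scheme), offered as an illustration rather than as the specific algorithm the text has in mind; it omits the $1/\sqrt{N}$ normalization for simplicity, and the function names are hypothetical.

```python
import cmath
import math


def naive_dft(x):
    """Direct evaluation of sum_m x[m] exp(-2 pi i j m / N): about N^2 operations."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * math.pi * j * m / n) for m in range(n))
        for j in range(n)
    ]


def fft(x):
    """Recursive even/odd splitting; the length must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # transform of the even-indexed terms
    odd = fft(x[1::2])   # transform of the odd-indexed terms
    out = [0j] * n
    for j in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * j / n) * odd[j]
        out[j] = even[j] + twiddle
        out[j + n // 2] = even[j] - twiddle
    return out


if __name__ == "__main__":
    n = 128
    data = [math.sin(0.3 * m) + 0.5 * math.cos(1.1 * m) for m in range(n)]
    slow, fast = naive_dft(data), fft(data)
    print(max(abs(a - b) for a, b in zip(slow, fast)))
    print(n * int(math.log2(n)), "versus", n * n)  # operation-count estimates
```

Both routines return the same numbers up to rounding; only the amount of work differs, roughly $N \log_2 N$ for the recursive version against $N^2$ for the direct sum.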


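8.2.3 The Fourier Transform of a Distribution


Although one can define the Fourier transform of a distribution in exact analogy
to an ordinary function, sometimes it is convenient to define the Fourier transform
of the distribution as a linear functional.

Let us ignore the distinction between the two variables $x$ and $k$, and simply
define the Fourier transform of a function $f : \mathbb{R} \to \mathbb{R}$ as

$$\tilde f(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-iut}\, dt.$$

Now we consider two functions, $f$ and $g$, and note that

$$
\begin{aligned}
\langle f, \tilde g \rangle &\equiv \int_{-\infty}^{\infty} f(u)\, \tilde g(u)\, du
= \int_{-\infty}^{\infty} f(u) \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{-iut}\, dt \right] du \\
&= \int_{-\infty}^{\infty} g(t) \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(u)\, e^{-iut}\, du \right] dt
= \int_{-\infty}^{\infty} g(t)\, \tilde f(t)\, dt = \langle \tilde f, g \rangle.
\end{aligned}
$$

The following definition is motivated by the last equation.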




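8.2.5. Definition. Let $\varphi$ be a distribution and let $f$ be a $C^\infty_F$ function whose Fourier
transform $\tilde f$ exists and is also a $C^\infty_F$ function. Then we define the Fourier transform
$\tilde\varphi$ of $\varphi$ to be the distribution given by

$$\langle \tilde\varphi, f \rangle = \langle \varphi, \tilde f \rangle.$$


8.2.6. Example. The Fourier transform of $\delta(x)$ is given by

$$
\langle \tilde\delta, f \rangle = \langle \delta, \tilde f \rangle = \tilde f(0)
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, dt
= \int_{-\infty}^{\infty} \left( \frac{1}{\sqrt{2\pi}} \right) f(t)\, dt
= \left\langle \frac{1}{\sqrt{2\pi}}, f \right\rangle.
$$

Thus, $\tilde\delta = 1/\sqrt{2\pi}$, as expected.

The Fourier transform of $\delta(x - x') \equiv \delta_{x'}(x)$ is given by

$$
\langle \tilde\delta_{x'}, f \rangle = \langle \delta_{x'}, \tilde f \rangle = \tilde f(x')
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-ix't}\, dt
= \int_{-\infty}^{\infty} \left( \frac{1}{\sqrt{2\pi}}\, e^{-ix't} \right) f(t)\, dt.
$$

Thus, if $\varphi(x) = \delta(x - x')$, then $\tilde\varphi(t) = (1/\sqrt{2\pi})\, e^{-ix't}$. $\blacksquare$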



8.3 Problems 


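8.1. Consider the function $f(\theta) = \sum_{m=-\infty}^{\infty} \delta(\theta - 2m\pi)$.

(a) Show that $f$ is periodic of period $2\pi$.

(b) What is the Fourier series expansion for $f(\theta)$?

8.2. Break the sum $\sum_{n=-N}^{N} e^{in(\theta - \theta')}$ into $\sum_{n=-N}^{-1} + 1 + \sum_{n=1}^{N}$. Use the geometric
sum formula

$$\sum_{n=0}^{N} a r^n = a\, \frac{r^{N+1} - 1}{r - 1}$$

to obtain

$$
\sum_{n=1}^{N} e^{in(\theta - \theta')} = e^{i(\theta - \theta')}\, \frac{e^{iN(\theta - \theta')} - 1}{e^{i(\theta - \theta')} - 1}
= e^{i\frac12 (N+1)(\theta - \theta')}\, \frac{\sin[\tfrac12 N(\theta - \theta')]}{\sin[\tfrac12 (\theta - \theta')]}.
$$

By changing $n$ to $-n$ or equivalently, $(\theta - \theta')$ to $-(\theta - \theta')$, find a similar sum from
$-N$ to $-1$. Now put everything together and use the trigonometric identity

$$2 \cos\alpha \sin\beta = \sin(\alpha + \beta) - \sin(\alpha - \beta)$$

to show that

$$\sum_{n=-N}^{N} e^{in(\theta - \theta')} = \frac{\sin[(N + \tfrac12)(\theta - \theta')]}{\sin[\tfrac12 (\theta - \theta')]}.$$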

8.3. Find the Fourier series expansion of the periodic function defined on its fundamental
cell as

$$
f(\theta) = \begin{cases}
-\tfrac12 (\pi + \theta) & \text{if } -\pi \leq \theta < 0, \\
\tfrac12 (\pi - \theta) & \text{if } 0 < \theta \leq \pi.
\end{cases}
$$

8.4. Show that $A_n$ and $B_n$ in Equation (8.2) are real when $f(\theta)$ is real.

8.5. Find the Fourier series expansion of the periodic function $f(\theta)$ defined on its
fundamental cell, $(-\pi, \pi)$, as $f(\theta) = \cos\alpha\theta$,

(a) when $\alpha$ is an integer. (b) when $\alpha$ is not an integer.

8.6. Find the Fourier series expansion of the periodic function defined on its fundamental
cell, $(-\pi, \pi)$, as $f(\theta) = \theta$.

8.7. Consider the periodic function that is defined on its fundamental cell, $(-a, a)$,
as $f(x) = |x|$.

(a) Find its Fourier series expansion.

(b) Evaluate both sides of the expansion at $x = 0$, and show that

$$\pi^2 = 8 \sum_{k=0}^{\infty} \frac{1}{(2k + 1)^2}.$$

(c) Show that the infinite series gives the same result as the function when both
are evaluated at $x = a$.

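8.8. Let $f(x) = x$ be a periodic function defined over the interval $(0, 2a)$. Find
the Fourier series expansion of $f$.

8.9. Show that the piecewise parabolic “approximation” to $a^2 \sin(\pi x/a)$ in the
interval $(-a, a)$ given by the function

$$
f(x) = \begin{cases}
4x(a + x) & \text{if } -a \leq x \leq 0, \\
4x(a - x) & \text{if } 0 \leq x \leq a
\end{cases}
$$

has the Fourier series expansion

$$f(x) = \frac{32 a^2}{\pi^3} \sum_{n=0}^{\infty} \frac{1}{(2n + 1)^3} \sin \frac{(2n + 1)\pi x}{a}.$$

Plot $f(x)$, $a^2 \sin(\pi x/a)$, and the series expansion (up to 20 terms) for $a = 1$
between $-1$ and $+1$ on the same graph.

8.10. Find the Fourier series expansion of $f(\theta) = \theta^2$ for $|\theta| < \pi$. Then show that

$$\frac{\pi^2}{6} = \sum_{n=1}^{\infty} \frac{1}{n^2} \qquad \text{and} \qquad \frac{\pi^2}{12} = -\sum_{n=1}^{\infty} \frac{(-1)^n}{n^2}.$$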



8.11. Find the Fourier series expansion of

$$
f(t) = \begin{cases}
\sin\omega t & \text{if } 0 \leq t \leq \pi/\omega, \\
0 & \text{if } -\pi/\omega \leq t \leq 0.
\end{cases}
$$

8.12. What is the Fourier transform of

(a) the constant function $f(x) = C$, and

(b) the Dirac delta function $\delta(x)$?

8.13. Show that

(a) if $g(x)$ is real, then $\tilde g^*(k) = \tilde g(-k)$, and

(b) if $g(x)$ is even (odd), then $\tilde g(k)$ is also even (odd).

8.14. Let $g_c(x)$ stand for the single function that is nonzero only on a subinterval
of the fundamental cell $(a, a + L)$. Define the function $g(x)$ as

$$g(x) = \sum_{j=-\infty}^{\infty} g_c(x - jL).$$

(a) Show that $g(x)$ is periodic with period $L$.

(b) Find its Fourier transform $\tilde g(k)$, and verify that

$$\tilde g(k) = 2\pi\, \tilde g_c(k) \sum_{m=-\infty}^{\infty} \delta(kL - 2m\pi).$$

(c) Find the (inverse) transform of $\tilde g(k)$, and show that it is the Fourier series of
$g_c(x)$.

8.15. Evaluate the Fourier transform of

$$
g(x) = \begin{cases}
b - b|x|/a & \text{if } |x| < a, \\
0 & \text{if } |x| > a.
\end{cases}
$$

8.16. Let $f(\theta)$ be a periodic function given by $f(\theta) = \sum_{n=-\infty}^{\infty} a_n e^{in\theta}$. Find its
Fourier transform $\tilde f(t)$.


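8.17. Let

$$
f(t) = \begin{cases}
\sin\omega_0 t & \text{if } |t| < T, \\
0 & \text{if } |t| > T.
\end{cases}
$$

Show that

$$
\tilde f(\omega) = \frac{1}{i\sqrt{2\pi}} \left\{ \frac{\sin[(\omega - \omega_0)T]}{\omega - \omega_0} - \frac{\sin[(\omega + \omega_0)T]}{\omega + \omega_0} \right\}.
$$

Verify the uncertainty relation $\Delta\omega\, \Delta t \approx 4\pi$.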


8.18. If $f(x) = g(x + a)$, show that $\tilde f(k) = e^{iak}\, \tilde g(k)$.

8.19. For $a > 0$ find the Fourier transform of $f(x) = e^{-a|x|}$. Is $\tilde f(k)$ symmetric?
Is it real? Verify the uncertainty relations.

8.20. The displacement of a damped harmonic oscillator is given by

$$
f(t) = \begin{cases}
A e^{-\alpha t} e^{i\omega_0 t} & \text{if } t > 0, \\
0 & \text{if } t < 0.
\end{cases}
$$

Find $\tilde f(\omega)$ and show that the frequency distribution $|\tilde f(\omega)|^2$ is given by

$$|\tilde f(\omega)|^2 = \frac{A^2}{2\pi}\, \frac{1}{(\omega - \omega_0)^2 + \alpha^2}.$$

8.21. Prove the convolution theorem:

$$\int_{-\infty}^{\infty} f(x)\, g(y - x)\, dx = \int_{-\infty}^{\infty} \tilde f(k)\, \tilde g(k)\, e^{iky}\, dk.$$

What will this give when $y = 0$?

8.22. Prove Parseval’s relation for Fourier transforms:

$$\int_{-\infty}^{\infty} f(x)\, g^*(x)\, dx = \int_{-\infty}^{\infty} \tilde f(k)\, \tilde g^*(k)\, dk.$$

In particular, the norm of a function, with weight function equal to 1, is invariant
under Fourier transform.


8.23. Use the completeness relation $\mathbf{1} = \sum_n |n\rangle \langle n|$ and sandwich it between $|x\rangle$
and $\langle x'|$ to find an expression for the Dirac delta function in terms of an infinite
series of orthonormal functions.


8.24. Use a Fourier transform in three dimensions to find a solution of the Poisson
equation: $\nabla^2 \Phi(\mathbf{r}) = -4\pi \rho(\mathbf{r})$.

8.25. For $\varphi(x) = \delta(x - x')$, find $\tilde\varphi(y)$.

8.26. Show that f(t) = 

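8.27. The Fourier transform of a distribution $\varphi$ is given by

$$\tilde\varphi(t) = \sum_{n=0}^{\infty} \delta(t - n).$$

What is $\varphi(x)$? Hint: Use $\tilde{\tilde\varphi}(x) = \varphi(-x)$.

8.28. For $f(x) = \sum_{k=0}^{n} a_k x^k$, show that

$$\tilde f(u) = \sqrt{2\pi} \sum_{k=0}^{n} i^k a_k\, \delta^{(k)}(u), \qquad \text{where} \qquad \delta^{(k)}(u) \equiv \frac{d^k}{du^k}\, \delta(u).$$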






Additional Reading 

1. Courant, R. and Hilbert, D. Methods of Mathematical Physics, vol. 1, Interscience,
1962. The classic book by two masters. This is a very readable
book written specifically for physicists. Its treatment of Fourier series and
transforms is very clear.

2. DeVries, P. A First Course in Computational Physics, Wiley, 1994. A good
discussion of the fast Fourier transform including some illustrative computer
programs.

3. Reed, M., and Simon, B. Fourier Analysis, Self-Adjointness, Academic
Press, 1980. Second volume of a four-volume series. A comprehensive exposition
of Fourier analysis with emphasis on operator theory.

4. Richtmyer, R. Principles of Advanced Mathematical Physics, Springer-Verlag,
1978. A two-volume book on mathematical physics written in a
formal style, but very useful due to its comprehensiveness and the large
number of examples drawn from physics. Chapter 4 discusses Fourier analysis
and distributions.


Part III 


Complex Analysis 




9

Complex Calculus 


Complex analysis, just like real analysis, deals with questions of continuity, convergence
of series, differentiation, integration, and so forth. The reader is assumed
to have been exposed to the algebra of complex numbers.


9.1 Complex Functions

A complex function is a map $f : \mathbb{C} \to \mathbb{C}$, and we write $f(z) = w$, where both
$z$ and $w$ are complex numbers.$^1$ The map $f$ can be geometrically thought of as a
correspondence between two complex planes, the $z$-plane and the $w$-plane. The $w$-plane
has a real axis and an imaginary axis, which we can call $u$ and $v$, respectively.
Both $u$ and $v$ are real functions of the coordinates of $z$, i.e., $x$ and $y$. Therefore, we
may write

$$f(z) = u(x, y) + i v(x, y). \tag{9.1}$$

This equation gives a unique point $(u, v)$ in the $w$-plane for each point $(x, y)$
in the $z$-plane (see Figure 9.1). Under $f$, regions of the $z$-plane are mapped onto
regions of the $w$-plane. For instance, a curve in the $z$-plane may be mapped into a
curve in the $w$-plane. The following example illustrates this point.


9.1.1. Example. Let us investigate the behavior of a couple of elementary complex functions. In particular, we shall look at the way a line $y = mx$ in the $z$-plane is mapped into curves in the $w$-plane.


$^1$Strictly speaking, we should write $f : S \to \mathbb{C}$ where $S$ is a subset of the complex plane. The reason is that most functions are not defined for the entire set of complex numbers, so that the domain of such functions is not necessarily $\mathbb{C}$. We shall specify the domain only when it is absolutely necessary. Otherwise, we use the generic notation $f : \mathbb{C} \to \mathbb{C}$, even though $f$ is defined only on a subset of $\mathbb{C}$.





Figure 9.1 A map from the z-plane to the w-plane. 

(a) For $w = f(z) = z^2$, we have

$$w = (x + iy)^2 = x^2 - y^2 + 2ixy,$$

with $u(x, y) = x^2 - y^2$ and $v(x, y) = 2xy$. For $y = mx$, i.e., for a line in the $z$-plane with slope $m$, these equations yield $u = (1 - m^2)x^2$ and $v = 2mx^2$. Eliminating $x$ in these equations, we find $v = [2m/(1 - m^2)]u$. This is a line passing through the origin of the $w$-plane [see Figure 9.2(a)]. Note that the angle the image line makes with the real axis of the $w$-plane is twice the angle the original line makes with the $x$-axis. (Show this!)

(b) The function $w = f(z) = e^z = e^{x+iy}$ gives $u(x, y) = e^x \cos y$ and $v(x, y) = e^x \sin y$. Substituting $y = mx$, we obtain $u = e^x \cos mx$ and $v = e^x \sin mx$. Unlike part (a), we cannot eliminate $x$ to find $v$ as an explicit function of $u$. Nevertheless, the last pair of equations are parametric equations of a curve, which we can plot in a $uv$-plane as shown in Figure 9.2(b). ■
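The two maps of Example 9.1.1 can be explored numerically. The following is only a sketch: the slope value and the sample points are arbitrary choices made for illustration, and the helper name `image_angle` is hypothetical.

```python
import cmath
import math

def image_angle(f, m, x):
    """Angle (radians) that w = f(x + i m x) makes with the real axis of the w-plane."""
    return cmath.phase(f(complex(x, m * x)))

m = 0.5                  # slope of the line y = m x (illustrative value)
alpha = math.atan(m)     # angle of that line with the x-axis

# (a) w = z^2: points of the line with x > 0 all land on a ray at angle 2*alpha.
angles = [image_angle(lambda z: z * z, m, x) for x in (0.1, 1.0, 7.0)]

# (b) w = e^z: the image is the parametric curve u = e^x cos(mx), v = e^x sin(mx).
spiral = [cmath.exp(complex(x, m * x)) for x in (0.0, 1.0, 2.0)]

print(angles, 2 * alpha)
print([(w.real, w.imag) for w in spiral])
```

The modulus of each point in `spiral` is $e^x$ and its phase is $mx$, so the curve winds around the origin while moving outward, as stated in the caption of Figure 9.2(b).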

Limits of complex functions are defined in terms of absolute values. Thus, $\lim_{z \to a} f(z) = w_0$ means that given any real number $\epsilon > 0$, we can find a corresponding real number $\delta > 0$ such that $|f(z) - w_0| < \epsilon$ whenever $|z - a| < \delta$. Similarly, we say that a function $f$ is continuous at $z = a$ if $\lim_{z \to a} f(z) = f(a)$.

9.2 Analytic Functions 

The derivative of a complex function is defined as usual: 

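9.2.1. Definition. Let $f : \mathbb{C} \to \mathbb{C}$ be a complex function. The derivative of $f$ at $z_0$ is

$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\Delta z \to 0} \frac{f(z_0 + \Delta z) - f(z_0)}{\Delta z},$$

provided that the limit exists and is independent of $\Delta z$.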








Figure 9.2 (a) The map $z^2$ takes a line with slope angle $\alpha$ and maps it to a line with twice the angle in the $w$-plane. (b) The map $e^z$ takes the same line and maps it to a spiral in the $w$-plane.


Example illustrating 
path dependence of 
derivative 


In this definition "independent of $\Delta z$" means independent of $\Delta x$ and $\Delta y$ (the components of $\Delta z$) and, therefore, independent of the direction of approach to $z_0$. The restrictions of this definition apply to the real case as well. For instance, the derivative of $f(x) = |x|$ at $x = 0$ does not exist$^2$ because it approaches $+1$ from the right and $-1$ from the left.

It can easily be shown that all the formal rules of differentiation that apply to the real case also apply to the complex case. For example, if $f$ and $g$ are differentiable, then $f \pm g$, $fg$, and (as long as $g$ is not zero) $f/g$ are also differentiable, and their derivatives are given by the usual rules of differentiation.

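9.2.2. Example. Let us examine the derivative of $f(z) = x^2 + 2iy^2$ at $z = 1 + i$:

$$\left.\frac{df}{dz}\right|_{z=1+i} = \lim_{\Delta z \to 0} \frac{f(1 + i + \Delta z) - f(1 + i)}{\Delta z}
= \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{(1 + \Delta x)^2 + 2i(1 + \Delta y)^2 - 1 - 2i}{\Delta x + i\Delta y}
= \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{2\Delta x + 4i\Delta y + (\Delta x)^2 + 2i(\Delta y)^2}{\Delta x + i\Delta y}.$$

Let us approach $z = 1 + i$ along the line $y - 1 = m(x - 1)$. Then $\Delta y = m\Delta x$, and the limit yields

$$\left.\frac{df}{dz}\right|_{z=1+i} = \lim_{\Delta x \to 0} \frac{2\Delta x + 4im\Delta x + (\Delta x)^2 + 2im^2(\Delta x)^2}{\Delta x + im\Delta x} = \frac{2 + 4im}{1 + im}.$$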


$^2$One can rephrase this and say that the derivative exists, but not in terms of ordinary functions, rather, in terms of generalized functions (in this case $\theta(x)$) discussed in Chapter 6.





Cauchy-Riemann 

conditions 


It follows that we get infinitely many values for the derivative depending on the value we assign to $m$, i.e., depending on the direction along which we approach $1 + i$. Thus, the derivative does not exist at $z = 1 + i$. ■
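The direction dependence found in Example 9.2.2 is easy to see numerically. The sketch below evaluates the difference quotient for a small step along lines of different slope $m$ and compares it with $(2 + 4im)/(1 + im)$; the step size and the function names are illustrative assumptions, not part of the text.

```python
def f(z):
    """f(z) = x^2 + 2 i y^2, the function of Example 9.2.2."""
    return complex(z.real ** 2, 2 * z.imag ** 2)

def quotient_along(m, h=1e-6, z0=complex(1, 1)):
    """Difference quotient at z0 for a small step along a line of slope m."""
    dz = complex(h, m * h)
    return (f(z0 + dz) - f(z0)) / dz

def predicted(m):
    """Limit obtained in the example: (2 + 4 i m) / (1 + i m)."""
    return (2 + 4j * m) / (1 + 1j * m)

for m in (0.0, 1.0, -2.0):
    print(m, quotient_along(m), predicted(m))
```

Each slope gives a different value, which is exactly the statement that the limit in Definition 9.2.1 does not exist at $z = 1 + i$.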

It is clear from the definition that differentiability puts a severe restriction on $f(z)$ because it requires the limit to be the same for all paths going through $z_0$. Furthermore, differentiability is a local property: To test whether or not a function $f(z)$ is differentiable at $z_0$, we move away from $z_0$ by a small amount $\Delta z$ and check the existence of the limit in Definition 9.2.1.

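What are the conditions under which a complex function is differentiable? For $f(z) = u(x, y) + iv(x, y)$, Definition 9.2.1 yields

$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \left\{ \frac{u(x_0 + \Delta x, y_0 + \Delta y) - u(x_0, y_0)}{\Delta x + i\Delta y} + i\,\frac{v(x_0 + \Delta x, y_0 + \Delta y) - v(x_0, y_0)}{\Delta x + i\Delta y} \right\}.$$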

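If this limit is to exist for all paths, it must exist for the two particular paths on which $\Delta y = 0$ (parallel to the $x$-axis) and $\Delta x = 0$ (parallel to the $y$-axis). For the first path we get

$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\Delta x \to 0} \frac{u(x_0 + \Delta x, y_0) - u(x_0, y_0)}{\Delta x} + i \lim_{\Delta x \to 0} \frac{v(x_0 + \Delta x, y_0) - v(x_0, y_0)}{\Delta x} = \left.\frac{\partial u}{\partial x}\right|_{(x_0, y_0)} + i \left.\frac{\partial v}{\partial x}\right|_{(x_0, y_0)}.$$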

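For the second path ($\Delta x = 0$), we obtain

$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\Delta y \to 0} \frac{u(x_0, y_0 + \Delta y) - u(x_0, y_0)}{i\Delta y} + i \lim_{\Delta y \to 0} \frac{v(x_0, y_0 + \Delta y) - v(x_0, y_0)}{i\Delta y} = -i \left.\frac{\partial u}{\partial y}\right|_{(x_0, y_0)} + \left.\frac{\partial v}{\partial y}\right|_{(x_0, y_0)}.$$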


If $f$ is to be differentiable at $z_0$, the derivatives along the two paths must be equal. Equating the real and imaginary parts of both sides of this equation and ignoring the subscript $z_0$ ($x_0$, $y_0$, or $z_0$ is arbitrary), we obtain

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad\text{and}\quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \tag{9.2}$$

These two conditions, which are necessary for the differentiability of $f$, are called the Cauchy-Riemann conditions.

An alternative way of writing the Cauchy-Riemann (C-R) conditions is obtained by making the substitution$^3$ $x = \frac{1}{2}(z + z^*)$ and $y = \frac{1}{2i}(z - z^*)$ in $u(x, y)$ and $v(x, y)$, using the chain rule to write Equation (9.2) in terms of $z$ and $z^*$, substituting the results in $\dfrac{\partial f}{\partial z^*} = \dfrac{\partial u}{\partial z^*} + i\dfrac{\partial v}{\partial z^*}$, and showing that Equation (9.2) is equivalent to the single equation $\partial f/\partial z^* = 0$. This equation says that

$^3$We use $z^*$ to indicate the complex conjugate of $z$. Occasionally we may use $\bar{z}$.


9.2.3. Box. If $f$ is to be differentiable, it must be independent of $z^*$.


Expression for the derivative of a differentiable complex function

If the derivative of $f$ exists, the arguments leading to Equation (9.2) imply that the derivative can be expressed as

$$\frac{df}{dz} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}. \tag{9.3}$$


The C-R conditions assure us that these two equations are equivalent. 

The following example illustrates the differentiability of complex functions. 

9.2.4. Example. Let us determine whether or not the following functions are differentiable:

(a) We have already established that $f(z) = x^2 + 2iy^2$ is not differentiable at $z = 1 + i$. We can now show that it has no derivative at any point of the complex plane off the line $x = 2y$ (which passes through the origin). This is easily seen by noting that $u = x^2$ and $v = 2y^2$, and that $\partial u/\partial x = 2x \neq \partial v/\partial y = 4y$ unless $x = 2y$, so the first Cauchy-Riemann condition is not satisfied elsewhere. The second C-R condition is satisfied, but that is not enough.

We can also write $f(z)$ in terms of $z$ and $z^*$:

$$f(z) = \left[\tfrac{1}{2}(z + z^*)\right]^2 + 2i\left[\tfrac{1}{2i}(z - z^*)\right]^2 = \tfrac{1}{4}(1 - 2i)(z^2 + z^{*2}) + \tfrac{1}{2}(1 + 2i)zz^*.$$

$f(z)$ has an explicit dependence on $z^*$. Therefore, it is not differentiable.

(b) Now consider $f(z) = x^2 - y^2 + 2ixy$, for which $u = x^2 - y^2$ and $v = 2xy$. The C-R conditions become $\partial u/\partial x = 2x = \partial v/\partial y$ and $\partial u/\partial y = -2y = -\partial v/\partial x$. Thus, $f(z)$ may be differentiable. Recall that the C-R conditions are only necessary conditions; we have not shown (but we will, shortly) that they are also sufficient.

To check the dependence of $f$ on $z^*$, substitute $x = (z + z^*)/2$ and $y = (z - z^*)/(2i)$ in $u$ and $v$ to show that $f(z) = z^2$, and thus there is no $z^*$ dependence.

(c) Let $u(x, y) = e^x \cos y$ and $v(x, y) = e^x \sin y$. Then $\partial u/\partial x = e^x \cos y = \partial v/\partial y$ and $\partial u/\partial y = -e^x \sin y = -\partial v/\partial x$, and the C-R conditions are satisfied. Also,

$$f(z) = e^x \cos y + i e^x \sin y = e^x(\cos y + i \sin y) = e^x e^{iy} = e^{x+iy} = e^z,$$

and there is no $z^*$ dependence. ■
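The checks of Example 9.2.4 can be repeated numerically by estimating the partial derivatives with central differences and forming the two combinations $u_x - v_y$ and $u_y + v_x$, both of which vanish when the C-R conditions hold. This is a minimal sketch; the helper `cr_residuals`, the step size, and the sample point are assumptions made for illustration.

```python
import cmath

def cr_residuals(f, x, y, h=1e-5):
    """Return (u_x - v_y, u_y + v_x) at (x, y), estimated by central differences."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)   # u_x + i v_x
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)   # u_y + i v_y
    return fx.real - fy.imag, fy.real + fx.imag

examples = {
    "x^2 + 2i y^2": lambda z: complex(z.real ** 2, 2 * z.imag ** 2),
    "z^2": lambda z: z * z,
    "e^z": cmath.exp,
}

for name, f in examples.items():
    print(name, cr_residuals(f, 1.0, 1.0))
```

At the point $(1, 1)$ the first function gives a first residual close to $2x - 4y = -2$, while both residuals are close to zero for $z^2$ and $e^z$.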

The requirement of differentiability is very restrictive: The derivative must exist along infinitely many paths. On the other hand, the C-R conditions seem deceptively mild: They are derived for only two paths. Nevertheless, the two paths are, in fact, true representatives of all paths; that is, the C-R conditions are not only necessary, but also sufficient:





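9.2.5. Theorem. The function $f(z) = u(x, y) + iv(x, y)$ is differentiable in a region of the complex plane if and only if the Cauchy-Riemann conditions,

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad\text{and}\quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$$

(or, equivalently, $\partial f/\partial z^* = 0$), are satisfied and all first partial derivatives of $u$ and $v$ are continuous in that region. In that case

$$\frac{df}{dz} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}.$$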


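Proof. We have already shown the "only if" part. To show the "if" part, note that if the derivative exists at all, it must equal (9.3). Thus, we have to show that

$$\lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x},$$

or, equivalently, that

$$\left| \frac{f(z + \Delta z) - f(z)}{\Delta z} - \left( \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} \right) \right| < \epsilon \quad\text{whenever } |\Delta z| < \delta.$$

By definition,

$$f(z + \Delta z) - f(z) = u(x + \Delta x, y + \Delta y) + iv(x + \Delta x, y + \Delta y) - u(x, y) - iv(x, y).$$

Since $u$ and $v$ have continuous first partial derivatives, we can write

$$u(x + \Delta x, y + \Delta y) = u(x, y) + \frac{\partial u}{\partial x}\Delta x + \frac{\partial u}{\partial y}\Delta y + \epsilon_1 \Delta x + \delta_1 \Delta y,$$

$$v(x + \Delta x, y + \Delta y) = v(x, y) + \frac{\partial v}{\partial x}\Delta x + \frac{\partial v}{\partial y}\Delta y + \epsilon_2 \Delta x + \delta_2 \Delta y,$$

where $\epsilon_1$, $\epsilon_2$, $\delta_1$, and $\delta_2$ are real numbers that approach zero as $\Delta x$ and $\Delta y$ approach zero. Using these expressions, we can write

$$f(z + \Delta z) - f(z) = \left( \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} \right)\Delta x + i\left( -i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y} \right)\Delta y + (\epsilon_1 + i\epsilon_2)\Delta x + (\delta_1 + i\delta_2)\Delta y$$

$$= \left( \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} \right)(\Delta x + i\Delta y) + \epsilon \Delta x + \delta \Delta y,$$

where $\epsilon \equiv \epsilon_1 + i\epsilon_2$, $\delta \equiv \delta_1 + i\delta_2$, and we used the C-R conditions in the last step. Dividing both sides by $\Delta z = \Delta x + i\Delta y$, we get

$$\frac{f(z + \Delta z) - f(z)}{\Delta z} - \left( \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} \right) = \epsilon \frac{\Delta x}{\Delta z} + \delta \frac{\Delta y}{\Delta z}.$$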




analyticity and 
singularity; regular 
and singular points; 
entire functions 


By the triangle inequality, $|\text{RHS}| \le |\epsilon_1 + i\epsilon_2| + |\delta_1 + i\delta_2|$. This follows from the fact that $|\Delta x|/|\Delta z|$ and $|\Delta y|/|\Delta z|$ are both at most 1. The $\epsilon$ and $\delta$ terms can be made as small as desired by making $\Delta z$ small enough. We have thus established that when the C-R conditions hold, the function $f$ is differentiable. □


Augustin-Louis Cauchy (1789-1857) was one of the most influential French mathematicians of the nineteenth century. He began his career as a military engineer, but when his health broke down in 1813 he followed his natural inclination and devoted himself wholly to mathematics.

In mathematical productivity Cauchy was surpassed only by Euler, and his collected works fill 27 fat volumes. He made substantial contributions to number theory and determinants; is considered to be the originator of the theory of finite groups; and did extensive work in astronomy, mechanics, optics, and the theory of elasticity.

His greatest achievements, however, lay in the field of analysis. Together with his contemporaries Gauss and Abel, he was a pioneer in the rigorous treatment of limits, continuous functions, derivatives, integrals, and infinite series. Several of the basic tests for the convergence of series are associated with his name. He also provided the first existence proof for solutions of differential equations, gave the first proof of the convergence of a Taylor series, and was the first to feel the need for a careful study of the convergence behavior of Fourier series (see Chapter 8). However, his most important work was in the theory of functions of a complex variable, which in essence he created and which has continued to be one of the dominant branches of both pure and applied mathematics. In this field, Cauchy's integral theorem and Cauchy's integral formula are fundamental tools without which modern analysis could hardly exist (see Chapter 9).

Unfortunately, his personality did not harmonize with the fruitful power of his mind. He was an arrogant royalist in politics and a self-righteous, preaching, pious believer in religion (all this in an age of republican skepticism), and most of his fellow scientists disliked him and considered him a smug hypocrite. It might be fairer to put first things first and describe him as a great mathematician who happened also to be a sincere but narrow-minded bigot.


9.2.6. Definition. A function $f : \mathbb{C} \to \mathbb{C}$ is called analytic at $z_0$ if it is differentiable at $z_0$ and at all other points in some neighborhood of $z_0$. A point at which $f$ is analytic is called a regular point of $f$. A point at which $f$ is not analytic is called a singular point or a singularity of $f$. A function for which all points in $\mathbb{C}$ are regular is called an entire function.

9.2.7. Example. Derivatives of some functions

(a) $f(z) = z$.
Here $u = x$ and $v = y$; the C-R conditions are easily shown to hold, and for any $z$, we have $df/dz = \partial u/\partial x + i\,\partial v/\partial x = 1$. Therefore, the derivative exists at all points of the complex plane.

(b) $f(z) = z^2$.
Here $u = x^2 - y^2$ and $v = 2xy$; the C-R conditions hold, and for all points $z$ of the complex plane, we have $df/dz = \partial u/\partial x + i\,\partial v/\partial x = 2x + i2y = 2z$. Therefore, $f(z)$ is differentiable at all points.

(c) $f(z) = z^n$ for $n > 1$.
We can use mathematical induction and the fact that the product of two entire functions is an entire function to show that $\frac{d}{dz}(z^n) = nz^{n-1}$.

(d) $f(z) = a_0 + a_1 z + \cdots + a_{n-1}z^{n-1} + a_n z^n$,
where $a_i$ are arbitrary constants. That $f(z)$ is entire follows directly from part (c) and the fact that the sum of two entire functions is entire.

(e) $f(z) = 1/z$.
The derivative can be found to be $f'(z) = -1/z^2$, which does not exist for $z = 0$. Thus, $z = 0$ is a singularity of $f(z)$. However, any other point is a regular point of $f$.

(f) $f(z) = |z|^2$.
Using the definition of the derivative, we obtain

$$\frac{\Delta f}{\Delta z} = \frac{|z + \Delta z|^2 - |z|^2}{\Delta z} = \frac{(z + \Delta z)(z^* + \Delta z^*) - zz^*}{\Delta z} = z^* + \Delta z^* + z\frac{\Delta z^*}{\Delta z}.$$

For $z = 0$, $\Delta f/\Delta z = \Delta z^*$, which goes to zero as $\Delta z \to 0$. Therefore, $df/dz = 0$ at $z = 0$.$^4$ However, if $z \neq 0$, the limit of $\Delta f/\Delta z$ will depend on how $z$ is approached. Thus, $df/dz$ does not exist if $z \neq 0$. This shows that $|z|^2$ is differentiable only at $z = 0$ and nowhere else in its neighborhood. It also shows that even if the real (here, $u = x^2 + y^2$) and imaginary (here, $v = 0$) parts of a complex function have continuous partial derivatives of all orders at a point, the function may not be differentiable there.

(g) $f(z) = 1/\sin z$.
This gives $df/dz = -\cos z/\sin^2 z$. Thus, $f$ has infinitely many (isolated) singular points at $z = \pm n\pi$ for $n = 0, 1, 2, \ldots$. ■
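Part (f) of Example 9.2.7 can be checked with a few lines of code. The sketch below evaluates $\Delta f/\Delta z$ for $f(z) = |z|^2$ with a small step taken along the $x$-direction and along the $y$-direction; the point $z = 1 + 2i$ and the step size are arbitrary illustrative choices.

```python
def quotient(z, dz):
    """Difference quotient of f(z) = |z|^2 for a finite step dz."""
    return (abs(z + dz) ** 2 - abs(z) ** 2) / dz

h = 1e-6
z = complex(1, 2)

along_x = quotient(z, complex(h, 0))     # dz*/dz = +1, so the value is near z* + z
along_y = quotient(z, complex(0, h))     # dz*/dz = -1, so the value is near z* - z
at_origin = quotient(0j, complex(h, h))  # equals dz*, which shrinks with the step

print(along_x, along_y, at_origin)
```

The two directions give values near $z^* + z = 2$ and $z^* - z = -4i$, so no derivative exists at this point, whereas at the origin the quotient is of the size of the step itself.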


9.2.8. Example. The complex exponential function

In this example, we find the (unique) function $f : \mathbb{C} \to \mathbb{C}$ that has the following three properties:

(a) $f$ is single-valued and analytic for all $z$,

(b) $df/dz = f(z)$, and

(c) $f(z_1 + z_2) = f(z_1)f(z_2)$.

Property (b) shows that if $f(z)$ is well behaved, then $df/dz$ is also well behaved. In particular, if $f(z)$ is defined for all values of $z$, then $f$ must be entire.

For $z_1 = 0 = z_2$, property (c) yields $f(0) = [f(0)]^2 \Rightarrow f(0) = 1$, or $f(0) = 0$. On the other hand,

$$\frac{df}{dz} = \lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z} = \lim_{\Delta z \to 0} \frac{f(z)f(\Delta z) - f(z)}{\Delta z} = f(z) \lim_{\Delta z \to 0} \frac{f(\Delta z) - 1}{\Delta z}.$$

Property (b) now implies that

$$\lim_{\Delta z \to 0} \frac{f(\Delta z) - 1}{\Delta z} = 1 \;\Rightarrow\; f'(0) = 1 \quad\text{and}\quad f(0) = 1.$$


$^4$Although the derivative of $|z|^2$ exists at $z = 0$, it is not analytic there (or anywhere else). To be analytic at a point, a function must have derivatives at all points in some neighborhood of the given point.






The first implication follows from the definition of derivative, and the second from the fact that the only other choice, namely $f(0) = 0$, would yield $-\infty$ for the limit.

Now, we write $f(z) = u(x, y) + iv(x, y)$, for which property (b) becomes

$$\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = u + iv \;\Rightarrow\; \frac{\partial u}{\partial x} = u, \quad \frac{\partial v}{\partial x} = v.$$

These equations have the most general solution $u(x, y) = a(y)e^x$ and $v(x, y) = b(y)e^x$, where $a(y)$ and $b(y)$ are the "constants" of integration. The Cauchy-Riemann conditions now yield $a(y) = db/dy$ and $da/dy = -b(y)$, whose most general solution is $a(y) = A\cos y + B\sin y$, $b(y) = A\sin y - B\cos y$. On the other hand, $f(0) = 1$ yields $u(0, 0) = 1$ and $v(0, 0) = 0$, implying that $a(0) = 1$, $b(0) = 0$, or $A = 1$, $B = 0$. We therefore conclude that

$$f(z) = a(y)e^x + ib(y)e^x = e^x(\cos y + i\sin y) = e^x e^{iy} = e^z.$$

Both $e^x$ and $e^{iy}$ are well-defined in the entire complex plane. Hence, $e^z$ is defined and differentiable over all $\mathbb{C}$; therefore, it is entire. ■


Example 9.2.7 shows that any polynomial in $z$ is entire. Example 9.2.8 shows that the exponential function $e^z$ is also entire. Therefore, any product and/or sum of polynomials and $e^z$ will also be entire. We can build other entire functions. For instance, $e^{iz}$ and $e^{-iz}$ are entire functions; therefore, the trigonometric functions, defined as

$$\sin z = \frac{e^{iz} - e^{-iz}}{2i} \quad\text{and}\quad \cos z = \frac{e^{iz} + e^{-iz}}{2}, \tag{9.4}$$

are also entire functions. Problem 9.5 shows that $\sin z$ and $\cos z$ have only real zeros. The hyperbolic functions can be defined similarly:

$$\sinh z = \frac{e^z - e^{-z}}{2} \quad\text{and}\quad \cosh z = \frac{e^z + e^{-z}}{2}. \tag{9.5}$$

Although the sum and the product of entire functions are entire, the ratio, in general, is not. For instance, if $f(z)$ and $g(z)$ are polynomials of degrees $m$ and $n$, respectively, then for $n > 0$, the ratio $f(z)/g(z)$ is not entire, because at the zeros of $g(z)$ (which always exist, and which we assume are not zeros of $f(z)$) the derivative is not defined.
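The definitions of the trigonometric and hyperbolic functions in terms of exponentials, Equations (9.4) and (9.5), can be compared directly with the library implementations in Python's standard `cmath` module. The function names `sin_z`, `cos_z`, `sinh_z`, and `cosh_z` and the test point below are hypothetical choices for this sketch.

```python
import cmath

def sin_z(z):
    """sin z as defined in Equation (9.4)."""
    return (cmath.exp(1j * z) - cmath.exp(-1j * z)) / 2j

def cos_z(z):
    """cos z as defined in Equation (9.4)."""
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2

def sinh_z(z):
    """sinh z as defined in Equation (9.5)."""
    return (cmath.exp(z) - cmath.exp(-z)) / 2

def cosh_z(z):
    """cosh z as defined in Equation (9.5)."""
    return (cmath.exp(z) + cmath.exp(-z)) / 2

z = 0.7 - 1.3j                                  # arbitrary complex test point
print(sin_z(z) - cmath.sin(z), cos_z(z) - cmath.cos(z))
print(sin_z(z) ** 2 + cos_z(z) ** 2)            # close to 1
print(sinh_z(z) + 1j * sin_z(1j * z))           # sinh z = -i sin(iz), so close to 0
```

The last line illustrates that the hyperbolic functions are the trigonometric functions evaluated on the imaginary axis, a relation that follows immediately from the two pairs of definitions.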

The functions $u(x, y)$ and $v(x, y)$ of an analytic function have an interesting property that the following example investigates.


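9.2.9. Example. The family of curves $u(x, y) = \text{constant}$ is perpendicular to the family of curves $v(x, y) = \text{constant}$ at each point of the complex plane where $f(z) = u + iv$ is analytic.

This can easily be seen by looking at the normal to the curves. The normal to the curve $u(x, y) = \text{constant}$ is simply $\nabla u = (\partial u/\partial x, \partial u/\partial y)$. Similarly, the normal to the curve $v(x, y) = \text{constant}$ is $\nabla v = (\partial v/\partial x, \partial v/\partial y)$. Taking the dot product of these two normals, we obtain

$$(\nabla u) \cdot (\nabla v) = \frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\frac{\partial v}{\partial y} = \frac{\partial u}{\partial x}\left( -\frac{\partial u}{\partial y} \right) + \frac{\partial u}{\partial y}\left( \frac{\partial u}{\partial x} \right) = 0$$

by the C-R conditions. ■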
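The orthogonality of the two families of curves in Example 9.2.9 can be verified numerically by estimating $\nabla u$ and $\nabla v$ with central differences. The sketch below does this for two analytic functions at an arbitrary point; `grad_parts`, `dot`, the sample functions, and the sample point are assumptions made for illustration.

```python
import cmath

def grad_parts(f, x, y, h=1e-5):
    """Return (grad u, grad v) at (x, y) for f = u + i v, by central differences."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)   # u_x + i v_x
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)   # u_y + i v_y
    return (fx.real, fy.real), (fx.imag, fy.imag)

def dot(a, b):
    """Dot product of two 2-component tuples."""
    return a[0] * b[0] + a[1] * b[1]

results = {}
for name, f in {"e^z": cmath.exp, "z^3": lambda z: z ** 3}.items():
    grad_u, grad_v = grad_parts(f, 0.4, -1.1)
    results[name] = dot(grad_u, grad_v)

print(results)
```

Both dot products come out at the level of the finite-difference error, consistent with the normals being perpendicular.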






9.3 Conformal Maps 

The real and imaginary parts of an analytic function separately satisfy the two-dimensional Laplace's equation:

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \qquad \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0. \tag{9.6}$$

This can easily be verified from the C-R conditions. Laplace's equation in three dimensions,

$$\frac{\partial^2 \Phi}{\partial x^2} + \frac{\partial^2 \Phi}{\partial y^2} + \frac{\partial^2 \Phi}{\partial z^2} = 0,$$

describes the electrostatic potential $\Phi$ in a charge-free region of space. In a typical electrostatic problem the potential is given at certain boundaries (usually conducting surfaces), and its value at every point in space is sought. There are numerous techniques for solving such problems, and some of them will be discussed later in the book. However, some of these problems have a certain degree of symmetry that reduces them to two-dimensional problems. In such cases, the theory of analytic functions can be extremely helpful.

The symmetry mentioned above is cylindrical symmetry, where the potential is known a priori to be independent of the $z$-coordinate (the axis of symmetry). This situation occurs when conductors are cylinders and, if there are charge distributions in certain regions of space, the densities are $z$-independent. In such cases, $\partial \Phi/\partial z = 0$, and the problem reduces to a two-dimensional one.

harmonic functions

Functions satisfying Laplace's equation are called harmonic functions. Thus, the electrostatic potential is a three-dimensional harmonic function, and the potential for a cylindrically symmetric charge distribution and boundary condition is a two-dimensional harmonic function. Since the real and the imaginary parts of a complex analytic function are also harmonic, techniques of complex analysis are sometimes useful in solving electrostatic problems with cylindrical symmetry.$^5$

complex potential

To illustrate the connection between electrostatics and complex analysis, consider a long straight filament with a constant linear charge density $\lambda$. It is shown in introductory electromagnetism that the potential $\Phi$ (disregarding the arbitrary constant that determines the reference potential) is given, in cylindrical coordinates, by $\Phi = 2\lambda \ln \rho = 2\lambda \ln \sqrt{x^2 + y^2} = 2\lambda \ln |z|$. Since $\Phi$ satisfies Laplace's equation, we conclude that $\Phi$ could be the real part of an analytic function $w(z)$, which we call the complex potential. Example 9.2.9, plus the fact that the curves $u = \Phi = \text{constant}$ are circles, imply that the constant-$v$ curves are rays, i.e., $v \propto \varphi$. Choosing the constant of proportionality as $2\lambda$, we obtain

$$w(z) = 2\lambda \ln \rho + i2\lambda\varphi = 2\lambda \ln(\rho e^{i\varphi}) = 2\lambda \ln z.$$

$^5$We use electrostatics because it is more familiar to physics students. Engineering students are familiar with steady state heat transfer as well, which also involves Laplace's equation, and therefore is amenable to this technique.





conformal mapping 


It is useful to know the complex potential of more than one filament of charge. To find such a potential we must first find $w(z)$ for a line charge when it is displaced from the origin. If the line is located at $z_0 = x_0 + iy_0$, then it is easy to show that $w(z) = 2\lambda \ln(z - z_0)$. If there are $n$ line charges located at $z_1, z_2, \ldots, z_n$, then

$$w(z) = 2\sum_{k=1}^{n} \lambda_k \ln(z - z_k). \tag{9.7}$$

The function $w(z)$ can be used directly to solve a number of electrostatic problems involving simple charge distributions and conductor arrangements. Some of these are illustrated in problems at the end of this chapter. Instead of treating $w(z)$ as a complex potential, let us look at it as a map from the $z$-plane (or $xy$-plane) to the $w$-plane (or $uv$-plane). In particular, the equipotential curves (circles) are mapped onto lines parallel to the $v$-axis in the $w$-plane. This is so because equipotential curves are defined by $u = \text{constant}$. Similarly, the constant-$v$ curves are mapped onto horizontal lines in the $w$-plane.
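As a small illustration of Equation (9.7), the complex potential of several line charges can be evaluated with the complex logarithm, and its real part compared with the sum of the individual potentials $2\lambda_k \ln|z - z_k|$. The charge positions and densities below are hypothetical, and `cmath.log` returns the principal branch, so only the real part is independent of the branch choice.

```python
import cmath
import math

def complex_potential(z, charges):
    """w(z) = 2 * sum_k lambda_k * ln(z - z_k), following Equation (9.7)."""
    return 2 * sum(lam * cmath.log(z - zk) for zk, lam in charges)

# Hypothetical configuration: two filaments with opposite charge densities.
charges = [(complex(-1, 0), 1.0), (complex(1, 0), -1.0)]

z = complex(0.3, 0.8)
w = complex_potential(z, charges)
phi_direct = 2 * sum(lam * math.log(abs(z - zk)) for zk, lam in charges)

print(w.real, phi_direct)
```

For this symmetric pair the potential vanishes on the $y$-axis, where a field point is equally distant from the two filaments.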

This is an enormous simplification of the geometry. Straight lines, especially when they are parallel to axes, are by far simpler geometrical objects than circles,$^6$ especially if the circles are not centered at the origin. So let us consider two complex "worlds." One is represented by the $xy$-plane and denoted by $z$. The other, the "prime world," is represented$^7$ by $z'$, and its real and imaginary parts by $x'$ and $y'$. We start in $z$, where we need to find a physical quantity such as the electrostatic potential $\Phi(x, y)$. If the problem is too complicated in the $z$-world, we transfer it to the $z'$-world, in which it may be easily solvable; we solve the problem there (in terms of $x'$ and $y'$) and then transfer back to the $z$-world ($x$ and $y$). The mapping that relates $z$ and $z'$ must be cleverly chosen. Otherwise, there is no guarantee that the problem will simplify.

Two conditions are necessary for the above strategy to work. First, the differential equation describing the physics must not get more complicated with the transfer to $z'$. Since Laplace's equation is already of the simplest type, the $z'$-world must also respect Laplace's equation. Second, and more importantly, the mapping must preserve the angles between curves. This is necessary because we want the equipotential curves and the field lines to be perpendicular in both worlds. A mapping that preserves the angle between two curves at a given point is called a conformal mapping. We already have such mappings at our disposal, as the following proposition shows.

9.3.1. Proposition. Let $\gamma_1$ and $\gamma_2$ be curves in the complex $z$-plane that intersect at a point $z_0$ at an angle $\alpha$. Let $f : \mathbb{C} \to \mathbb{C}$ be a mapping given by $f(z) = z' = x' + iy'$ that is analytic at $z_0$. Let $\gamma_1'$ and $\gamma_2'$ be the images of $\gamma_1$ and $\gamma_2$ under this mapping, which intersect at an angle $\alpha'$. Then,


$^6$This statement is valid only in Cartesian coordinates. But these are precisely the coordinates we are using in this discussion.
$^7$We are using $z'$ instead of $w$, and $(x', y')$ instead of $(u, v)$.





(a) $\alpha' = \alpha$, that is, the mapping $f$ is conformal, if $(dz'/dz)_{z_0} \neq 0$.

(b) If $f$ is harmonic in $(x, y)$, it is also harmonic in $(x', y')$.

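Proof. We sketch the proof of the first part. The details, as well as the proof of the second part, involve partial differentiation and the chain rule and are left for the reader. The angle between the two curves is obtained by taking the inner product of the two unit vectors tangent to the curves at $z_0$. A small displacement along $\gamma_i$ can be written as $\hat{\mathbf{e}}_x \Delta x_i + \hat{\mathbf{e}}_y \Delta y_i$ for $i = 1, 2$, and the unit vectors as

$$\hat{\mathbf{e}}_i = \frac{\hat{\mathbf{e}}_x \Delta x_i + \hat{\mathbf{e}}_y \Delta y_i}{\sqrt{(\Delta x_i)^2 + (\Delta y_i)^2}} \quad\text{for } i = 1, 2.$$

Therefore,

$$\hat{\mathbf{e}}_1 \cdot \hat{\mathbf{e}}_2 = \frac{\Delta x_1 \Delta x_2 + \Delta y_1 \Delta y_2}{\sqrt{(\Delta x_1)^2 + (\Delta y_1)^2}\,\sqrt{(\Delta x_2)^2 + (\Delta y_2)^2}}.$$

Similarly, in the prime plane, we have

$$\hat{\mathbf{e}}_1' \cdot \hat{\mathbf{e}}_2' = \frac{\Delta x_1' \Delta x_2' + \Delta y_1' \Delta y_2'}{\sqrt{(\Delta x_1')^2 + (\Delta y_1')^2}\,\sqrt{(\Delta x_2')^2 + (\Delta y_2')^2}},$$

where $x' = u(x, y)$ and $y' = v(x, y)$, and $u$ and $v$ are the real and imaginary parts of the analytic function $f$. Using the relations

$$\Delta x_i' = \frac{\partial u}{\partial x}\Delta x_i + \frac{\partial u}{\partial y}\Delta y_i, \qquad \Delta y_i' = \frac{\partial v}{\partial x}\Delta x_i + \frac{\partial v}{\partial y}\Delta y_i, \qquad i = 1, 2,$$

and the Cauchy-Riemann conditions, the reader may verify that $\hat{\mathbf{e}}_1' \cdot \hat{\mathbf{e}}_2' = \hat{\mathbf{e}}_1 \cdot \hat{\mathbf{e}}_2$. □

translation

dilation

inversion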


The following are some examples of conformal mappings.

(a) $z' = z + a$, where $a$ is an arbitrary complex constant. This is simply a translation of the $z$-plane.

(b) $z' = bz$, where $b$ is an arbitrary complex constant. This is a dilation whereby distances are dilated by a factor $|b|$. A graph in the $z$-plane is mapped onto a similar (congruent) graph in the $z'$-plane that will be reduced ($|b| < 1$) or enlarged ($|b| > 1$) by a factor of $|b|$.

(c) $z' = 1/z$. This is called an inversion. Example 9.3.2 will show that under such a mapping, circles are mapped onto circles or straight lines.

(d) Combining the preceding three transformations yields the general mapping

$$z' = \frac{az + b}{cz + d}, \tag{9.8}$$

which is conformal if $cz + d \neq 0 \neq dz'/dz$. The latter conditions are equivalent to $ad - bc \neq 0$.
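Angle preservation under the mapping (9.8) can be tested numerically: take two directions leaving a point $z_0$, push a short step along each through the map, and compare the angle between the image directions with the original angle. The constants $a$, $b$, $c$, $d$, the point $z_0$, and the helper names below are illustrative assumptions.

```python
import cmath

def mobius(z, a, b, c, d):
    """The homographic map z' = (a z + b) / (c z + d) of Equation (9.8)."""
    return (a * z + b) / (c * z + d)

def angle_between(d1, d2):
    """Signed angle from direction d1 to direction d2 (both complex numbers)."""
    return cmath.phase(d2 / d1)

def image_direction(f, z0, direction, h=1e-6):
    """Approximate direction of the image curve at f(z0) for a curve leaving z0 along `direction`."""
    return (f(z0 + h * direction) - f(z0)) / h

a, b, c, d = 2 + 1j, 1, 1j, 3          # illustrative constants with ad - bc != 0
f = lambda z: mobius(z, a, b, c, d)

z0 = 0.5 + 0.25j
d1, d2 = cmath.rect(1, 0.3), cmath.rect(1, 1.4)

alpha = angle_between(d1, d2)
alpha_prime = angle_between(image_direction(f, z0, d1), image_direction(f, z0, d2))
print(alpha, alpha_prime)
```

The two printed angles agree up to the error of the finite step, as Proposition 9.3.1 predicts for a point where $dz'/dz \neq 0$.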





homographic 

transformations 


9.3.2. Example. A circle of radius $r$ whose center is at $a$ in the $z$-plane is described by the equation $|z - a| = r$. When transforming to the $z'$-plane under inversion, this equation becomes $|1/z' - a| = r$, or $|1 - az'| = r|z'|$. Squaring both sides and simplifying yields $(r^2 - |a|^2)|z'|^2 + 2\,\mathrm{Re}(az') - 1 = 0$. In terms of Cartesian coordinates, this becomes

$$(r^2 - |a|^2)(x'^2 + y'^2) + 2(a_r x' - a_i y') - 1 = 0, \tag{9.9}$$

where $a \equiv a_r + ia_i$. We now consider two cases:

1. $r \neq |a|$: Divide by $r^2 - |a|^2$ and complete the squares to get

$$\left( x' + \frac{a_r}{r^2 - |a|^2} \right)^2 + \left( y' - \frac{a_i}{r^2 - |a|^2} \right)^2 = \frac{a_r^2 + a_i^2}{(r^2 - |a|^2)^2} + \frac{1}{r^2 - |a|^2},$$

or defining $a_r' \equiv -a_r/(r^2 - |a|^2)$, $a_i' \equiv a_i/(r^2 - |a|^2)$, and $r' \equiv r/|r^2 - |a|^2|$, we have $(x' - a_r')^2 + (y' - a_i')^2 = r'^2$, which can also be written as

$$|z' - a'| = r', \qquad a' = a_r' + ia_i' = \frac{a^*}{|a|^2 - r^2}.$$

This is a circle in the $z'$-plane with center at $a'$ and radius of $r'$.

2. $r = |a|$: Then Equation (9.9) reduces to $a_r x' - a_i y' = \frac{1}{2}$, which is the equation of a line.

If we use the transformation $z' = 1/(z - c)$ instead of $z' = 1/z$, then $|z - a| = r$ becomes $|1/z' - (a - c)| = r$, and all the above analysis will go through exactly as before, except that $a$ is replaced by $a - c$. ■
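The result of Example 9.3.2 for the case $r \neq |a|$ can be confirmed by inverting a handful of points of a circle and measuring their distance from the predicted center $a' = a^*/(|a|^2 - r^2)$. The particular circle used here is an arbitrary choice that does not pass through the origin.

```python
import cmath
import math

a, r = complex(2, 1), 1.5          # illustrative circle |z - a| = r with r != |a|

center = a.conjugate() / (abs(a) ** 2 - r ** 2)   # a' from Example 9.3.2
radius = r / abs(r ** 2 - abs(a) ** 2)            # r' from Example 9.3.2

# Twelve points on the original circle, inverted one by one.
points = [a + r * cmath.exp(2j * math.pi * k / 12) for k in range(12)]
distances = [abs(1 / z - center) for z in points]

print(radius, min(distances), max(distances))
```

All distances coincide with $r'$ to rounding error, so the image points lie on the circle $|z' - a'| = r'$.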

Mappings of the form given in Equation (9.8) are called homographic transformations. A useful property of such transformations is that they can map an infinite region of the $z$-plane onto a finite region of the $z'$-plane. In fact, points with very large values of $z$ are mapped onto a neighborhood of the point $z' = a/c$. Of course, this argument goes both ways: Equation (9.8) also maps a neighborhood of $-d/c$ in the $z$-plane onto large regions of the $z'$-plane. The usefulness of homographic transformations is illustrated in the following example.

9.3.3. Example. Consider two cylindrical conductors of equal radius $r$, held at potentials $u_1$ and $u_2$, respectively, whose centers are $D$ units of length apart. Choose the $x$- and the $y$-axes such that the centers of the cylinders are located on the $x$-axis at distances $a_1$ and $a_2$ from the origin, as shown in Figure 9.3. Let us find the electrostatic potential produced by such a configuration in the $xy$-plane.

We know from elementary electrostatics that the problem becomes very simple if the two cylinders are concentric (and, of course, of different radii). Thus, we try to map the two circles onto two concentric circles in the $z'$-plane such that the infinite region outside the two circles in the $z$-plane gets mapped onto the finite annular region between the two concentric circles in the $z'$-plane. We then (easily) find the potential in the $z'$-plane, and transfer it back to the $z$-plane.

The most general mapping that may be able to do the job is that given by Equation (9.8). However, it turns out that we do not have to be this general. In fact, the special case


$z' = 1/(z - c)$ in which $c$ is a real constant will be sufficient. So, $z = (1/z') + c$, and the circles $|z - a_k| = r$ for $k = 1, 2$ will be mapped onto the circles $|z' - a_k'| = r_k'$, where (by Example 9.3.2) $a_k' = (a_k - c)/[(a_k - c)^2 - r^2]$ and $r_k' = r/|(a_k - c)^2 - r^2|$.

Can we arrange the parameters so that the circles in the $z'$-plane are concentric, i.e., that $a_1' = a_2'$? The answer is yes. We set $a_1' = a_2'$ and solve for $a_2$ in terms of $a_1$. The result is either the trivial solution $a_2 = a_1$, or $a_2 = c - r^2/(a_1 - c)$. If we place the origin of the $z$-plane at the center of the first cylinder, then $a_1 = 0$ and $a_2 = D = c + r^2/c$. We can also find $a_1'$ and $a_2'$: $a_1' = a_2' \equiv a' = -c/(c^2 - r^2)$, and the geometry of the problem is as shown in Figure 9.4.

Figure 9.4 In the $z'$-plane, we see two concentric unequal cylinders.

For such a geometry the potential at a point in the annular region is given by $\Phi' = A \ln \rho + B = A \ln |z' - a'| + B$, where $A$ and $B$ are real constants determined by the conditions $\Phi'(r_1') = u_1$ and $\Phi'(r_2') = u_2$, which yields

$$A = \frac{u_1 - u_2}{\ln(r_1'/r_2')} \quad\text{and}\quad B = \frac{u_2 \ln r_1' - u_1 \ln r_2'}{\ln(r_1'/r_2')}.$$

The potential is the real part of the complex function$^8$

$$F(z') = A \ln(z' - a') + B,$$

which is analytic except at $z' = a'$, a point lying outside the region of interest. We can now go back to the $z$-plane by substituting $z' = 1/(z - c)$ to obtain

$$G(z) = A \ln\left( \frac{1}{z - c} - a' \right) + B,$$

whose real part is the potential in the $z$-plane:

$$\Phi(x, y) = \mathrm{Re}[G(z)] = A \ln\left| \frac{1 - a'z + a'c}{z - c} \right| + B = A \ln\left| \frac{(1 + a'c - a'x) - ia'y}{(x - c) + iy} \right| + B = \frac{A}{2} \ln\left[ \frac{(1 + a'c - a'x)^2 + a'^2 y^2}{(x - c)^2 + y^2} \right] + B.$$

This is the potential we want. ■

$^8$Writing $z = |z|e^{i\theta}$, we note that $\ln z = \ln |z| + i\theta$, so that the real part of a complex log function is the log of the absolute value.
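The construction of Example 9.3.3 can be checked by coding the formulas for $c$, $a'$, $r_1'$, $r_2'$, $A$, and $B$ and evaluating the resulting potential on the two cylinders. This is a sketch under stated assumptions: the first cylinder is centered at the origin, $D > 2r$, the root of $D = c + r^2/c$ lying inside the first cylinder is chosen, and the function name and numerical values are hypothetical.

```python
import math

def two_cylinder_potential(r, D, u1, u2):
    """Return Phi(x, y) for cylinders of radius r centered at 0 and D (assumes D > 2r)."""
    c = (D - math.sqrt(D * D - 4 * r * r)) / 2      # root of D = c + r^2/c with c < r
    a_prime = -c / (c * c - r * r)                  # common center a' in the z'-plane
    r1 = r / abs(c * c - r * r)                     # image radius of the first cylinder
    r2 = r / abs((D - c) ** 2 - r * r)              # image radius of the second cylinder
    A = (u1 - u2) / math.log(r1 / r2)
    B = (u2 * math.log(r1) - u1 * math.log(r2)) / math.log(r1 / r2)

    def phi(x, y):
        z_prime = 1 / (complex(x, y) - c)           # the map z' = 1/(z - c)
        return A * math.log(abs(z_prime - a_prime)) + B

    return phi

phi = two_cylinder_potential(r=1.0, D=3.0, u1=10.0, u2=-5.0)
on_first = [phi(math.cos(t), math.sin(t)) for t in (0.0, 1.0, 2.5)]
on_second = [phi(3.0 + math.cos(t), math.sin(t)) for t in (0.0, 1.0, 2.5)]
print(on_first, on_second)
```

Every sampled point of the first cylinder returns $u_1$ and every sampled point of the second returns $u_2$, which is the boundary behavior the mapping was designed to produce.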


9.4 Integration of Complex Functions 

The derivative of a complex function is an important concept and, as the previous section demonstrated, provides a powerful tool in physical applications. The concept of integration is even more important. In fact, we will see in the next section that derivatives can be written in terms of integrals. We will study integrals of complex functions in detail in this section.

The definite integral of a complex function is defined in analogy to that of a
real function:

\[ \int_{\alpha_1}^{\alpha_2} f(z)\,dz = \lim_{\substack{N\to\infty \\ \Delta z_i\to 0}} \sum_{i=1}^{N} f(z_i)\,\Delta z_i, \]

where $\Delta z_i$ is a small segment, situated at $z_i$, of the curve that connects the complex
number $\alpha_1$ to the complex number $\alpha_2$ in the $z$-plane. Since there are infinitely many
ways of connecting $\alpha_1$ to $\alpha_2$, it is possible to obtain different values for the integral
for different paths.

One encounters a similar situation when one tries to evaluate the line integral
of a vector field. In fact, we can turn the integral of a complex function into a line
integral as follows. We substitute $f(z) = u + iv$ and $dz = dx + i\,dy$ in the integral
to obtain

\[ \int_{\alpha_1}^{\alpha_2} f(z)\,dz = \int_{\alpha_1}^{\alpha_2} (u\,dx - v\,dy) + i\int_{\alpha_1}^{\alpha_2} (v\,dx + u\,dy). \]

If we define the two-dimensional vectors $\mathbf{A}_1 = (u, -v)$ and $\mathbf{A}_2 = (v, u)$, then we
get $\int_{\alpha_1}^{\alpha_2} f(z)\,dz = \int_{\alpha_1}^{\alpha_2} \mathbf{A}_1\cdot d\mathbf{r} + i\int_{\alpha_1}^{\alpha_2} \mathbf{A}_2\cdot d\mathbf{r}$. It follows from Stokes' theorem
(or Green's theorem, since the vectors lie in a plane) that the integral of $f$ is path-
independent only if both $\mathbf{A}_1$ and $\mathbf{A}_2$ have vanishing curls. This in turn follows if
and only if $u$ and $v$ satisfy the C-R conditions, and this is exactly what is needed
for $f(z)$ to be analytic.
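As a worked check of the last step, assume the usual convention that the curl of a planar field $\mathbf{A} = (A_x, A_y)$ has the single component $\partial A_y/\partial x - \partial A_x/\partial y$. Applying it to $\mathbf{A}_1 = (u, -v)$ and $\mathbf{A}_2 = (v, u)$ gives:

```latex
\begin{align*}
\nabla\times\mathbf{A}_1 = 0
  &\iff \frac{\partial(-v)}{\partial x} - \frac{\partial u}{\partial y} = 0
   \iff \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x},\\
\nabla\times\mathbf{A}_2 = 0
  &\iff \frac{\partial u}{\partial x} - \frac{\partial v}{\partial y} = 0
   \iff \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}.
\end{align*}
```

The two right-hand conditions together are the C-R conditions for $f = u + iv$.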

Path-independence of a line integral of a vector $\mathbf{A}$ is equivalent to the vanishing
of the integral along a closed path, and the latter is equivalent to the vanishing of
$\nabla\times\mathbf{A}$ at every point of the region bordered by the closed path. The preceding
discussion is encapsulated in an important theorem, which we shall state shortly.
First, however, it is worthwhile to become familiar with some terminology used
frequently in complex analysis.

1. A curve is a map $\gamma : [a, b] \to \mathbb{C}$ from the real interval into the complex
plane given by $\gamma(t) = \gamma_r(t) + i\gamma_i(t)$, where $a \le t \le b$, and $\gamma_r$ and $\gamma_i$ are the
real and imaginary parts of $\gamma$; $\gamma(a)$ is called the initial point of the curve
and $\gamma(b)$ its final point.

2. A simple arc, or a Jordan arc, is a curve that does not cross itself, i.e., $\gamma$ is
injective (or one to one), so that $\gamma(t_1) \ne \gamma(t_2)$ when $t_1 \ne t_2$.

3. A path is a finite collection $\{\gamma_1, \gamma_2, \ldots, \gamma_n\}$ of simple arcs such that the
initial point of $\gamma_{k+1}$ coincides with the final point of $\gamma_k$.

4. A smooth arc is a curve for which $d\gamma/dt = d\gamma_r/dt + i\,d\gamma_i/dt$ exists and
is nonzero for $t \in [a, b]$.

5. A contour is a path whose arcs are smooth. When the initial point of $\gamma_1$
coincides with the final point of $\gamma_n$, the contour is said to be a simple closed
contour.


9.4.1. Theorem. (Cauchy-Goursat theorem) Let $f : \mathbb{C} \to \mathbb{C}$ be analytic on a
simple closed contour $C$ and at all points inside $C$. Then

\[ \oint_C f(z)\,dz = 0. \]




Figure 9.5 The three different paths of integration corresponding to the integrals $I_1$, $I'_1$,
$I_2$, and $I'_2$.


9.4.2. Example. Examples of definite integrals

(a) Let us evaluate the integral $I_1 = \int_{\gamma_1} z\,dz$ where $\gamma_1$ is the straight line drawn from the
origin to the point (1, 2) (see Figure 9.5). Along such a line $y = 2x$ and, using $t$ for $x$,
$\gamma_1(t) = t + 2it$ where $0 \le t \le 1$; so

\[ I_1 = \int_{\gamma_1} z\,dz = \int_0^1 (t + 2it)(dt + 2i\,dt) = \int_0^1 (-3t\,dt + 4it\,dt) = -\tfrac{3}{2} + 2i. \]

For a different path $\gamma_2$, along which $y = 2x^2$, we get $\gamma_2(t) = t + 2it^2$ where $0 \le t \le 1$,
and

\[ I'_1 = \int_{\gamma_2} z\,dz = \int_0^1 (t + 2it^2)(dt + 4it\,dt) = -\tfrac{3}{2} + 2i. \]

Therefore, $I_1 = I'_1$. This is what is expected from the Cauchy-Goursat theorem because
the function $f(z) = z$ is analytic on the two paths and in the region bounded by them.

(b) To find $I_2 = \int_{\gamma_1} z^2\,dz$ with $\gamma_1$ as in part (a), substitute for $z$ in terms of $t$:

\[ I_2 = \int_0^1 (t + 2it)^2(dt + 2i\,dt) = (1 + 2i)^3\int_0^1 t^2\,dt = -\tfrac{11}{3} - \tfrac{2}{3}i. \]

Next we compare $I_2$ with $I'_2 = \int_{\gamma_3} z^2\,dz$ where $\gamma_3$ is as shown in Figure 9.5. This path can
be described by

\[ \gamma_3(t) = \begin{cases} t & \text{for } 0 \le t \le 1, \\ 1 + i(t - 1) & \text{for } 1 \le t \le 3. \end{cases} \]

Therefore,

\[ I'_2 = \int_0^1 t^2\,dt + \int_1^3 [1 + i(t - 1)]^2(i\,dt) = \tfrac{1}{3} - 4 - \tfrac{2}{3}i = -\tfrac{11}{3} - \tfrac{2}{3}i, \]

which is identical to $I_2$, once again because the function is analytic on $\gamma_1$ and $\gamma_3$ as well as
in the region bounded by them.

Figure 9.6 The two semicircular paths for calculating $I_3$ and $I'_3$.

(c) Now consider $I_3 = \int_{\gamma_4} dz/z$ where $\gamma_4$ is the upper semicircle of unit radius, as shown
in Figure 9.6. A parametric equation for $\gamma_4$ can be given in terms of $\theta$:

\[ \gamma_4(\theta) = \cos\theta + i\sin\theta = e^{i\theta} \;\Rightarrow\; dz = ie^{i\theta}\,d\theta, \qquad 0 \le \theta \le \pi. \]

Thus, we obtain

\[ I_3 = \int_0^{\pi} \frac{1}{e^{i\theta}}\,ie^{i\theta}\,d\theta = i\pi. \]

On the other hand, for the lower semicircle $\gamma'_4$,

\[ I'_3 = \int_{\gamma'_4} \frac{dz}{z} = \int_{2\pi}^{\pi} \frac{1}{e^{i\theta}}\,ie^{i\theta}\,d\theta = -i\pi. \]

Here the two integrals are not equal. From $\gamma_4$ and $\gamma'_4$ we can construct a counterclockwise
simple closed contour $C$, along which the integral of $f(z) = 1/z$ becomes $\oint_C dz/z =
I_3 - I'_3 = 2i\pi$. That the integral is not zero is a consequence of the fact that $1/z$ is not
analytic at all points of the region bounded by the closed contour $C$. ■
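The integrals of parts (a) and (c) can be reproduced numerically straight from the limit-of-sums definition. The sketch below is illustrative only: `path_integral` is a hypothetical helper (a midpoint-rule sum over a parametrized path), not something defined in the text.

```python
import cmath
import math

def path_integral(f, gamma, dgamma, t0, t1, n=20000):
    """Midpoint-rule approximation of the integral of f along gamma(t), t from t0 to t1."""
    h = (t1 - t0) / n          # negative when the path is traversed with decreasing t
    total = 0j
    for k in range(n):
        t = t0 + (k + 0.5) * h
        total += f(gamma(t)) * dgamma(t) * h
    return total

# (a) f(z) = z along the line y = 2x and along the parabola y = 2x^2
I1 = path_integral(lambda z: z, lambda t: t + 2j * t, lambda t: 1 + 2j, 0, 1)
I1p = path_integral(lambda z: z, lambda t: t + 2j * t * t, lambda t: 1 + 4j * t, 0, 1)

# (c) f(z) = 1/z along the upper (0 -> pi) and lower (2 pi -> pi) unit semicircles
unit = lambda th: cmath.exp(1j * th)
dunit = lambda th: 1j * cmath.exp(1j * th)
I3 = path_integral(lambda z: 1 / z, unit, dunit, 0, math.pi)
I3p = path_integral(lambda z: 1 / z, unit, dunit, 2 * math.pi, math.pi)

print(I1, I1p)    # path-independent: both near -1.5 + 2j
print(I3, I3p)    # path-dependent: near +i*pi and -i*pi
```

The first pair agrees because $z$ is analytic between the two paths; the second pair differs by $2\pi i$ because $1/z$ is singular at the origin, which the two semicircles enclose.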

The Cauchy-Goursat theorem applies to more complicated regions. When a
region contains points at which $f(z)$ is not analytic, those points can be avoided
by redefining the region and the contour. Such a procedure requires an agreement
on the direction we will take.

Convention. When integrating along a closed contour, we agree to move along
the contour in such a way that the enclosed region lies to our left. An integration
that follows this convention is called integration in the positive sense. Integration
performed in the opposite direction acquires a minus sign.




Figure 9.7 A complicated contour can be broken up into simpler ones. Note that the
boundaries of the "eyes" and the "mouth" are forced to be traversed in the (negative)
clockwise direction.

For a simple closed contour, movement in the counterclockwise direction yields
integration in the positive sense. However, as the contour becomes more complicated,
this conclusion breaks down. Figure 9.7 shows a complicated path enclosing
a region (shaded) in which the integrand is analytic. Note that it is possible to traverse
a portion of the region twice in opposite directions without affecting the
integral, which may be a sum of integrals for different pieces of the contour. Also
note that the "eyes" and the "mouth" are traversed clockwise! This is necessary
because of the convention above. A region such as that shown in Figure 9.7, in
which holes are "punched out," is called multiply connected. In contrast, a simply
connected region is one in which every simple closed contour encloses only
points of the region.

One important consequence of the Cauchy-Goursat theorem is the following:

9.4.3. Theorem. (Cauchy integral formula) Let $f$ be analytic on and within a
simple closed contour $C$ integrated in the positive sense. Let $z_0$ be any interior
point to $C$. Then

\[ f(z_0) = \frac{1}{2\pi i}\oint_C \frac{f(z)}{z - z_0}\,dz. \]

To prove the Cauchy integral formula (CIF), we need the following lemma.

9.4.4. Lemma. (Darboux inequality) Suppose $f : \mathbb{C} \to \mathbb{C}$ is continuous and
bounded on a path $\gamma$, i.e., there exists a positive number $M$ such that $|f(z)| \le M$
for all values $z \in \gamma$. Then

\[ \left|\int_\gamma f(z)\,dz\right| \le M L_\gamma, \]

where $L_\gamma$ is the length of the path of integration.
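Proof.

\[
\left|\int_\gamma f(z)\,dz\right|
= \left|\lim_{\substack{N\to\infty \\ \Delta z_i\to 0}} \sum_{i=1}^{N} f(z_i)\,\Delta z_i\right|
= \lim_{\substack{N\to\infty \\ \Delta z_i\to 0}} \left|\sum_{i=1}^{N} f(z_i)\,\Delta z_i\right|
\le \lim_{\substack{N\to\infty \\ \Delta z_i\to 0}} \sum_{i=1}^{N} |f(z_i)\,\Delta z_i|
= \lim_{\substack{N\to\infty \\ \Delta z_i\to 0}} \sum_{i=1}^{N} |f(z_i)|\,|\Delta z_i|
\le M \lim_{\substack{N\to\infty \\ \Delta z_i\to 0}} \sum_{i=1}^{N} |\Delta z_i| = M L_\gamma.
\]

The first inequality follows from the triangle inequality, the second from the boundedness
of $f$, and the last equality follows from the definition of the length of a
path. □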

Now we are ready to prove the Cauchy integral formula.

Proof of CIF. Consider the shaded region in Figure 9.8, which is bounded by $C$,
by $\gamma_0$ (a circle of arbitrarily small radius $\delta$ centered at $z_0$), and by $L_1$ and $L_2$, two
straight line segments infinitesimally close to one another (we can, in fact, assume
that $L_1$ and $L_2$ are right on top of one another; however, they are separated in the
figure for clarity). Let us use $C'$ to denote the union of all these curves.

Figure 9.8 The integrand is analytic within and on the boundary of the shaded region. It
is always possible to construct contours that exclude all singular points.

Since $f(z)/(z - z_0)$ is analytic everywhere on the contour $C'$ and inside the
shaded region, we can write

\[
0 = \frac{1}{2\pi i}\oint_{C'} \frac{f(z)}{z - z_0}\,dz
= \frac{1}{2\pi i}\left[\oint_C \frac{f(z)}{z - z_0}\,dz + \oint_{\gamma_0} \frac{f(z)}{z - z_0}\,dz
+ \underbrace{\int_{L_1} \frac{f(z)}{z - z_0}\,dz + \int_{L_2} \frac{f(z)}{z - z_0}\,dz}_{=0}\right]. \tag{9.10}
\]

The contributions from $L_1$ and $L_2$ cancel because they are integrals along the same
line segment in opposite directions. Let us evaluate the contribution from the infinitesimal
circle $\gamma_0$. First we note that because $f(z)$ is continuous (differentiability
implies continuity), we can write

\[ \left|\frac{f(z) - f(z_0)}{z - z_0}\right| = \frac{|f(z) - f(z_0)|}{|z - z_0|} = \frac{|f(z) - f(z_0)|}{\delta} < \frac{\epsilon}{\delta} \]

for $z \in \gamma_0$, where $\epsilon$ is a small positive number. We now apply the Darboux
inequality and write

\[ \left|\oint_{\gamma_0} \frac{f(z) - f(z_0)}{z - z_0}\,dz\right| < \frac{\epsilon}{\delta}\,2\pi\delta = 2\pi\epsilon. \]

This means that the integral goes to zero as $\delta \to 0$, or

\[ \oint_{\gamma_0} \frac{f(z)}{z - z_0}\,dz = \oint_{\gamma_0} \frac{f(z_0)}{z - z_0}\,dz = f(z_0)\oint_{\gamma_0} \frac{dz}{z - z_0}. \]

We can easily calculate the integral on the RHS by noting that $z - z_0 = \delta e^{i\varphi}$ and
that $\gamma_0$ has a clockwise direction:

\[ \oint_{\gamma_0} \frac{dz}{z - z_0} = -\int_0^{2\pi} \frac{i\delta e^{i\varphi}\,d\varphi}{\delta e^{i\varphi}} = -2\pi i
\quad\Rightarrow\quad \oint_{\gamma_0} \frac{f(z)}{z - z_0}\,dz = -2\pi i f(z_0). \]

Substituting this in (9.10) yields the desired result. □

9.4.5. Example. We can use the CIF to evaluate the integrals

\[
I_1 = \oint_{C_1} \frac{z^2\,dz}{(z^2 + 3)^2(z - i)}, \qquad
I_2 = \oint_{C_2} \frac{(z^2 - 1)\,dz}{(z - \tfrac{1}{2})(z^2 - 4)^3}, \qquad
I_3 = \oint_{C_3} \frac{e^{z/2}\,dz}{(z - i\pi)(z^2 - 20)^4},
\]

where $C_1$, $C_2$, and $C_3$ are circles centered at the origin with radii $r_1 = 3/2$, $r_2 = 1$, and
$r_3 = 4$.

For $I_1$ we note that $f(z) = z^2/(z^2 + 3)^2$ is analytic within and on $C_1$, and $z_0 = i$ lies
in the interior of $C_1$. Thus,

\[ I_1 = \oint_{C_1} \frac{f(z)\,dz}{z - i} = 2\pi i f(i) = 2\pi i\,\frac{i^2}{(i^2 + 3)^2} = -i\frac{\pi}{2}. \]

Similarly, $f(z) = (z^2 - 1)/(z^2 - 4)^3$ for the integral $I_2$ is analytic on and within $C_2$, and
$z_0 = 1/2$ is an interior point of $C_2$. Thus, the CIF gives

\[ I_2 = \oint_{C_2} \frac{f(z)\,dz}{z - \tfrac{1}{2}} = 2\pi i f(1/2) = \frac{32\pi}{1125}\,i. \]

For the last integral, $f(z) = e^{z/2}/(z^2 - 20)^4$, and the interior point is $z_0 = i\pi$:

\[ I_3 = \oint_{C_3} \frac{f(z)\,dz}{z - i\pi} = 2\pi i f(i\pi) = -\frac{2\pi}{(\pi^2 + 20)^4}. \]
■
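As an independent check of the three values, the contour integrals can be approximated directly. This is a minimal sketch under the assumption that a uniform trapezoid sum on the circle is accurate enough, which holds here because the integrands are smooth and periodic on each contour; `circle_integral` is a hypothetical helper name.

```python
import cmath
import math

def circle_integral(f, radius, n=4000):
    """Trapezoid-rule integral of f counterclockwise around |z| = radius."""
    total = 0j
    for k in range(n):
        e = cmath.exp(2j * math.pi * k / n)
        z = radius * e
        dz = 1j * radius * e * (2 * math.pi / n)
        total += f(z) * dz
    return total

I1 = circle_integral(lambda z: z**2 / ((z**2 + 3) ** 2 * (z - 1j)), 1.5)
I2 = circle_integral(lambda z: (z**2 - 1) / ((z - 0.5) * (z**2 - 4) ** 3), 1.0)
I3 = circle_integral(
    lambda z: cmath.exp(z / 2) / ((z - 1j * math.pi) * (z**2 - 20) ** 4), 4.0
)

print(I1, -1j * math.pi / 2)
print(I2, 32j * math.pi / 1125)
print(I3, -2 * math.pi / (math.pi**2 + 20) ** 4)
```

Each printed pair should agree to many digits. Note that each circle is chosen so that only the simple pole $z_0$ lies inside it, which is what lets the CIF be applied with the remaining factor as $f$.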



The Cauchy integral formula gives the value of an analytic function at every
point inside a simple closed contour when it is given the value of the function
only at points on the contour. It seems as though an analytic function is not free to
change inside a region once its value is fixed on the contour enclosing that region.

There is an analogous situation in electrostatics: The specification of the potential
at the boundaries, such as the surfaces of conductors, automatically determines
the potential at any other point in the region of space bounded by the conductors.
This is the content of the uniqueness theorem used in electrostatic boundary value
problems. However, the electrostatic potential $\Phi$ is bound by another condition,
Laplace's equation; and the combination of Laplace's equation and the boundary
conditions furnishes the uniqueness of $\Phi$. Similarly, the real and imaginary parts
of an analytic function separately satisfy Laplace's equation in two dimensions!
Thus, it should come as no surprise that the value of an analytic function on a
boundary (contour) determines the function at all points inside the boundary.


9.5 Derivatives as Integrals 


The Cauchy Integral Formula is a very powerful tool for working with analytic
functions. One of the applications of this formula is in evaluating the derivatives
of such functions. It is convenient to change the dummy integration variable to $\xi$
and write the CIF as

\[ f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\xi)\,d\xi}{\xi - z}, \tag{9.11} \]

where $C$ is a simple closed contour in the $\xi$-plane and $z$ is a point within $C$.
As preparation for defining the derivative of an analytic function, we need the
following result.




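9.5.1. Proposition. Let $\gamma$ be any path (a contour, for example) and $g$ a continuous
function on that path. The function $f(z)$ defined by

\[ f(z) = \frac{1}{2\pi i}\int_\gamma \frac{g(\xi)\,d\xi}{\xi - z} \]

is analytic at every point $z \notin \gamma$.

Proof. The proof follows immediately from differentiation of the integral:

\[
\frac{df}{dz} = \frac{1}{2\pi i}\frac{d}{dz}\int_\gamma \frac{g(\xi)\,d\xi}{\xi - z}
= \frac{1}{2\pi i}\int_\gamma g(\xi)\,d\xi\,\frac{d}{dz}\left(\frac{1}{\xi - z}\right)
= \frac{1}{2\pi i}\int_\gamma \frac{g(\xi)\,d\xi}{(\xi - z)^2}.
\]

This is defined for all values of $z$ not on $\gamma$.$^{9}$ Thus, $f(z)$ is analytic there. □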


We can generalize the formula above to the $n$th derivative, and obtain

\[ \frac{d^n f}{dz^n} = \frac{n!}{2\pi i}\int_\gamma \frac{g(\xi)\,d\xi}{(\xi - z)^{n+1}}. \]

Applying this result to an analytic function expressed by Equation (9.11), we obtain
the following important theorem.

9.5.2. Theorem. The derivatives of all orders of an analytic function $f(z)$ exist
in the domain of analyticity of the function and are themselves analytic in that
domain. The $n$th derivative of $f(z)$ is given by

\[ f^{(n)}(z) = \frac{d^n f}{dz^n} = \frac{n!}{2\pi i}\oint_C \frac{f(\xi)\,d\xi}{(\xi - z)^{n+1}}. \tag{9.12} \]


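9.5.3. Example. Let us apply Equation (9.12) directly to some simple functions. In all
cases, we will assume that the contour is a circle of radius $r$ centered at $z$.

(a) Let $f(z) = K$, a constant. Then, for $n = 1$ we have

\[ \frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{K\,d\xi}{(\xi - z)^2}. \]

Since $\xi$ is on the circle $C$ centered at $z$, $\xi - z = re^{i\theta}$ and $d\xi = rie^{i\theta}\,d\theta$. So we have

\[ \frac{df}{dz} = \frac{1}{2\pi i}\int_0^{2\pi} \frac{K\,ire^{i\theta}\,d\theta}{(re^{i\theta})^2} = \frac{K}{2\pi r}\int_0^{2\pi} e^{-i\theta}\,d\theta = 0. \]

(b) Given $f(z) = z$, its first derivative will be

\[
\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{\xi\,d\xi}{(\xi - z)^2}
= \frac{1}{2\pi i}\int_0^{2\pi} \frac{(z + re^{i\theta})\,ire^{i\theta}\,d\theta}{(re^{i\theta})^2}
= \frac{1}{2\pi}\left(\frac{z}{r}\int_0^{2\pi} e^{-i\theta}\,d\theta + \int_0^{2\pi} d\theta\right)
= \frac{1}{2\pi}(0 + 2\pi) = 1.
\]

(c) Given $f(z) = z^2$, for the first derivative Equation (9.12) yields

\[
\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{\xi^2\,d\xi}{(\xi - z)^2}
= \frac{1}{2\pi i}\int_0^{2\pi} \frac{(z + re^{i\theta})^2\,ire^{i\theta}\,d\theta}{(re^{i\theta})^2}
= \frac{1}{2\pi}\int_0^{2\pi} \left[z^2 + (re^{i\theta})^2 + 2zre^{i\theta}\right](re^{i\theta})^{-1}\,d\theta
\]
\[
= \frac{1}{2\pi}\left(\frac{z^2}{r}\int_0^{2\pi} e^{-i\theta}\,d\theta + r\int_0^{2\pi} e^{i\theta}\,d\theta + 2z\int_0^{2\pi} d\theta\right) = 2z.
\]

It can be shown that, in general, $(d/dz)z^m = mz^{m-1}$. The proof is left as Problem
9.30. ■

9 The interchange of differentiation and integration requires justification. Such an interchange can be done if the integral has
some restrictive properties. We shall not concern ourselves with such details. In fact, one can achieve the same result by using
the definition of derivatives and the usual properties of integrals.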
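Equation (9.12) also lends itself to a direct numerical check. The sketch below is illustrative: `nth_derivative` is a hypothetical helper that samples the circle $\xi = z + re^{i\theta}$ uniformly, and the evaluation point `z` and the constant `5.0` are arbitrary choices.

```python
import cmath
import math

def nth_derivative(f, z, n, r=1.0, samples=2000):
    """Approximate f^(n)(z) from Eq. (9.12) on the circle xi = z + r e^{i theta}."""
    total = 0j
    for k in range(samples):
        e = cmath.exp(2j * math.pi * k / samples)
        xi = z + r * e
        dxi = 1j * r * e * (2 * math.pi / samples)
        total += f(xi) * dxi / (xi - z) ** (n + 1)
    return math.factorial(n) * total / (2j * math.pi)

z = 0.3 + 0.7j                                        # arbitrary evaluation point
d_const = nth_derivative(lambda xi: 5.0, z, 1)        # expect 0
d_lin = nth_derivative(lambda xi: xi, z, 1)           # expect 1
d_sq = nth_derivative(lambda xi: xi**2, z, 1)         # expect 2z
d_cube2 = nth_derivative(lambda xi: xi**3, z, 2)      # expect 6z
print(d_const, d_lin, d_sq, d_cube2)
```

The results do not depend on the radius `r`, as expected from the Cauchy-Goursat theorem, since the integrands are analytic away from $\xi = z$.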

The CIF is a central formula in complex analysis, and we shall see its significance
in much of the later development of complex analysis. For now, let us
demonstrate its usefulness in proving a couple of important properties of analytic
functions.

9.5.4. Proposition. The absolute value of an analytic function $f(z)$ cannot have
a local maximum within the region of analyticity of the function.

Proof. Let $S \subset \mathbb{C}$ be the region of analyticity of $f$. Suppose $z_0 \in S$ were a local
maximum. Then we could find a circle $\gamma_0$ of small enough radius $\delta$, centered at $z_0$,
such that $|f(z_0)| > |f(z)|$ for all $z$ on $\gamma_0$. We now show that this cannot happen.
Using the CIF, and noting that $z - z_0 = \delta e^{i\theta}$, we have

\[
|f(z_0)| = \left|\frac{1}{2\pi i}\oint_{\gamma_0} \frac{f(z)\,dz}{z - z_0}\right|
= \frac{1}{2\pi}\left|\int_0^{2\pi} \frac{f(z)\,i\delta e^{i\theta}\,d\theta}{\delta e^{i\theta}}\right|
\le \frac{1}{2\pi}\int_0^{2\pi} |f(z)|\,d\theta
\le \frac{1}{2\pi}\int_0^{2\pi} M\,d\theta = M,
\]

where $M$ is the maximum value of $|f(z)|$ for $z \in \gamma_0$. This inequality says that
there is at least one point $z$ on the circle $\gamma_0$ (the point at which the maximum of
$|f(z)|$ is attained) such that $|f(z_0)| \le |f(z)|$. This contradicts our assumption.
Therefore, there can be no local maximum within $S$. □

9.5.5. Proposition. A bounded entire function is necessarily a constant.

Proof. We show that the derivative of such a function is zero. Consider

\[ \frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{f(\xi)\,d\xi}{(\xi - z)^2}. \]

Since $f$ is an entire function, the closed contour $C$ can be chosen to be a very large
circle of radius $R$ with center at $z$. Taking the absolute value of both sides yields

\[
\left|\frac{df}{dz}\right| = \frac{1}{2\pi}\left|\int_0^{2\pi} \frac{f(\xi)\,iRe^{i\theta}\,d\theta}{(Re^{i\theta})^2}\right|
\le \frac{1}{2\pi}\int_0^{2\pi} \frac{|f(\xi)|}{R}\,d\theta \le \frac{M}{R},
\]

where $M$ is the maximum of $|f|$ in the complex plane. Now, as $R \to \infty$,
the derivative goes to zero, and the function must be a constant. □


Proposition 9.5.5 is a very powerful statement about analytic functions. There
are many interesting and nontrivial real functions that are bounded and have derivatives
of all orders on the entire real line. For instance, $e^{-x^2}$ is such a function. No
such freedom exists for complex analytic functions. Any nontrivial analytic function
is either not bounded (goes to infinity somewhere on the complex plane) or
not entire (it is not analytic at some point(s) of the complex plane).

A consequence of Proposition 9.5.5 is the fundamental theorem of algebra,
which states that any polynomial of degree $n \ge 1$ has $n$ roots (some of which may
be repeated). In other words, the polynomial

\[ p(x) = a_0 + a_1x + \cdots + a_nx^n \quad\text{for } n \ge 1 \]

can be factored completely as $p(x) = c(x - z_1)(x - z_2)\cdots(x - z_n)$ where $c$ is a
constant and the $z_i$ are, in general, complex numbers.

To see how Proposition 9.5.5 implies the fundamental theorem of algebra, we
let $f(z) = 1/p(z)$ and assume the contrary, i.e., that $p(z)$ is never zero for any
(finite) $z \in \mathbb{C}$. Then $f(z)$ is bounded and analytic for all $z \in \mathbb{C}$, and Proposition
9.5.5 says that $f(z)$ is a constant. This is obviously wrong. Thus, there must be at
least one $z$, say $z = z_1$, for which $p(z)$ is zero. So, we can factor out $(z - z_1)$ from
$p(z)$ and write $p(z) = (z - z_1)q(z)$ where $q(z)$ is of degree $n - 1$. Applying the
above argument to $q(z)$, we have $p(z) = (z - z_1)(z - z_2)r(z)$ where $r(z)$ is of
degree $n - 2$. Continuing in this way, we can factor $p(z)$ into linear factors. The
last polynomial will be a constant (a polynomial of degree zero) which we have
denoted by $c$.

The primitive (indefinite integral) of an analytic function can be defined using
definite integrals just as in the real case. Let $f : \mathbb{C} \to \mathbb{C}$ be analytic in a region
$S$ of the complex plane. Let $z_0$ and $z$ be two points in $S$, and define$^{10}$

\[ F(z) = \int_{z_0}^{z} f(\xi)\,d\xi. \]

We can show that $F(z)$ is the primitive of $f(z)$ by showing that

\[ \lim_{\Delta z\to 0}\left|\frac{F(z + \Delta z) - F(z)}{\Delta z} - f(z)\right| = 0. \]

We leave the details as a problem for the reader.

9.5.6. Proposition. Let $f : \mathbb{C} \to \mathbb{C}$ be analytic in a region $S$ of $\mathbb{C}$. Then at every
point $z \in S$, there exists an analytic function $F : \mathbb{C} \to \mathbb{C}$ such that

\[ \frac{dF}{dz} = f(z). \]

10 Note that the integral is path-independent due to the analyticity of $f$. Thus, $F$ is well-defined.




In the sketch of the proof of Proposition 9.5.6, we used only the continuity of $f$
and the fact that the integral was well-defined. These two conditions are sufficient
to establish the analyticity of $F$ and $f$, since the latter is the derivative of the
former. The following theorem, due to Morera, states this fact and is the converse
of the Cauchy-Goursat theorem.

9.5.7. Theorem. (Morera's theorem) Let a function $f : \mathbb{C} \to \mathbb{C}$ be continuous in
a simply connected region $S$. If for each simple closed contour $C$ in $S$ we have
$\oint_C f(\xi)\,d\xi = 0$, then $f$ is analytic throughout $S$.


9.6 Taylor and Laurent Series 


The expansion of functions in terms of polynomials or monomials is important 
in calculus and was emphasized in the analysis of Chapter 5. We now apply this 
concept to analytic functions. 


9.6.1 Properties of Series 


The reader is assumed to have some familiarity with complex series. Nevertheless,
we state (without proof) the most important properties of complex series before
discussing Taylor and Laurent series.

A complex series $\sum_{k=0}^{\infty} z_k$ is said to converge absolutely if the real series
$\sum_{k=0}^{\infty} |z_k| = \sum_{k=0}^{\infty} \sqrt{x_k^2 + y_k^2}$ converges. Clearly, absolute convergence implies convergence.

9.6.1. Proposition. If the power series $\sum_{k=0}^{\infty} a_k(z - z_0)^k$ converges for $z_1 \ne z_0$,
then it converges absolutely for every value of $z$ such that $|z - z_0| < |z_1 - z_0|$.
Similarly, if the power series $\sum_{k=0}^{\infty} b_k/(z - z_0)^k$ converges for $z_2 \ne z_0$, then it
converges absolutely for every value of $z$ such that $|z - z_0| > |z_2 - z_0|$.

A geometric interpretation of this proposition is that if a power series with
positive powers converges for a point at a distance $r_1$ from $z_0$, then it converges
for all interior points of the circle whose center is $z_0$ and whose radius is $r_1$.
Similarly, if a power series with negative powers converges for a point at a
distance $r_2$ from $z_0$, then it converges for all exterior points of the circle whose
center is $z_0$ and whose radius is $r_2$ (see Figure 9.9). Generally speaking, positive
powers are used for points inside a circle and negative powers for points outside
it.

The largest circle about $z_0$ such that the first power series of Proposition 9.6.1
converges is called the circle of convergence of the power series. The proposition
implies that the series cannot converge at any point outside the circle of
convergence. (Why?)

In determining the convergence of a power series

\[ S(z) = \sum_{n=0}^{\infty} a_n(z - z_0)^n, \tag{9.13} \]

we look at the behavior of the sequence of partial sums

\[ S_N(z) = \sum_{n=0}^{N} a_n(z - z_0)^n. \]

Figure 9.9 (a) Power series with positive exponents converge for the interior points of a
circle. (b) Power series with negative exponents converge for the exterior points of a circle.

Convergence of (9.13) implies that for any $\epsilon > 0$, there exists an integer $N_\epsilon$ such
that

\[ |S(z) - S_N(z)| < \epsilon \quad\text{whenever } N > N_\epsilon. \]

In general, the integer $N_\epsilon$ may be dependent on $z$; that is, for different values of
$z$, we may be forced to pick different $N_\epsilon$'s. When $N_\epsilon$ is independent of $z$, we say
that the convergence is uniform.

9.6.2. Theorem. The power series $S(z) = \sum_{n=0}^{\infty} a_n(z - z_0)^n$ is uniformly convergent
for all points within its circle of convergence and represents a function
that is analytic there.

By substituting the reciprocal of $(z - z_0)$ in the power series, we can show that
if $\sum_{k=0}^{\infty} b_k/(z - z_0)^k$ is convergent in the annulus $r_2 < |z - z_0| < r_1$, then it is
uniformly convergent for all $z$ in that annulus.

9.6.3. Theorem. A convergent power series can be differentiated and integrated
term by term; that is, if $S(z) = \sum_{n=0}^{\infty} a_n(z - z_0)^n$, then

\[
\frac{dS(z)}{dz} = \sum_{n=1}^{\infty} na_n(z - z_0)^{n-1}, \qquad
\int_\gamma S(z)\,dz = \sum_{n=0}^{\infty} a_n\int_\gamma (z - z_0)^n\,dz
\]

for any path $\gamma$ lying in the circle of convergence of the power series.





9.6.2 Taylor and Laurent Series


We now state and prove the two main theorems of this section. A Taylor series 
consists of terms with only positive powers. A Laurent series allows for negative 
powers as well. 

9.6.4. Theorem. (Taylor series) Let $f$ be analytic throughout the interior of a
circle $C_0$ having radius $r_0$ and centered at $z_0$. Then at each point $z$ inside $C_0$,

\[ f(z) = f(z_0) + f'(z_0)(z - z_0) + \cdots = \sum_{n=0}^{\infty} \frac{f^{(n)}(z_0)}{n!}(z - z_0)^n. \tag{9.14} \]

That is, the power series converges to $f(z)$ when $|z - z_0| < r_0$.

Proof. From the CIF and the fact that $z$ is inside $C_0$, we have

\[ f(z) = \frac{1}{2\pi i}\oint_{C_0} \frac{f(\xi)}{\xi - z}\,d\xi. \]

On the other hand,

\[
\frac{1}{\xi - z} = \frac{1}{\xi - z_0 + z_0 - z}
= \frac{1}{(\xi - z_0)\left(1 - \dfrac{z - z_0}{\xi - z_0}\right)}
= \frac{1}{\xi - z_0}\sum_{n=0}^{\infty}\left(\frac{z - z_0}{\xi - z_0}\right)^n.
\]

The last equality follows from the fact that $|(z - z_0)/(\xi - z_0)| < 1$ (because $z$
is inside the circle $C_0$ and $\xi$ is on it) and from the sum of a geometric series.
Substituting in the CIF and using Theorem 9.5.2, we obtain the result. □

For $z_0 = 0$ we obtain the Maclaurin series:

\[ f(z) = f(0) + f'(0)z + \cdots = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!}z^n. \]

The Taylor expansion requires analyticity of the function at all points interior
to the circle $C_0$. On many occasions there may be a point inside $C_0$ at which the
function is not analytic. The Laurent series accommodates such cases.

9.6.5. Theorem. (Laurent series) Let $C_1$ and $C_2$ be circles of radii $r_1$ and $r_2$, both
centered at $z_0$ in the $z$-plane with $r_1 > r_2$. Let $f : \mathbb{C} \to \mathbb{C}$ be analytic on $C_1$ and
$C_2$ and throughout $S$, the annular region between the two circles. Then, at each
point $z \in S$, $f(z)$ is given by

\[ f(z) = \sum_{n=-\infty}^{\infty} a_n(z - z_0)^n \quad\text{where}\quad a_n = \frac{1}{2\pi i}\oint_C \frac{f(\xi)}{(\xi - z_0)^{n+1}}\,d\xi \]

and $C$ is any contour within $S$ that encircles $z_0$.


Figure 9.10 The annular region within and on whose contour the expanded function is
analytic.

Proof. Let $\gamma$ be a small closed contour in $S$ enclosing $z$, as shown in Figure 9.10.
For the composite contour $C'$ the Cauchy-Goursat theorem gives

\[
0 = \oint_{C'} \frac{f(\xi)}{\xi - z}\,d\xi
= \oint_{C_1} \frac{f(\xi)}{\xi - z}\,d\xi - \oint_{C_2} \frac{f(\xi)}{\xi - z}\,d\xi - \oint_{\gamma} \frac{f(\xi)}{\xi - z}\,d\xi,
\]

where the $\gamma$ and $C_2$ integrations are negative because their interior lies to our right
as we traverse them. The $\gamma$ integral is simply $2\pi i f(z)$ by the CIF. Thus, we obtain

\[ 2\pi i f(z) = \oint_{C_1} \frac{f(\xi)}{\xi - z}\,d\xi - \oint_{C_2} \frac{f(\xi)}{\xi - z}\,d\xi. \tag{9.15} \]

Now we use the same trick we used in deriving the Taylor expansion. Since $z$ is
located in the annular region, $r_2 < |z - z_0| < r_1$. We have to keep this in mind
when expanding the fractions. In particular, for $\xi \in C_1$ we want the $\xi$ term in
the denominator, and for $\xi \in C_2$ we want it in the numerator. Substituting such
expansions in Equation (9.15) yields

\[
2\pi i f(z) = \sum_{n=0}^{\infty} (z - z_0)^n \oint_{C_1} \frac{f(\xi)\,d\xi}{(\xi - z_0)^{n+1}}
+ \sum_{n=0}^{\infty} \frac{1}{(z - z_0)^{n+1}} \oint_{C_2} f(\xi)(\xi - z_0)^n\,d\xi. \tag{9.16}
\]

Now we consider an arbitrary contour $C$ in $S$ that encircles $z_0$. Figure 9.11
shows a region bounded by a contour composed of $C_1$ and $C$. In this region
$f(\xi)/(\xi - z_0)^{n+1}$ is analytic (because $\xi$ can never equal $z_0$). Thus, the integral
over the composite contour must vanish by the Cauchy-Goursat theorem. It follows
that the integral over $C_1$ is equal to that over $C$. A similar argument shows that the
$C_2$ integral can also be replaced by an integral over $C$. We let $n + 1 = -m$ in the
second sum of Equation (9.16) to transform it into

\[
\sum_{m=-1}^{-\infty} \frac{1}{(z - z_0)^{-m}} \oint_C f(\xi)(\xi - z_0)^{-m-1}\,d\xi
= \sum_{m=-\infty}^{-1} (z - z_0)^m \oint_C \frac{f(\xi)\,d\xi}{(\xi - z_0)^{m+1}}.
\]

Changing the dummy index back to $n$ and substituting the result in Equation (9.16)
yields

\[
2\pi i f(z) = \sum_{n=0}^{\infty} (z - z_0)^n \oint_C \frac{f(\xi)\,d\xi}{(\xi - z_0)^{n+1}}
+ \sum_{n=-\infty}^{-1} (z - z_0)^n \oint_C \frac{f(\xi)\,d\xi}{(\xi - z_0)^{n+1}}.
\]

We can now combine the sums and divide both sides by $2\pi i$ to get the desired
expansion. □


The Laurent expansion is convergent as long as $r_2 < |z - z_0| < r_1$. In particular,
if $r_2 = 0$, and if the function is analytic throughout the interior of the larger circle,
then $a_n$ will be zero for $n = -1, -2, \ldots$ because $f(\xi)/(\xi - z_0)^{n+1}$ will be analytic
for negative $n$, and the integral will be zero by the Cauchy-Goursat theorem. Thus,
only positive powers of $(z - z_0)$ will be present in the series, and we will recover
the Taylor series, as we should.

It is clear that we can expand $C_1$ and shrink $C_2$ until we encounter a point at
which $f$ is no longer analytic. This is obvious from the construction of the proof,
in which only the analyticity in the annular region is important, not its size. Thus,
we can include all the possible analytic points by expanding $C_1$ and shrinking $C_2$.

9.6.6. Example. Let us expand some functions in terms of series. For an entire function
there is no point in the entire complex plane at which it is not analytic. Thus, only positive
powers of $(z - z_0)$ will be present, and we will have a Taylor expansion that is valid for all
values of $z$.

(a) Let us expand $e^z$ around $z_0 = 0$. The $n$th derivative of $e^z$ is $e^z$. Thus, $f^{(n)}(0) = 1$, and
Taylor (Maclaurin) expansion gives

\[ e^z = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!}z^n = \sum_{n=0}^{\infty} \frac{z^n}{n!}. \]

(b) The Maclaurin series for $\sin z$ is obtained by noting that

\[ \left.\frac{d^n}{dz^n}\sin z\right|_{z=0} = \begin{cases} 0 & \text{if } n \text{ is even,} \\ (-1)^{(n-1)/2} & \text{if } n \text{ is odd} \end{cases} \]

and substituting this in the Maclaurin expansion:

\[ \sin z = \sum_{n\ \mathrm{odd}} (-1)^{(n-1)/2}\frac{z^n}{n!} = \sum_{k=0}^{\infty} (-1)^k\frac{z^{2k+1}}{(2k+1)!}. \]

Similarly, we can obtain

\[
\cos z = \sum_{k=0}^{\infty} (-1)^k\frac{z^{2k}}{(2k)!}, \qquad
\sinh z = \sum_{k=0}^{\infty} \frac{z^{2k+1}}{(2k+1)!}, \qquad
\cosh z = \sum_{k=0}^{\infty} \frac{z^{2k}}{(2k)!}.
\]

Figure 9.11 The arbitrary contour in the annular region used in the Laurent expansion.

(c) The function $1/(1 + z)$ is not entire, so the region of its convergence is limited. Let us
find the Maclaurin expansion of this function. The function is analytic within all circles
of radii $r < 1$. At $r = 1$ we encounter a singularity, the point $z = -1$. Thus, the series
converges for all points$^{11}$ $z$ for which $|z| < 1$. For such points we have

\[ f^{(n)}(0) = \left.\frac{d^n}{dz^n}\left[(1 + z)^{-1}\right]\right|_{z=0} = (-1)^n n!. \]

Thus,

\[ \frac{1}{1 + z} = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!}z^n = \sum_{n=0}^{\infty} (-1)^n z^n. \]
■
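The limited region of convergence in part (c) is easy to see numerically. The following minimal sketch compares partial sums of $\sum (-1)^n z^n$ with $1/(1+z)$ at one point inside and one point outside the unit circle; the two sample points and the helper name `partial_sum` are illustrative assumptions.

```python
def partial_sum(z, N):
    """S_N(z) = sum_{n=0}^{N} (-1)^n z^n, the Maclaurin partial sum of 1/(1+z)."""
    return sum((-z) ** n for n in range(N + 1))

inside = 0.5 + 0.5j     # |z| < 1: the series should converge to 1/(1+z)
outside = 1.2           # |z| > 1: the series should diverge

err_inside = [abs(partial_sum(inside, N) - 1 / (1 + inside)) for N in (10, 20, 40)]
err_outside = [abs(partial_sum(outside, N) - 1 / (1 + outside)) for N in (10, 20, 40)]

print(err_inside)    # shrinking geometrically
print(err_outside)   # growing geometrically
```

The function itself is perfectly well defined at $z = 1.2$; it is only this particular series representation that fails there.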




Taylor and Laurent series allow us to express an analytic function as a power
series. For a Taylor series of $f(z)$, the expansion is routine because the coefficient
of its $n$th term is simply $f^{(n)}(z_0)/n!$, where $z_0$ is the center of the circle of
convergence. When a Laurent series is applicable, however, the $n$th coefficient is
not, in general, easy to evaluate. Usually it can be found by inspection and certain
manipulations of other known series. But if we use such an intuitive approach
to determine the coefficients, can we be sure that we have obtained the correct
Laurent series? The following theorem answers this question.

11 As remarked before, the series diverges for all points outside the circle $|z| = 1$. This does not mean that the function cannot
be represented by a series for points outside the circle. On the contrary, we shall see shortly that Laurent series, with negative
powers of $z - z_0$, are designed precisely for such a purpose.

9.6.7. Theorem. If the series $\sum_{n=-\infty}^{\infty} a_n(z - z_0)^n$ converges to $f(z)$ at all points
in some annular region about $z_0$, then it is the unique Laurent series expansion of
$f(z)$ in that region.

Proof. Multiply both sides of $f(z) = \sum_{n=-\infty}^{\infty} a_n(z - z_0)^n$ by

\[ \frac{1}{2\pi i(z - z_0)^{k+1}}, \]

integrate the result along a contour $C$ in the annular region, and use the easily
verifiable fact that

\[ \frac{1}{2\pi i}\oint_C \frac{dz}{(z - z_0)^{k-n+1}} = \delta_{kn} \]

to obtain

\[ \frac{1}{2\pi i}\oint_C \frac{f(z)}{(z - z_0)^{k+1}}\,dz = a_k. \]

Thus, the coefficient in the power series of $f$ is precisely the coefficient in the
Laurent series, and the two must be identical. □
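The "easily verifiable fact" used in the proof can be checked in one line. The sketch below assumes, for convenience, that $C$ is a circle of radius $\rho$ centered at $z_0$ (the symbol $\rho$ is introduced here; by the Cauchy-Goursat theorem any other contour in the annulus that encircles $z_0$ once gives the same value), so that $z - z_0 = \rho e^{i\theta}$ and $dz = i\rho e^{i\theta}\,d\theta$:

```latex
\begin{align*}
\frac{1}{2\pi i}\oint_C \frac{dz}{(z - z_0)^{k-n+1}}
  &= \frac{1}{2\pi i}\int_0^{2\pi} \frac{i\rho e^{i\theta}\,d\theta}{(\rho e^{i\theta})^{k-n+1}}
   = \frac{\rho^{\,n-k}}{2\pi}\int_0^{2\pi} e^{i(n-k)\theta}\,d\theta \\
  &= \begin{cases} 1 & \text{if } n = k, \\ 0 & \text{if } n \ne k, \end{cases}
   \;=\; \delta_{kn}.
\end{align*}
```

For $n \ne k$ the exponential integrates to zero over a full period; for $n = k$ the integrand is 1 and the factor $\rho^{0}/2\pi$ cancels the $2\pi$.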


We will look at some examples that illustrate the abstract ideas developed in
the preceding collection of theorems and propositions. However, we can consider
a much broader range of examples if we know the arithmetic of power series.
The following theorem giving arithmetical manipulations with power series is not
difficult to prove (see [Chur 74]).

9.6.8. Theorem. Let the two power series $f(z) = \sum_{n=-\infty}^{\infty} a_n(z - z_0)^n$ and $g(z) =
\sum_{n=-\infty}^{\infty} b_n(z - z_0)^n$ be convergent within some annular region $r_2 < |z - z_0| < r_1$.
Then the sum $\sum_{n=-\infty}^{\infty} (a_n + b_n)(z - z_0)^n$ converges to $f(z) + g(z)$, and the product

\[ \sum_{n=-\infty}^{\infty}\ \sum_{m=-\infty}^{\infty} a_nb_m(z - z_0)^{m+n} \equiv \sum_{k=-\infty}^{\infty} c_k(z - z_0)^k \]

converges to $f(z)g(z)$ for $z$ interior to the annular region. Furthermore, if
$g(z) \ne 0$ for some neighborhood of $z_0$, then the series obtained by long division
of $\sum_{n=-\infty}^{\infty} a_n(z - z_0)^n$ by $\sum_{m=-\infty}^{\infty} b_m(z - z_0)^m$ converges to $f(z)/g(z)$ in
that neighborhood.

This theorem, in essence, says that converging power series can be manipulated
as though they were finite sums (polynomials). Such manipulations are extremely
useful when dealing with Taylor and Laurent expansions in which the straightforward
calculation of coefficients may be tedious. The following examples illustrate
the power of infinite-series arithmetic.
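A small computation illustrates the product rule in the special case of two Taylor series ($a_n = b_n = 0$ for $n < 0$), where $c_k = \sum_{n=0}^{k} a_nb_{k-n}$. The sketch below is an illustration, not part of the theorem: it multiplies the truncated Maclaurin series of $e^z$ by itself with exact rational arithmetic and compares the result with the coefficients $2^k/k!$ of $e^{2z}$. The helper name `cauchy_product` and the truncation at ten terms are assumptions made for the example.

```python
from fractions import Fraction
from math import factorial

def cauchy_product(a, b):
    """Coefficients c_k = sum_{n=0}^{k} a_n b_{k-n} for two truncated Taylor series."""
    size = min(len(a), len(b))
    return [sum(a[n] * b[k - n] for n in range(k + 1)) for k in range(size)]

exp_coeffs = [Fraction(1, factorial(n)) for n in range(10)]    # e^z
product = cauchy_product(exp_coeffs, exp_coeffs)               # series of e^z * e^z
expected = [Fraction(2**k, factorial(k)) for k in range(10)]   # series of e^(2z)

print(product == expected)
```

Only the first `size` coefficients are returned because higher coefficients of the product would need terms that were truncated away.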




9.6.9. Example. To expand the function $f(z) = \dfrac{2 + 3z}{z^2 + z^3}$ in a Laurent series about $z = 0$,
rewrite it as

\[
f(z) = \frac{1}{z^2}\left(\frac{2 + 3z}{1 + z}\right) = \frac{1}{z^2}\left(3 - \frac{1}{1 + z}\right)
= \frac{1}{z^2}\left(3 - \sum_{n=0}^{\infty} (-1)^n z^n\right)
\]
\[
= \frac{1}{z^2}\left(3 - 1 + z - z^2 + z^3 - \cdots\right) = \frac{2}{z^2} + \frac{1}{z} - 1 + z - z^2 + \cdots.
\]

This series converges for $0 < |z| < 1$. We note that negative powers of $z$ are also present.$^{12}$
Using the notation of Theorem 9.6.5, we have $a_{-2} = 2$, $a_{-1} = 1$, $a_n = 0$ for $n \le -3$, and
$a_n = (-1)^{n+1}$ for $n \ge 0$. ■
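The coefficients read off above can be compared with the integral formula for $a_n$ in Theorem 9.6.5. This is a minimal numerical sketch with $z_0 = 0$: the contour is taken to be the circle $|\xi| = 0.5$, an arbitrary choice inside the annulus $0 < |\xi| < 1$, and `laurent_coefficient` is a hypothetical helper name.

```python
import cmath
import math

def laurent_coefficient(f, n, radius, samples=4000):
    """a_n = (1/(2 pi i)) * integral of f(xi)/xi^(n+1) around |xi| = radius (z0 = 0)."""
    total = 0j
    for k in range(samples):
        xi = radius * cmath.exp(2j * math.pi * k / samples)
        dxi = 1j * xi * (2 * math.pi / samples)     # d(xi) = i xi d(theta)
        total += f(xi) / xi ** (n + 1) * dxi
    return total / (2j * math.pi)

f = lambda z: (2 + 3 * z) / (z**2 + z**3)
coeffs = {n: laurent_coefficient(f, n, 0.5) for n in range(-4, 4)}
for n in sorted(coeffs):
    print(n, coeffs[n])
```

The printed values should be close to 0, 0, 2, 1, -1, 1, -1, 1 for $n = -4, \ldots, 3$, in agreement with the expansion obtained by series manipulation, as the uniqueness theorem requires.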

9.6.10. Example. The function $f(z) = 1/(4z - z^2)$ is the ratio of two entire functions.
Therefore, by Theorem 9.6.8, it is analytic everywhere except at the zeros of its denominator,
$z = 0$ and $z = 4$. For the annular region $0 < |z| < 4$ (here $r_2$ of Theorem 9.6.5 is zero), we
expand $f(z)$ in the Laurent series around $z = 0$. Instead of actually calculating $a_n$, we first
note that

$$f(z) = \frac{1}{4z}\left(\frac{1}{1 - z/4}\right).$$

The second factor can be expanded in a geometric series because $|z/4| < 1$:

$$\frac{1}{1 - z/4} = \sum_{n=0}^{\infty} \left(\frac{z}{4}\right)^n = \sum_{n=0}^{\infty} 4^{-n} z^n.$$

Dividing this by $4z$, and noting that $z = 0$ is the only zero of $4z$ and is excluded from the
annular region, we obtain the expansion

$$f(z) = \frac{1}{4z} \sum_{n=0}^{\infty} 4^{-n} z^n = \sum_{n=0}^{\infty} 4^{-n-1} z^{n-1}.$$

Although we derived this series using manipulations of other series, the uniqueness of series
representations assures us that this is the Laurent series for the indicated region.

How can we represent $f(z)$ in the region for which $|z| > 4$? This region is exterior to
the circle $|z| = 4$, so we expect negative powers of $z$. To find the Laurent expansion we
write

$$f(z) = -\frac{1}{z^2}\left(\frac{1}{1 - 4/z}\right)$$

and note that $|4/z| < 1$ for points exterior to the larger circle. The second factor can be
written as a geometric series:

$$\frac{1}{1 - 4/z} = \sum_{n=0}^{\infty} \left(\frac{4}{z}\right)^n = \sum_{n=0}^{\infty} 4^n z^{-n}.$$



12 This is a reflection of the fact that the function is not analytic inside the entire circle |z| = 1; it blows up at z = 0. 





Dividing by $-z^2$, which is nonzero in the region exterior to the larger circle, yields

$$f(z) = -\sum_{n=0}^{\infty} 4^n z^{-n-2}.$$ ■


9.6.11. Example. The function $f(z) = z/[(z - 1)(z - 2)]$ has a Taylor expansion around
the origin for $|z| < 1$. To find this expansion, we write$^{13}$

$$f(z) = -\frac{1}{z - 1} + \frac{2}{z - 2} = \frac{1}{1 - z} - \frac{1}{1 - z/2}.$$

Expanding both fractions in geometric series (both $|z|$ and $|z/2|$ are less than 1), we obtain
$f(z) = \sum_{n=0}^{\infty} z^n - \sum_{n=0}^{\infty} (z/2)^n$. Adding the two series, using Theorem 9.6.8, yields

$$f(z) = \sum_{n=0}^{\infty} (1 - 2^{-n}) z^n \quad \text{for } |z| < 1.$$

This is the unique Taylor expansion of $f(z)$ within the circle $|z| = 1$.

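For $1 < |z| < 2$ we have a Laurent series. To obtain this series, write

$$f(z) = \frac{1/z}{1/z - 1} - \frac{1}{1 - z/2} = -\frac{1}{z}\left(\frac{1}{1 - 1/z}\right) - \frac{1}{1 - z/2}.$$

Since both fractions on the RHS converge in the annular region ($|1/z| < 1$, $|z/2| < 1$), we
get

$$f(z) = -\frac{1}{z} \sum_{n=0}^{\infty} \left(\frac{1}{z}\right)^n - \sum_{n=0}^{\infty} \left(\frac{z}{2}\right)^n = -\sum_{n=0}^{\infty} z^{-n-1} - \sum_{n=0}^{\infty} 2^{-n} z^n$$

$$= -\sum_{n=-\infty}^{-1} z^n - \sum_{n=0}^{\infty} 2^{-n} z^n = \sum_{n=-\infty}^{\infty} a_n z^n,$$

where $a_n = -1$ for $n < 0$ and $a_n = -2^{-n}$ for $n \ge 0$. This is the unique Laurent expansion
of $f(z)$ in the given region.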

Finally, for $|z| > 2$ we have only negative powers of $z$. We obtain the expansion in this
region by rewriting $f(z)$ as follows:

$$f(z) = -\frac{1/z}{1 - 1/z} + \frac{2/z}{1 - 2/z}.$$

Expanding the fractions yields

$$f(z) = -\sum_{n=0}^{\infty} z^{-n-1} + \sum_{n=0}^{\infty} 2^{n+1} z^{-n-1} = \sum_{n=0}^{\infty} (2^{n+1} - 1) z^{-n-1}.$$

This is again the unique expansion of $f(z)$ in the region $|z| > 2$. ■
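The three expansions of $z/[(z - 1)(z - 2)]$ can be compared with the function itself at one sample point per region. The sketch below assumes 200 terms are enough at the chosen points; the function names are illustrative only.

```python
def f(z):
    return z / ((z - 1) * (z - 2))

def taylor_inner(z, terms=200):
    """Sum of (1 - 2^-n) z^n, valid for |z| < 1."""
    return sum((1 - 2.0 ** (-n)) * z**n for n in range(terms))

def laurent_middle(z, terms=200):
    """-sum z^(-n-1) - sum 2^-n z^n, valid for 1 < |z| < 2."""
    negative_powers = -sum(z ** (-n - 1) for n in range(terms))
    nonnegative_powers = -sum(2.0 ** (-n) * z**n for n in range(terms))
    return negative_powers + nonnegative_powers

def laurent_outer(z, terms=200):
    """Sum of (2^(n+1) - 1) z^(-n-1), valid for |z| > 2."""
    return sum((2.0 ** (n + 1) - 1) * z ** (-n - 1) for n in range(terms))

checks = [
    (taylor_inner, 0.5j),       # |z| = 0.5
    (laurent_middle, 1.5j),     # |z| = 1.5
    (laurent_outer, 3 + 1j),    # |z| > 2
]
for series, z in checks:
    print(series.__name__, abs(series(z) - f(z)))
```

Each series agrees with $f$ only inside its own region, which is the practical meaning of the uniqueness statements in the example.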


13 We could, of course, evaluate the derivatives of all orders of the function at $z = 0$ and use Maclaurin's formula. However,
the present method gives the same result much more quickly.





9.6.12. Example. Define $f(z)$ as

$$f(z) = \begin{cases} (1 - \cos z)/z^2 & \text{for } z \neq 0, \\ \tfrac{1}{2} & \text{for } z = 0. \end{cases}$$

We can show that $f(z)$ is an entire function.

Since $1 - \cos z$ and $z^2$ are entire functions, their ratio is analytic everywhere except at
the zeros of its denominator. The only such zero is $z = 0$. Thus, Theorem 9.6.8 implies that
$f(z)$ is analytic everywhere except possibly at $z = 0$. To see the behavior of $f(z)$ at $z = 0$,
we look at its Maclaurin series:

$$1 - \cos z = 1 - \sum_{n=0}^{\infty} (-1)^n \frac{z^{2n}}{(2n)!} = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{z^{2n}}{(2n)!},$$

which implies that

$$\frac{1 - \cos z}{z^2} = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{z^{2n-2}}{(2n)!} = \frac{1}{2} - \frac{z^2}{4!} + \frac{z^4}{6!} - \cdots.$$

The expansion on the RHS shows that the value of the series at $z = 0$ is $\tfrac{1}{2}$, which, by
definition, is $f(0)$. Thus, the series converges for all $z$, and Theorem 9.6.2 says that $f(z)$ is
entire. ■


A Laurent series can give information about the integral of a function around
a closed contour in whose interior the function may not be analytic. In fact, the
coefficient of the first negative power in a Laurent series is given by

$$a_{-1} = \frac{1}{2\pi i} \oint_C f(\xi)\, d\xi. \qquad (9.17)$$

Thus, to find the integral of a (nonanalytic) function around a closed contour surrounding
$z_0$, we write the Laurent series for the function and read off the coefficient
of the $1/(z - z_0)$ term.

9.6.13. Example. As an illustration of this idea, let us evaluate the integral
$I = \oint_C dz/[z^2(z - 2)]$, where $C$ is the circle of radius 1 centered at the origin. The function is analytic
in the annular region $0 < |z| < 2$. We can therefore expand it as a Laurent series about
$z = 0$ in that region:

$$\frac{1}{z^2(z - 2)} = -\frac{1}{2z^2}\left(\frac{1}{1 - z/2}\right) = -\frac{1}{2z^2} \sum_{n=0}^{\infty} \left(\frac{z}{2}\right)^n = -\frac{1}{2}\frac{1}{z^2} - \frac{1}{4}\frac{1}{z} - \frac{1}{8} - \cdots.$$

Thus, $a_{-1} = -\tfrac{1}{4}$, and $\oint_C dz/[z^2(z - 2)] = 2\pi i\, a_{-1} = -i\pi/2$. A direct evaluation of the
integral is nontrivial. In fact, we will see later that to find certain integrals, it is advantageous
to cast them in the form of a contour integral and use either Equation (9.17) or a related
equation. ■
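The value $-i\pi/2$ can be confirmed by brute-force quadrature on the unit circle. The sketch below is an assumption-laden check, not part of the text's method: `contour_integral` is a hypothetical helper that applies the trapezoidal rule to the parametrization $z = z_0 + r e^{i\theta}$.

```python
import cmath
import math

def contour_integral(f, center, radius, samples=2000):
    """Trapezoidal approximation of the integral of f over the positively
    oriented circle |z - center| = radius."""
    total = 0
    dtheta = 2 * math.pi / samples
    for k in range(samples):
        phase = cmath.exp(1j * k * dtheta)
        z = center + radius * phase
        dz = 1j * radius * phase * dtheta    # dz = i r e^{i theta} d theta
        total += f(z) * dz
    return total

integral = contour_integral(lambda z: 1 / (z**2 * (z - 2)), 0, 1.0)
expected = -1j * math.pi / 2                 # 2 pi i times a_{-1} = -1/4
print(integral, expected)
```

Dividing the numerical integral by $2\pi i$ recovers the coefficient $a_{-1} = -1/4$ read off from the Laurent series.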





zero of order k 


the zeros of an 
analytic function are 
isolated 

simple zero 


Let $f : \mathbb{C} \to \mathbb{C}$ be analytic at $z_0$. Then by definition, there exists a neighborhood
of $z_0$ in which $f$ is analytic. In particular, we can find a circle $|z - z_0| = r > 0$
in whose interior $f$ has a Taylor expansion.

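9.6.14. Definition. Let

$$f(z) = \sum_{n=0}^{\infty} \frac{f^{(n)}(z_0)}{n!} (z - z_0)^n \equiv \sum_{n=0}^{\infty} a_n (z - z_0)^n.$$

Then $f$ is said to have a zero of order $k$ at $z_0$ if $f^{(n)}(z_0) = 0$ for $n = 0, 1, \ldots, k - 1$
but $f^{(k)}(z_0) \neq 0$.

In that case $f(z) = (z - z_0)^k \sum_{n=0}^{\infty} a_{k+n} (z - z_0)^n$, where $a_k \neq 0$ and $|z - z_0| < r$.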

We define $g(z)$ as

$$g(z) = \sum_{n=0}^{\infty} a_{k+n} (z - z_0)^n \quad \text{where } |z - z_0| < r$$

and note that $g(z_0) = a_k \neq 0$. Convergence of the series on the RHS implies that
$g(z)$ is continuous at $z_0$. Consequently, for each $\epsilon > 0$, there exists $\delta$ such that
$|g(z) - a_k| < \epsilon$ whenever $|z - z_0| < \delta$. If we choose $\epsilon = |a_k|/2$, then, for some
$\delta_0 > 0$, $|g(z) - a_k| < |a_k|/2$ whenever $|z - z_0| < \delta_0$. Thus, as long as $z$ is inside
the circle $|z - z_0| < \delta_0$, $g(z)$ cannot vanish (because if it did the first inequality
would imply that $|a_k| < |a_k|/2$). We therefore have the following result.


9.6.15. Theorem. Let $f : \mathbb{C} \to \mathbb{C}$ be analytic at $z_0$ and $f(z_0) = 0$. Then there
exists a neighborhood of $z_0$ throughout which $f$ has no other zeros unless $f$ is
identically zero there. Thus, the zeros of an analytic function are isolated.

When $k = 1$, we say that $z_0$ is a simple zero of $f$. To find the order of the zero
of a function at a point, we differentiate the function, evaluate the derivative at that
point, and continue the process until we obtain a nonzero value for the derivative.


9.6.16. Example. (a) The zeros of $\cos z$, which are $z = (2k + 1)\pi/2$, are all simple,
because

$$\left.\frac{d}{dz} \cos z\right|_{z = (2k+1)\pi/2} = -\sin\left[(2k + 1)\frac{\pi}{2}\right] \neq 0.$$

(b) To find the order of the zero of $f(z) = e^z - 1 - z - z^2/2$ at $z = 0$, we differentiate
$f(z)$ and evaluate $f'(0)$:

$$f'(0) = (e^z - 1 - z)_{z=0} = 0.$$

Differentiating again gives $f''(0) = (e^z - 1)_{z=0} = 0$. Differentiating once more yields
$f'''(0) = (e^z)_{z=0} = 1$. Thus, the zero is of order 3. ■
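Since $a_n = f^{(n)}(z_0)/n!$, the order of a zero is simply the index of the first nonvanishing Taylor coefficient. A small sketch with exact rational arithmetic, applied to part (b); the helper `order_of_zero` is a hypothetical name and the truncation at eight coefficients is arbitrary.

```python
from fractions import Fraction
from math import factorial

def order_of_zero(taylor_coefficients):
    """Index of the first nonzero coefficient a_k, or None if every listed
    coefficient vanishes (the order is then not determined by the list)."""
    for k, a_k in enumerate(taylor_coefficients):
        if a_k != 0:
            return k
    return None

N = 8
exp_coeffs = [Fraction(1, factorial(n)) for n in range(N)]       # e^z
subtracted = [Fraction(1), Fraction(1), Fraction(1, 2)] + [Fraction(0)] * (N - 3)
f_coeffs = [e - s for e, s in zip(exp_coeffs, subtracted)]       # e^z - 1 - z - z^2/2

order = order_of_zero(f_coeffs)
third_derivative = factorial(order) * f_coeffs[order]            # f'''(0)
print(order, third_derivative)
```

The first surviving coefficient is $a_3 = 1/3!$, which corresponds to $f'''(0) = 1$ as found by repeated differentiation.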





9.7 Problems 

9.1. Show that the function $w = 1/z$ maps the straight line $y = a$ in the $z$-plane
onto a circle in the $w$-plane with radius $1/(2|a|)$ and center $(0, -1/(2a))$.

9.2. (a) Using the chain rule, find df/dz* and df/dz in terms of partial derivatives 
with respect to x and y. 

(b) Evaluate df/dz* and df/dz assuming that the Cauchy-Riemann conditions 
hold. 

9.3. Show that when $z$ is represented by polar coordinates, the derivative of a
function $f(z)$ can be written as

$$\frac{df}{dz} = e^{-i\theta}\left(\frac{\partial U}{\partial r} + i\frac{\partial V}{\partial r}\right),$$

where $U$ and $V$ are the real and imaginary parts of $f(z)$ written in polar coordinates.
What are the C-R conditions in polar coordinates? Hint: Start with the
C-R conditions in Cartesian coordinates and apply the chain rule to them using
$x = r\cos\theta$ and $y = r\sin\theta$.

9.4. Show that $d/dz(\ln z) = 1/z$. Hint: Find $u(x, y)$ and $v(x, y)$ for $\ln z$ and
differentiate them.

9.5. Show that $\sin z$ and $\cos z$ have only real roots.

9.6. Show that 

(a) the sum and the product of two entire functions are entire, and 

(b) the ratio of two entire functions is analytic everywhere except at the zeros of 
the denominator. 

9.7. Given that $u = 2\lambda \ln[(x^2 + y^2)^{1/2}]$, show that $v = 2\lambda \tan^{-1}(y/x)$, where $u$
and $v$ are the real and imaginary parts of an analytic function $w(z)$.

9.8. If w(z) is any complex potential, show that its (complex) derivative gives the 
components of the electric field. 

9.9. (a) Show that the flux through an element of area $da$ of the lateral surface
of a cylinder (with arbitrary cross section) is $d\phi = dz(|E|\, ds)$ where $ds$ is an arc
length along the equipotential surface.

(b) Prove that $|E| = |dw/dz| = \partial v/\partial s$ where $v$ is the imaginary part of the complex
potential, and $s$ is the parameter describing the length along the equipotential
curves.

(c) Combine (a) and (b) to get

$$\text{flux per unit } z\text{-length} = \frac{\phi}{z_2 - z_1} = v(P_2) - v(P_1)$$

for any two points $P_1$ and $P_2$ on the cross-sectional curve of the lateral surface.
Conclude that the total flux per unit $z$-length through a cylinder (with arbitrary
cross section) is $[v]$, the total change in $v$ as one goes around the curve.

(d) Using Gauss's law, show that the capacitance per unit length for the capacitor
consisting of the two conductors with potentials $u_1$ and $u_2$ is

$$c \equiv \frac{\text{charge per unit length}}{\text{potential difference}} = \frac{[v]/4\pi}{|u_2 - u_1|}.$$

9.10. Using Equation (9.7)

(a) find the equipotential curves (curves of constant $u$) and curves of constant $v$
for two line charges of equal magnitude and opposite signs located at $y = a$ and
$y = -a$ in the $xy$-plane.

(b) Show that

$$z = a\left(\sin\frac{v}{2\lambda} + i\sinh\frac{u}{2\lambda}\right) \Big/ \left(\cosh\frac{u}{2\lambda} - \cos\frac{v}{2\lambda}\right)$$

by solving Equation (9.7) for $z$ and simplifying.

(c) Show that the equipotential curves are circles in the $xy$-plane of radii
$a/\sinh(u/2\lambda)$ with centers at $(0, a\coth(u/2\lambda))$, and that the curves of constant $v$
are circles of radii $a/\sin(v/2\lambda)$ with centers at $(a\cot(v/2\lambda), 0)$.

9.11. In this problem, you will find the capacitance per unit length of two cylindrical
conductors of radii $R_1$ and $R_2$ the distance between whose centers is $D$ by
looking for two line charge densities $+\lambda$ and $-\lambda$ such that the two cylinders are
two of the equipotential surfaces. From Problem 9.10, we have

$$R_i = \frac{a}{\sinh(u_i/2\lambda)}, \qquad y_i = a\coth(u_i/2\lambda), \qquad i = 1, 2,$$

where $y_1$ and $y_2$ are the locations of the centers of the two conductors on the $y$-axis
(which we assume to connect the two centers).

(a) Show that $D = |y_1 - y_2| = \left|R_1 \cosh\frac{u_1}{2\lambda} - R_2 \cosh\frac{u_2}{2\lambda}\right|$.

(b) Square both sides and use $\cosh(a - b) = \cosh a \cosh b - \sinh a \sinh b$ and the
expressions for the $R$'s and the $y$'s given above to obtain

$$\cosh\left(\frac{u_1 - u_2}{2\lambda}\right) = \frac{R_1^2 + R_2^2 - D^2}{2R_1 R_2}.$$

(c) Now find the capacitance per unit length. Consider the special case of two
concentric cylinders.

(d) Find the capacitance per unit length of a cylinder and a plane, by letting one
of the radii, say $R_1$, go to infinity while $h = R_1 - D$ remains fixed.




9.12. Use Equations (9.4) and (9.5) to establish the following identities.

(a) $\mathrm{Re}(\sin z) = \sin x \cosh y$, $\quad \mathrm{Im}(\sin z) = \cos x \sinh y$.
(b) $\mathrm{Re}(\cos z) = \cos x \cosh y$, $\quad \mathrm{Im}(\cos z) = -\sin x \sinh y$.
(c) $\mathrm{Re}(\sinh z) = \sinh x \cos y$, $\quad \mathrm{Im}(\sinh z) = \cosh x \sin y$.
(d) $\mathrm{Re}(\cosh z) = \cosh x \cos y$, $\quad \mathrm{Im}(\cosh z) = \sinh x \sin y$.
(e) $|\sin z|^2 = \sin^2 x + \sinh^2 y$, $\quad |\cos z|^2 = \cos^2 x + \sinh^2 y$.
(f) $|\sinh z|^2 = \sinh^2 x + \sin^2 y$, $\quad |\cosh z|^2 = \sinh^2 x + \cos^2 y$.


9.13. Find all the zeros of $\sinh z$ and $\cosh z$.

9.14. Verify the following hyperbolic identities.

(a) $\cosh^2 z - \sinh^2 z = 1$.
(b) $\cosh(z_1 + z_2) = \cosh z_1 \cosh z_2 + \sinh z_1 \sinh z_2$.
(c) $\sinh(z_1 + z_2) = \sinh z_1 \cosh z_2 + \cosh z_1 \sinh z_2$.
(d) $\cosh 2z = \cosh^2 z + \sinh^2 z$, $\quad \sinh 2z = 2\sinh z \cosh z$.
(e) $\tanh(z_1 + z_2) = \dfrac{\tanh z_1 + \tanh z_2}{1 + \tanh z_1 \tanh z_2}$.


9.15. Show that

(a) $\tanh\left(\dfrac{z}{2}\right) = \dfrac{\sinh x + i\sin y}{\cosh x + \cos y}$. $\qquad$ (b) $\coth\left(\dfrac{z}{2}\right) = \dfrac{\sinh x - i\sin y}{\cosh x - \cos y}$.


9.16. Find all values of $z$ such that

(a) $e^z = -3$. $\quad$ (b) $e^z = 1 + i\sqrt{3}$. $\quad$ (c) $e^{2z-1} = 1$.

9.17. Show that \e~ z \ < 1 if and only if Re(z) > 0. 

9.18. Show that both the real and the imaginary parts of an analytic function are 
harmonic. 

9.19. Show that each of the following functions, call each one $u(x, y)$, is harmonic,
and find the function's harmonic partner, $v(x, y)$, such that $u(x, y) +
iv(x, y)$ is analytic.

(a) $x^3 - 3xy^2$. $\quad$ (b) $e^x \cos y$. $\quad$ (c) $\dfrac{x}{x^2 + y^2}$, where $x^2 + y^2 \neq 0$.

(d) $e^{-2y} \cos 2x$. $\quad$ (e) $e^{y^2 - x^2} \cos 2xy$.

(f) $e^x(x\cos y - y\sin y) + 2\sinh y \sin x + x^3 - 3xy^2 + y$.




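9.20. Prove the following identities.

(a) $\cos^{-1} z = -i\ln(z \pm \sqrt{z^2 - 1})$. $\qquad$ (b) $\sin^{-1} z = -i\ln[iz \pm \sqrt{1 - z^2}]$.
(c) $\tan^{-1} z = \dfrac{1}{2i}\ln\left(\dfrac{i - z}{i + z}\right)$. $\qquad$ (d) $\cosh^{-1} z = \ln(z \pm \sqrt{z^2 - 1})$.
(e) $\sinh^{-1} z = \ln(z \pm \sqrt{z^2 + 1})$. $\qquad$ (f) $\tanh^{-1} z = \dfrac{1}{2}\ln\left(\dfrac{1 + z}{1 - z}\right)$.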


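9.21. Find the curve defined by each of the following equations.

(a) $z = 1 - it$, $\quad 0 \le t \le 2$. $\qquad$ (b) $z = t + it^2$, $\quad -\infty < t < \infty$.
(c) $z = a(\cos t + i\sin t)$, $\quad \dfrac{\pi}{2} \le t \le \dfrac{3\pi}{2}$. $\qquad$ (d) $z = t + \dfrac{i}{t}$, $\quad -\infty < t < 0$.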


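9.22. Provide the details of the proof of part (a) of Proposition 9.3.1. Prove part
(b) by showing that if $f(z) = z' = x' + iy'$ is analytic and
$\dfrac{\partial^2 \Phi}{\partial x^2} + \dfrac{\partial^2 \Phi}{\partial y^2} = 0$, then

$$\frac{\partial^2 \Phi}{\partial x'^2} + \frac{\partial^2 \Phi}{\partial y'^2} = 0.$$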


9.23. Let $f(t) = u(t) + iv(t)$ be a (piecewise) continuous complex-valued
function of a real variable $t$ defined in the interval $a \le t \le b$. Show that if
$F(t) = U(t) + iV(t)$ is a function such that $dF/dt = f(t)$, then

$$\int_a^b f(t)\, dt = F(b) - F(a).$$

This is the fundamental theorem of calculus for complex variables.

9.24. Find the value of the integral $\int_C [(z + 2)/z]\, dz$, where $C$ is (a) the semicircle
$z = 2e^{i\theta}$, for $0 \le \theta \le \pi$, (b) the semicircle $z = 2e^{i\theta}$, for $\pi \le \theta \le 2\pi$, and (c)
the circle $z = 2e^{i\theta}$, for $-\pi \le \theta \le \pi$.

9.25. Evaluate the integral $\int_\gamma dz/(z - 1 - i)$ where $\gamma$ is (a) the line joining $z_1 = 2i$
and $z_2 = 3$, and (b) the broken path from $z_1$ to the origin and from there to $z_2$.

9.26. Evaluate the integral $\oint_C z^m (z^*)^n\, dz$, where $m$ and $n$ are integers and $C$ is the
circle $|z| = 1$ taken counterclockwise.

9.27. Let $C$ be the boundary of the square with vertices at the points $z = 0$, $z = 1$,
$z = 1 + i$, and $z = i$ with counterclockwise direction. Evaluate

$$\oint_C (5z + 2)\, dz \quad \text{and} \quad \oint_C e^{\pi z^*}\, dz.$$




9.28. Let $C_1$ be a simple closed contour. Deform $C_1$ into a new contour $C_2$ in such
a way that $C_1$ does not encounter any singularity of an analytic function $f$ in the
process. Show that

$$\oint_{C_1} f(z)\, dz = \oint_{C_2} f(z)\, dz.$$

That is, the contour can always be deformed into simpler shapes (such as a circle)
and the integral evaluated.

9.29. Use the result of the previous problem to show that

$$\oint_C \frac{dz}{z - 1 - i} = 2\pi i \quad \text{and} \quad \oint_C (z - 1 - i)^{m-1}\, dz = 0 \quad \text{for } m = \pm 1, \pm 2, \ldots$$

when $C$ is the boundary of a square with vertices at $z = 0$, $z = 2$, $z = 2 + 2i$, and
$z = 2i$, taken counterclockwise.

9.30. Use Equation (9.12) and the binomial expansion to show that 
dz 

9.31. Evaluate $\oint_C dz/(z^2 - 1)$ where $C$ is the circle $|z| = 3$ integrated in the
positive sense. Hint: Deform $C$ into a contour $C'$ that bypasses the singularities of
the integrand.

9.32. Show that when $f$ is analytic within and on a simple closed contour $C$ and
$z_0$ is not on $C$, then

$$\oint_C \frac{f'(z)\, dz}{z - z_0} = \oint_C \frac{f(z)\, dz}{(z - z_0)^2}.$$


9.33. Let C be the boundary of a square whose sides lie along the lines x = ±3 
and j = 士 3. For the positive sense of integration, evaluate each of the following 
integrals. 


⑷ 


/ e^ z f e z , , 、 i* cosz , 

f c ^ T2 dz ' ⑻九 ( C ) f c ( z 一以？ - 10 产 

id) ⑷ (/) 


f C Z 


cosz 

3 


dz. 


{g) 


0) 


cos 之 


fc (z- in/2) 2 
e z 


dz. (h) 


e z 


(z - in) 2 
sinh 之 


dz. (0 


Jc V 
£ cosz 

fc Z-hi7t 

coshz 


dz. 


Tc z 2 -5z + 4 dz ' (k) % {z- in 财此 （ l) f c (z - i 浙乙 

, 、 i tanz 為 ° 

(m) Tc 


dz for — 3 < a < 3. (n) 


fc (z - 2)(z 2 - 10) 


dz. 






9.34 Let C be the circle \z - r| = 3 integrated in the positive sense. Find the 
value of each of the following integrals. 


⑷ 


(d) 


e z 


r C 


Z 2 H- 7T 2 

dz 


dz- 


(b) 

(e) 


sinhz 


Tc (z 2 + it 2 ) 1 
( cosh z 
r c (z 2 + ^ 


dz. 


dz- 


(c) 

(/) 


dz 


z 2 + 9' 

/ z 2 — + 4 


r c (z 2 + 9)2 - … 7 c fe 2 + ^ 2 ) 3 Tcz 2 -4z + 3 

9.35. Show that Legendre polynomials (for $|x| < 1$) can be represented as

$$P_n(x) = \frac{(-1)^n}{2^n (2\pi i)} \oint_C \frac{(1 - z^2)^n}{(z - x)^{n+1}}\, dz,$$

where $C$ is the unit circle around the origin.

9.36. Let $f$ be analytic within and on the circle $\gamma_0$ given by $|z - z_0| = r_0$ and
integrated in the positive sense. Show that Cauchy's inequality holds:

$$|f^{(n)}(z_0)| \le \frac{n!\, M}{r_0^n},$$

where $M$ is the maximum value of $|f(z)|$ on $\gamma_0$.

9.37. Expand $\sinh z$ in a Taylor series about the point $z = i\pi$.

9.38. What is the largest circle within which the Maclaurin series for $\tanh z$ converges
to $\tanh z$?

9.39. Find the (unique) Laurent expansion of each of the following functions about 
the origin for its entire region of analyticity. 


⑷ (z-2Kz-3)' 

^ (T^)3* 

(0 ^r. 

z-l 


(b) zcos(z 2 ). 



(c) 


( 8 ) 


1 


zHl-zl 


z 4 


4 


z 2 -9' 


9*40. Show that the following functions are entire. 


(d) 


sinhz — z 
z 4 


⑻ 


1 

(z 2 -l) 2 * 


⑷ /(Z)= 


e 2z -l 2 
2 


for z _ 0, 
for z === 0. 


smz 

(b) f(z) = ^ 


for z _ 0, 
for 2 = 0. 


(c) f(z )= 


cosz 

z 2 — 丌 2 /4 
一 1/tt 


for z ^ ±tt/ 2, 
for z = 士 tt/2. 






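9.41. Let $f$ be analytic at $z_0$ and $f(z_0) = f'(z_0) = \cdots = f^{(k)}(z_0) = 0$. Show
that the following function is analytic at $z_0$:

$$g(z) = \begin{cases} \dfrac{f(z)}{(z - z_0)^{k+1}} & \text{for } z \neq z_0, \\[2mm] \dfrac{f^{(k+1)}(z_0)}{(k + 1)!} & \text{for } z = z_0. \end{cases}$$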



9.42. Obtain the first few nonzero terms of the Laurent series expansion of each 
of the following functions about the origin. Also find the integral of the function 
along a small simple closed contour encircling the origin. 




⑻丄.⑷ 


Z 一 Sliu 


Z 4 1 

—) 6z + z 3 -6sinh2. (’ ） z 2 sinz* 


e z - 1 


Additional Reading 

1. Churchill, R. and Verhey, R. Complex Variables and Applications, 3rd ed., 
McGraw-Hill, 1974. An introductory text on complex variables with many 
examples and exercises. 

2. Lang, S. Complex Analysis, 2nd ed., Springer-Verlag, 1985. An extremely
well-written book by a master expositor. Although the book has a formal
tone, the clarity of exposition and the use of many examples make this book
very readable.


10

Calculus of Residues 


One of the most powerful tools made available by complex analysis is the theory of 
residues, which makes possible the routine evaluation of certain definite integrals 
that are impossible to calculate otherwise. The derivation, application, and analysis 
of this tool constitute the main focus of this chapter. In the preceding chapter we 
saw examples in which integrals were related to expansion coefficients of Laurent 
series. Here we will develop a systematic way of evaluating both real and complex 
integrals. 

10.1 Residues 

Recall that a singular point $z_0$ of $f : \mathbb{C} \to \mathbb{C}$ is a point at which $f$ fails to be
analytic. If in addition, there is some neighborhood of $z_0$ in which $f$ is analytic
at every point (except of course at $z_0$ itself), then $z_0$ is called an isolated singularity
of $f$. Almost all the singularities we have encountered so far have been
isolated singularities. However, we will see later, when discussing multivalued
functions, that singularities that are not isolated do exist.

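Let $z_0$ be an isolated singularity of $f$. Then there exists an $r > 0$ such that
within the "annular" region $0 < |z - z_0| < r$, the function $f$ has the Laurent
expansion

$$f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n \equiv \sum_{n=0}^{\infty} a_n (z - z_0)^n + \frac{b_1}{z - z_0} + \frac{b_2}{(z - z_0)^2} + \cdots$$

where

$$a_n = \frac{1}{2\pi i} \oint_C \frac{f(\xi)\, d\xi}{(\xi - z_0)^{n+1}} \quad \text{and} \quad b_n = \frac{1}{2\pi i} \oint_C f(\xi)(\xi - z_0)^{n-1}\, d\xi.$$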



residue defined 


residue theorem 


In particular,

$$b_1 = \frac{1}{2\pi i} \oint_C f(\xi)\, d\xi, \qquad (10.1)$$

where $C$ is any simple closed contour around $z_0$, traversed in the positive sense,
on and interior to which $f$ is analytic except at the point $z_0$ itself. The complex
number $b_1$, which is essentially the integral of $f(z)$ along the contour, is called
the residue of $f$ at the isolated singular point $z_0$. It is important to note that the
residue is independent of the contour $C$ as long as $z_0$ is the only isolated singular
point within $C$.


Pierre Alphonse Laurent (1813-1854) graduated from the École Polytechnique near the
top of his class and became a second lieutenant in the engineering corps. On his return from
the war in Algeria, he took part in the effort to improve the port at Le Havre, spending
six years there directing various parts of the project. Laurent's superior officers admired
the breadth of his practical experience and the good judgment it afforded the young engineer.
During this period he wrote his first scientific paper, on the calculus of variations,
and submitted it to the French Academy of Sciences for the grand prix in mathematics. 
Unfortunately the competition had already closed (although the judges had not yet declared 
a winner), and Laurent’s submission was not successful. However, the paper so impressed 
Cauchy that he recommended its publication, also without success. 

The paper for which Laurent is most well known suffered a similar fate. In it he described 
a more general form of a theorem earlier proven by Cauchy for the power series expansion of 
a function. Laurent realized that one could generalize this result to hold in any annular region 
between two singular or discontinuous points by using both positive and negative powers 
in the series, thus allowing treatment of regions beyond the first singular or discontinuous 
point. Again, Cauchy argued for the paper’s publication without success. The passage of time 
provided a more just reward, however, and the use of Laurent series became a fundamental 
tool in complex analysis. 

Laurent later worked in the theory of light waves and contended with Cauchy over the
interpretation of the differential equations the latter had formulated to explain the behavior
of light. Little came of his work in this area, however, and Laurent died at the age of forty-two,
a captain serving on the committee on fortifications in Paris. His widow pressed to
have two more of his papers read to the Academy, only one of which was published.



We use the notation $\mathrm{Res}[f(z_0)]$ to denote the residue of $f$ at the isolated
singular point $z_0$. Equation (10.1) can then be written as

$$\oint_C f(z)\, dz = 2\pi i\, \mathrm{Res}[f(z_0)].$$

What if there are several isolated singular points within the simple closed
contour $C$? The following theorem provides the answer.

10.1.1. Theorem. (the residue theorem) Let $C$ be a positively oriented simple






Figure 10.1 Singularities are avoided by going around them. 


closed contour within and on which a function $f$ is analytic except at a finite
number of isolated singular points $z_1, z_2, \ldots, z_m$ interior to $C$. Then

$$\oint_C f(z)\, dz = 2\pi i \sum_{k=1}^{m} \mathrm{Res}[f(z_k)]. \qquad (10.2)$$




Proof. Let $C_k$ be the positively traversed circle around $z_k$. Then Figure 10.1 and
the Cauchy-Goursat theorem yield

$$0 = \oint_{C'} f(z)\, dz = -\oint_{\text{circles}} f(z)\, dz + \int_{\text{parallel lines}} f(z)\, dz + \oint_C f(z)\, dz,$$

where $C'$ is the union of all the contours, and the minus sign on the first integral
is due to the fact that the interiors of all circles lie to our right as we traverse their boundaries.
The contributions of the parallel lines cancel out, and we obtain

$$\oint_C f(z)\, dz = \sum_{k=1}^{m} \oint_{C_k} f(z)\, dz = \sum_{k=1}^{m} 2\pi i\, \mathrm{Res}[f(z_k)],$$

where in the last step the definition of residue at $z_k$ has been used. □


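10.1.2. Example. Let us evaluate the integral $\oint_C (2z - 3)\, dz/[z(z - 1)]$ where $C$ is the
circle $|z| = 2$. There are two isolated singularities in $C$, $z_1 = 0$ and $z_2 = 1$. To find
$\mathrm{Res}[f(z_1)]$, we expand around the origin:

$$\frac{2z - 3}{z(z - 1)} = \frac{3}{z} - \frac{1}{z - 1} = \frac{3}{z} + \frac{1}{1 - z} = \frac{3}{z} + 1 + z + \cdots \quad \text{for } |z| < 1.$$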








principal part of a 
function 


removable singular 
point 


poles defined 

simple pole 
essential singularity 


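This gives $\mathrm{Res}[f(z_1)] = 3$. Similarly, expanding around $z = 1$ gives

$$\frac{2z - 3}{z(z - 1)} = \frac{3}{z - 1 + 1} - \frac{1}{z - 1} = -\frac{1}{z - 1} + 3\sum_{n=0}^{\infty} (-1)^n (z - 1)^n,$$

which yields $\mathrm{Res}[f(z_2)] = -1$. Thus,

$$\oint_C \frac{2z - 3}{z(z - 1)}\, dz = 2\pi i\{\mathrm{Res}[f(z_1)] + \mathrm{Res}[f(z_2)]\} = 2\pi i(3 - 1) = 4\pi i.$$ ■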
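The residue theorem for this integrand can be illustrated numerically by integrating around each pole separately and around the full circle $|z| = 2$. This is a sketch under stated assumptions: `contour_integral` is a hypothetical trapezoidal-rule helper, and the small-circle radius 0.25 is an arbitrary choice that isolates each pole.

```python
import cmath
import math

def contour_integral(f, center, radius, samples=2000):
    """Trapezoidal approximation of the integral of f over the positively
    oriented circle |z - center| = radius."""
    total = 0
    dtheta = 2 * math.pi / samples
    for k in range(samples):
        phase = cmath.exp(1j * k * dtheta)
        total += f(center + radius * phase) * 1j * radius * phase * dtheta
    return total

def f(z):
    return (2 * z - 3) / (z * (z - 1))

# Residues from small circles that enclose one pole each
res_at_0 = contour_integral(f, 0, 0.25) / (2j * math.pi)
res_at_1 = contour_integral(f, 1, 0.25) / (2j * math.pi)

# Integral over the contour of the example, which encloses both poles
full = contour_integral(f, 0, 2.0)
print(res_at_0, res_at_1, full)
```

The two small-circle results approximate the residues 3 and $-1$, and their sum times $2\pi i$ matches the integral over $|z| = 2$.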



10.2 Classification of Isolated Singularities 

Let $f : \mathbb{C} \to \mathbb{C}$ have an isolated singularity at $z_0$. Then there exist a real number
$r > 0$ and an annular region $0 < |z - z_0| < r$ such that $f$ can be represented by
the Laurent series

$$f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n + \sum_{n=1}^{\infty} \frac{b_n}{(z - z_0)^n}. \qquad (10.3)$$

The second sum in Equation (10.3), involving negative powers of $(z - z_0)$, is
called the principal part of $f$ at $z_0$. We can use the principal part to distinguish
three types of isolated singularities. The behavior of the function near the isolated
singularity is fundamentally different in each case.

1. If $b_n = 0$ for all $n \ge 1$, $z_0$ is called a removable singular point of $f$. In
this case, the Laurent series contains only nonnegative powers of $(z - z_0)$,
and setting $f(z_0) = a_0$ makes the function analytic at $z_0$. For example, the
function $f(z) = (e^z - 1 - z)/z^2$, which is indeterminate at $z = 0$, becomes
entire if we set $f(0) = \tfrac{1}{2}$, because its Laurent series
$f(z) = \frac{1}{2} + \frac{z}{3!} + \frac{z^2}{4!} + \cdots$ has no negative power.

2. If $b_n = 0$ for all $n > m$ and $b_m \neq 0$, $z_0$ is called a pole of order $m$. In this
case, the expansion takes the form

$$f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n + \frac{b_1}{z - z_0} + \cdots + \frac{b_m}{(z - z_0)^m}$$

for $0 < |z - z_0| < r$. In particular, if $m = 1$, $z_0$ is called a simple pole.

3. If the principal part of / at zo has an infinite number of nonzero terms, the 
point zo is called an essential singularity. A prototype of functions that 
have essential singularities is 








and note that, since the degree of Pi is - 1 , all the terms in the preceding equation go 
to zero except possibly $g(1/t)$. Moreover,

$$\lim_{t \to 0} g(1/t) \neq \infty,$$

because, by assumption, the point at infinity is not a pole of $f$. Thus, $g$ is a bounded entire
function. By Proposition 9.5.5, $g$ must be a constant. Taking a common denominator for all
the terms yields a ratio of two polynomials. ■

The type of isolated singularity that is most important in applications is the
second type, poles. For a function that has a pole of order $m$ at $z_0$, the calculation
of residues is routine. Such a calculation, in turn, enables us to evaluate many
integrals effortlessly. How do we calculate the residue of a function $f$ having a
pole of order $m$ at $z_0$?

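It is clear that if $f$ has a pole of order $m$, then $g : \mathbb{C} \to \mathbb{C}$ defined by
$g(z) = (z - z_0)^m f(z)$ is analytic at $z_0$. Thus, for any simple closed contour $C$ that
contains $z_0$ but no other singular point of $f$, we have

$$\mathrm{Res}[f(z_0)] = \frac{1}{2\pi i} \oint_C f(z)\, dz = \frac{1}{2\pi i} \oint_C \frac{g(z)\, dz}{(z - z_0)^m} = \frac{g^{(m-1)}(z_0)}{(m - 1)!}.$$

In terms of $f$ this yields$^{2}$

$$\mathrm{Res}[f(z_0)] = \frac{1}{(m - 1)!} \lim_{z \to z_0} \frac{d^{m-1}}{dz^{m-1}} \left[(z - z_0)^m f(z)\right]. \qquad (10.4)$$

For the special, but important, case of a simple pole, we obtain

$$\mathrm{Res}[f(z_0)] = \lim_{z \to z_0} \left[(z - z_0) f(z)\right]. \qquad (10.5)$$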

10.3 Evaluation of Definite Integrals 


The most widespread application of residues occurs in the evaluation of real definite 
integrals. It is possible to “complexify” certain real definite integrals and relate 
them to contour integrations in the complex plane. We will discuss this method 
shortly; however, we first need a lemma. 

10.3.1. Lemma. (Jordan's lemma) Let $C_R$ be a semicircle of radius $R$ in the upper
half of the complex plane (UHP) and centered at the origin. Let $f$ be a function
that tends uniformly to zero faster than $1/|z|$ for $\arg(z) \in [0, \pi]$ as $|z| \to \infty$. Let
$\alpha$ be a nonnegative real number. Then

$$\lim_{R \to \infty} I_R \equiv \lim_{R \to \infty} \int_{C_R} e^{i\alpha z} f(z)\, dz = 0.$$


2 The limit is taken because in many cases the mere substitution of zo may result in an indeterminate form. 




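Proof. For $z \in C_R$ we write $z = Re^{i\theta}$, $dz = iRe^{i\theta}\, d\theta$, and

$$i\alpha z = i\alpha(R\cos\theta + iR\sin\theta) = i\alpha R\cos\theta - \alpha R\sin\theta$$

and substitute in the absolute value of the integral to show that

$$|I_R| \le \int_0^{\pi} e^{-\alpha R\sin\theta} R|f(Re^{i\theta})|\, d\theta.$$

By assumption, $R|f(Re^{i\theta})| < \epsilon(R)$ independent of $\theta$, where $\epsilon(R)$ is an arbitrary
positive number that tends to zero as $R \to \infty$. By breaking up the interval of
integration into two equal pieces and changing $\theta$ to $\pi - \theta$ in the second integral,
one can show that

$$|I_R| < 2\epsilon(R) \int_0^{\pi/2} e^{-\alpha R\sin\theta}\, d\theta.$$

Furthermore, $\sin\theta \ge 2\theta/\pi$ for $0 \le \theta \le \pi/2$ (see Figure 10.2 for a "proof").
Thus,

$$|I_R| < 2\epsilon(R) \int_0^{\pi/2} e^{-(2\alpha R/\pi)\theta}\, d\theta = \frac{\pi\epsilon(R)}{\alpha R}\left(1 - e^{-\alpha R}\right),$$

which goes to zero as $R$ gets larger and larger. □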

Note that Jordan's lemma applies for $\alpha = 0$ as well, because $(1 - e^{-\alpha R}) \to \alpha R$
as $\alpha \to 0$. If $\alpha < 0$, the lemma is still valid if the semicircle $C_R$ is taken in
the lower half of the complex plane (LHP) and $f(z)$ goes to zero uniformly for
$\pi \le \arg(z) \le 2\pi$.
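The key estimate in the proof, $\int_0^{\pi/2} e^{-\alpha R\sin\theta}\, d\theta \le \frac{\pi}{2\alpha R}(1 - e^{-\alpha R})$, follows from $\sin\theta \ge 2\theta/\pi$ and can be spot-checked numerically. A minimal sketch, assuming a midpoint rule with a fixed number of subintervals is accurate enough for the sample values of $\alpha R$ chosen here (both helper names are hypothetical):

```python
import math

def midpoint_integral(func, a, b, n=20000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def jordan_sides(alpha_R):
    """Return (left side, right side) of the inequality used in the proof."""
    lhs = midpoint_integral(lambda t: math.exp(-alpha_R * math.sin(t)),
                            0.0, math.pi / 2)
    rhs = math.pi / (2 * alpha_R) * (1 - math.exp(-alpha_R))
    return lhs, rhs

for alpha_R in (0.5, 1.0, 10.0, 100.0):
    lhs, rhs = jordan_sides(alpha_R)
    print(alpha_R, lhs, rhs, lhs <= rhs)
```

Both sides decrease roughly like $1/(\alpha R)$, which is why multiplying by $\epsilon(R) \to 0$ forces $|I_R|$ to vanish in the limit.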

We are now in a position to apply the residue theorem to the evaluation of 
definite integrals. The three types of integrals most commonly encountered are 
discussed separately below. In all cases we assume that Jordan’s lemma holds. 


10.3.1 Integrals of Rational Functions 

The first type of integral we can evaluate using the residue theorem is of the form

$$I_1 = \int_{-\infty}^{\infty} \frac{p(x)}{q(x)}\, dx,$$

where $p(x)$ and $q(x)$ are real polynomials, and $q(x) \neq 0$ for any real $x$. We can
then write

$$I_1 = \lim_{R \to \infty} \int_{-R}^{R} \frac{p(x)}{q(x)}\, dx = \lim_{R \to \infty} \int_{C_x} \frac{p(z)}{q(z)}\, dz,$$

where $C_x$ is the (open) contour lying on the real axis from $-R$ to $+R$. Assuming
that Jordan's lemma holds, we can close that contour by adding to it the semicircle





Figure 10.2 The "proof" of $\sin\theta \ge 2\theta/\pi$ for $0 \le \theta \le \pi/2$. The line is the graph of
$y = 2\theta/\pi$; the curve is that of $y = \sin\theta$.


of radius $R$ [see Figure 10.3(a)]. This will not affect the value of the integral,
because in the limit $R \to \infty$, the contribution of the integral of the semicircle
tends to zero. We close the contour in the UHP if $q(z)$ has at least one zero there.
We then get

$$I_1 = \oint_C \frac{p(z)}{q(z)}\, dz = 2\pi i \sum_{j=1}^{k} \mathrm{Res}\left[\frac{p(z_j)}{q(z_j)}\right],$$

where $C$ is the closed contour composed of the interval $(-R, R)$ and the semicircle
$C_R$, and $\{z_j\}_{j=1}^{k}$ are the zeros of $q(z)$ in the UHP. We may instead close the contour
in the LHP,$^{3}$ in which case

$$I_1 = -2\pi i \sum_{j=1}^{k} \mathrm{Res}\left[\frac{p(z_j)}{q(z_j)}\right],$$

where $\{z_j\}_{j=1}^{k}$ are the zeros of $q(z)$ in the LHP. The minus sign indicates that in
the LHP we (are forced to) integrate clockwise.

10.3.2. Example. Let us evaluate the integral $I = \int_0^{\infty} x^2\, dx/[(x^2 + 1)(x^2 + 9)]$. Since
the integrand is even, we can extend the interval of integration to all real numbers (and
divide the result by 2). It is shown below that Jordan's lemma holds. Therefore, we write
the contour integral corresponding to $I$:

$$I = \frac{1}{2} \oint_C \frac{z^2\, dz}{(z^2 + 1)(z^2 + 9)},$$


3 Provided that Jordan’s lemma holds there. 













where $C$ is as shown in Figure 10.3(a). Note that the contour is traversed in the positive
sense. This is always true for the UHP. The singularities of the function in the UHP are the
simple poles $i$ and $3i$ corresponding to the simple zeros of the denominator. The residues
at these poles are

$$\mathrm{Res}[f(i)] = \lim_{z \to i} (z - i) \frac{z^2}{(z - i)(z + i)(z^2 + 9)} = -\frac{1}{16i},$$

$$\mathrm{Res}[f(3i)] = \lim_{z \to 3i} (z - 3i) \frac{z^2}{(z^2 + 1)(z - 3i)(z + 3i)} = \frac{3}{16i}.$$

Thus, we obtain

$$\int_0^{\infty} \frac{x^2\, dx}{(x^2 + 1)(x^2 + 9)} = \frac{1}{2} \oint_C \frac{z^2\, dz}{(z^2 + 1)(z^2 + 9)} = \pi i\left(-\frac{1}{16i} + \frac{3}{16i}\right) = \frac{\pi}{8}.$$

It is instructive to obtain the same results using the LHP. In this case, the contour is as
shown in Figure 10.3(b) and is taken clockwise, so we have to introduce a minus sign. The
singular points are at $z = -i$ and $z = -3i$. These are simple poles at which the residues of
the function are

$$\mathrm{Res}[f(-i)] = \lim_{z \to -i} (z + i) \frac{z^2}{(z - i)(z + i)(z^2 + 9)} = \frac{1}{16i},$$

$$\mathrm{Res}[f(-3i)] = \lim_{z \to -3i} (z + 3i) \frac{z^2}{(z^2 + 1)(z - 3i)(z + 3i)} = -\frac{3}{16i}.$$

Therefore,

$$\int_0^{\infty} \frac{x^2\, dx}{(x^2 + 1)(x^2 + 9)} = \frac{1}{2} \oint_C \frac{z^2\, dz}{(z^2 + 1)(z^2 + 9)} = -\pi i\left(\frac{1}{16i} - \frac{3}{16i}\right) = \frac{\pi}{8}.$$

To show that Jordan's lemma applies to this integral, we have only to establish that
$\lim_{R \to \infty} R|f(Re^{i\theta})| = 0$. In the case at hand, $\alpha = 0$ because there is no exponential
function in the integrand. Thus,

$$R|f(Re^{i\theta})| = R\left|\frac{R^2 e^{2i\theta}}{(R^2 e^{2i\theta} + 1)(R^2 e^{2i\theta} + 9)}\right| = \frac{R^3}{|R^2 e^{2i\theta} + 1|\,|R^2 e^{2i\theta} + 9|},$$

which clearly goes to zero as $R \to \infty$. ■
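The result $\pi/8$ can be cross-checked against a direct numerical quadrature of the real integral. The sketch below assumes the substitution $x = \tan t$, which maps $[0, \infty)$ to $[0, \pi/2)$ and turns the integrand into $\sin^2 t/(\sin^2 t + 9\cos^2 t)$; the helper names are hypothetical.

```python
import math

def transformed_integrand(t):
    """With x = tan t: x^2 dx / ((x^2 + 1)(x^2 + 9)) becomes
    sin^2 t / (sin^2 t + 9 cos^2 t) dt."""
    s2 = math.sin(t) ** 2
    c2 = math.cos(t) ** 2
    return s2 / (s2 + 9 * c2)

def midpoint_integral(func, a, b, n=20000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

numeric = midpoint_integral(transformed_integrand, 0.0, math.pi / 2)

# Residue-theorem value: (1/2) * 2 pi i * (Res[f(i)] + Res[f(3i)])
from_residues = math.pi * 1j * (-1 / (16j) + 3 / (16j))

print(numeric, from_residues.real, math.pi / 8)
```

The quadrature and the residue sum agree with $\pi/8 \approx 0.3927$ to well within the accuracy of the midpoint rule.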



10.3.3. Example. Let us now consider a slightly more complicated integral:

$$\int_{-\infty}^{\infty} \frac{x^2\, dx}{(x^2 + 1)(x^2 + 4)^2},$$

which in the UHP turns into $\oint_C z^2\, dz/[(z^2 + 1)(z^2 + 4)^2]$. The poles in the UHP are at
$z = i$ and $z = 2i$. The former is a simple pole, and the latter is a pole of order 2. Thus,





Figure 10.3 (a) The large semicircle is chosen in the UHP. (b) Note how the direction of
contour integration is forced to be clockwise when the semicircle is chosen in the LHP.


using Equations (10.5) and (10.4), we obtain 


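\[
\operatorname{Res}[f(i)] = \lim_{z\to i}\,(z-i)\,\frac{z^2}{(z-i)(z+i)(z^2+4)^2} = -\frac{1}{18i},
\]
\[
\operatorname{Res}[f(2i)] = \frac{1}{(2-1)!}\lim_{z\to 2i}\frac{d}{dz}\left[(z-2i)^2\,\frac{z^2}{(z^2+1)(z+2i)^2(z-2i)^2}\right]
 = \lim_{z\to 2i}\frac{d}{dz}\left[\frac{z^2}{(z^2+1)(z+2i)^2}\right] = \frac{5}{72i}.
\]

Therefore,

\[
\int_{-\infty}^{\infty} \frac{x^2\,dx}{(x^2+1)(x^2+4)^2}
 = 2\pi i\left(-\frac{1}{18i} + \frac{5}{72i}\right) = \frac{\pi}{36}.
\]

Closing the contour in the LHP would yield the same result. ■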

10.3.2 Products of Rational and Trigonometric Functions 

The second type of integral we can evaluate using the residue theorem is of the 
form 


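\[
\int_{-\infty}^{\infty} \frac{p(x)}{q(x)}\cos ax\,dx
\quad\text{or}\quad
\int_{-\infty}^{\infty} \frac{p(x)}{q(x)}\sin ax\,dx,
\]

where $a$ is a real number, $p(x)$ and $q(x)$ are real polynomials in $x$, and $q(x)$ has
no real zeros. These integrals are the real and imaginary parts of

\[
\int_{-\infty}^{\infty} \frac{p(x)}{q(x)}\,e^{iax}\,dx.
\]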




The presence of $e^{iax}$ dictates the choice of the half-plane: If $a > 0$, we choose the
UHP; otherwise, we choose the LHP. We must, of course, have enough powers of
$x$ in the denominator to render $R\,|p(Re^{i\theta})/q(Re^{i\theta})|$ uniformly convergent to zero.

10.3.4. Example. Let us evaluate $\int_{-\infty}^{\infty} [\cos ax/(x^2+1)^2]\,dx$ where $a \neq 0$. This integral
is the real part of the integral $I_2 = \int_{-\infty}^{\infty} e^{iax}\,dx/(x^2+1)^2$. When $a > 0$, we close in the
UHP as advised by Jordan's lemma. Then we proceed as for integrals of rational functions.
Thus, we have

\[
I_2 = \oint_C \frac{e^{iaz}}{(z^2+1)^2}\,dz = 2\pi i\,\operatorname{Res}[f(i)] \quad\text{for } a > 0
\]

because there is only one pole (of order 2) in the UHP at z = i. We next calculate the 
residue: 


\[
\operatorname{Res}[f(i)] = \lim_{z\to i}\frac{d}{dz}\left[(z-i)^2\,\frac{e^{iaz}}{(z-i)^2(z+i)^2}\right]
 = \lim_{z\to i}\frac{d}{dz}\left[\frac{e^{iaz}}{(z+i)^2}\right]
 = \lim_{z\to i}\left[\frac{(z+i)\,ia\,e^{iaz} - 2e^{iaz}}{(z+i)^3}\right]
 = \frac{e^{-a}}{4i}(1+a).
\]

Substituting this in the expression for $I_2$, we obtain $I_2 = \frac{\pi}{2}e^{-a}(1+a)$ for $a > 0$.

When $a < 0$, we have to close the contour in the LHP, where the pole of order 2 is at
$z = -i$ and the contour is taken clockwise. Thus, we get

\[
I_2 = \oint_C \frac{e^{iaz}}{(z^2+1)^2}\,dz = -2\pi i\,\operatorname{Res}[f(-i)] \quad\text{for } a < 0.
\]


For the residue we obtain 


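\[
\operatorname{Res}[f(-i)] = \lim_{z\to -i}\frac{d}{dz}\left[(z+i)^2\,\frac{e^{iaz}}{(z-i)^2(z+i)^2}\right]
 = -\frac{e^{a}}{4i}(1-a),
\]

and the expression for $I_2$ becomes $I_2 = \frac{\pi}{2}e^{a}(1-a)$ for $a < 0$. We can combine the two
results and write

\[
\int_{-\infty}^{\infty} \frac{\cos ax}{(x^2+1)^2}\,dx = \operatorname{Re}(I_2) = I_2 = \frac{\pi}{2}\,(1+|a|)\,e^{-|a|}.
\]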
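The combined formula can be checked numerically for both signs of $a$. The sketch below is a rough illustration, not part of the text: the cutoff, the number of Simpson panels, and the function names are assumptions chosen so that the neglected tail (which falls off like $1/x^4$) stays far below the printed precision.

```python
import math

def simpson(func, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def cos_integral(a, cutoff=100.0, n=20_000):
    """Approximate the integral of cos(a x) / (x^2 + 1)^2 over the whole real line."""
    g = lambda x: math.cos(a * x) / (x * x + 1) ** 2
    return 2 * simpson(g, 0.0, cutoff, n)      # the integrand is even in x

def closed_form(a):
    """Residue-theorem result (pi/2)(1 + |a|) exp(-|a|)."""
    return 0.5 * math.pi * (1 + abs(a)) * math.exp(-abs(a))

for a in (0.5, -0.5, 2.0):
    print(a, cos_integral(a), closed_form(a))
```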

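10.3.5. Example. As another example, let us evaluate

\[
\int_{-\infty}^{\infty} \frac{x\sin ax}{x^4+4}\,dx \quad\text{where } a \neq 0.
\]

This is the imaginary part of the integral $I_2 = \int_{-\infty}^{\infty} x e^{iax}\,dx/(x^4+4)$, which, in terms of
$z$ and for the closed contour in the UHP (when $a > 0$), becomes

\[
I_2 = \oint_C \frac{z e^{iaz}}{z^4+4}\,dz = 2\pi i \sum_{j} \operatorname{Res}[f(z_j)] \quad\text{for } a > 0. \tag{10.6}
\]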





The singularities are determined by the zeros of the denominator: $z^4 + 4 = 0$, or
$z = 1 \pm i,\ -1 \pm i$. Of these four simple poles only two, $1 + i$ and $-1 + i$, are in the UHP. We
now calculate the residues:


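\[
\operatorname{Res}[f(1+i)] = \lim_{z\to 1+i}\,(z-1-i)\,\frac{z e^{iaz}}{(z-1-i)(z-1+i)(z+1-i)(z+1+i)}
 = \frac{(1+i)\,e^{ia(1+i)}}{(2i)(2)(2+2i)} = \frac{e^{ia}e^{-a}}{8i},
\]
\[
\operatorname{Res}[f(-1+i)] = \lim_{z\to -1+i}\,(z+1-i)\,\frac{z e^{iaz}}{(z+1-i)(z+1+i)(z-1-i)(z-1+i)}
 = \frac{(-1+i)\,e^{ia(-1+i)}}{(2i)(-2)(-2+2i)} = -\frac{e^{-ia}e^{-a}}{8i}.
\]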


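Substituting in Equation (10.6), we obtain

\[
I_2 = 2\pi i\,\frac{e^{-a}}{8i}\left(e^{ia} - e^{-ia}\right) = i\,\frac{\pi}{2}\,e^{-a}\sin a.
\]

Thus,

\[
\int_{-\infty}^{\infty} \frac{x\sin ax}{x^4+4}\,dx = \operatorname{Im}(I_2) = \frac{\pi}{2}\,e^{-a}\sin a \quad\text{for } a > 0. \tag{10.7}
\]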


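For $a < 0$, we could close the contour in the LHP. But there is an easier way of getting to
the answer. We note that $-a > 0$, and Equation (10.7) yields

\[
\int_{-\infty}^{\infty} \frac{x\sin ax}{x^4+4}\,dx
 = -\int_{-\infty}^{\infty} \frac{x\sin[(-a)x]}{x^4+4}\,dx
 = -\frac{\pi}{2}\,e^{-(-a)}\sin(-a) = \frac{\pi}{2}\,e^{a}\sin a.
\]

We can collect the two cases in

\[
\int_{-\infty}^{\infty} \frac{x\sin ax}{x^4+4}\,dx = \frac{\pi}{2}\,e^{-|a|}\sin a.
\]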


10.3.3 Functions of Trigonometric Functions

The third type of integral we can evaluate using the residue theorem involves only
trigonometric functions and is of the form

\[
\int_0^{2\pi} F(\sin\theta, \cos\theta)\,d\theta,
\]

where $F$ is some (typically rational) function of its arguments. Since $\theta$ varies from
0 to $2\pi$, we can consider it an argument of a point $z$ on the unit circle centered at the
origin. Then $z = e^{i\theta}$ and $e^{-i\theta} = 1/z$, and we can substitute $\cos\theta = (z + 1/z)/2$,
$\sin\theta = (z - 1/z)/(2i)$, and $d\theta = dz/(iz)$ in the original integral, to obtain

\[
\oint_C F\!\left(\frac{z - 1/z}{2i},\ \frac{z + 1/z}{2}\right)\frac{dz}{iz}.
\]

This integral can often be evaluated using the method of residues. 





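10.3.6. Example. Let us evaluate the integral $\int_0^{2\pi} d\theta/(1 + a\cos\theta)$ where $|a| < 1$.
Substituting for $\cos\theta$ and $d\theta$ in terms of $z$, we obtain

\[
\oint_C \frac{dz/iz}{1 + a[(z^2+1)/(2z)]} = \frac{2}{i}\oint_C \frac{dz}{2z + az^2 + a},
\]

where $C$ is the unit circle centered at the origin. The singularities of the integrand are the
zeros of its denominator:

\[
z_1 = \frac{-1 + \sqrt{1-a^2}}{a} \quad\text{and}\quad z_2 = \frac{-1 - \sqrt{1-a^2}}{a}.
\]

For $|a| < 1$ it is clear that $z_2$ will lie outside the unit circle $C$; therefore, it does not contribute
to the integral. But $z_1$ lies inside, and we obtain

\[
\oint_C \frac{dz}{2z + az^2 + a} = 2\pi i\,\operatorname{Res}[f(z_1)].
\]

The residue of the simple pole at $z_1$ can be calculated:

\[
\operatorname{Res}[f(z_1)] = \lim_{z\to z_1}\,(z - z_1)\,\frac{1}{a(z-z_1)(z-z_2)}
 = \frac{1}{a}\left(\frac{1}{z_1 - z_2}\right)
 = \frac{1}{a}\left(\frac{a}{2\sqrt{1-a^2}}\right) = \frac{1}{2\sqrt{1-a^2}}.
\]

It follows that

\[
\int_0^{2\pi} \frac{d\theta}{1 + a\cos\theta}
 = \frac{2}{i}\oint_C \frac{dz}{2z + az^2 + a}
 = \frac{2}{i}\,2\pi i\left(\frac{1}{2\sqrt{1-a^2}}\right) = \frac{2\pi}{\sqrt{1-a^2}}.
\]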
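The result of this example is easy to test numerically, since the trapezoid rule converges very quickly for smooth periodic integrands. The following sketch is an illustration only; the function names and the sample values of $a$ are our own assumptions. It also confirms which of the two poles lies inside the unit circle.

```python
import math

def periodic_trapezoid(func, n=2000):
    """Trapezoid rule over one full period [0, 2*pi]; the two endpoints coincide."""
    h = 2 * math.pi / n
    return h * sum(func(k * h) for k in range(n))

def lhs(a):
    """Numerical value of the integral of 1 / (1 + a cos(theta)) over [0, 2*pi]."""
    return periodic_trapezoid(lambda t: 1.0 / (1.0 + a * math.cos(t)))

def rhs(a):
    """Residue-theorem result 2*pi / sqrt(1 - a^2), valid for |a| < 1."""
    return 2 * math.pi / math.sqrt(1 - a * a)

def poles(a):
    """Zeros z1, z2 of a z^2 + 2 z + a (requires a != 0)."""
    s = math.sqrt(1 - a * a)
    return (-1 + s) / a, (-1 - s) / a

for a in (0.3, -0.5, 0.9):
    z1, z2 = poles(a)
    print(a, lhs(a), rhs(a), abs(z1) < 1 < abs(z2))
```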
10.3.7. Example. As another example, let us consider the integral

\[
I = \int_0^{\pi} \frac{d\theta}{(a + \cos\theta)^2} \quad\text{where } a > 1.
\]

Since $\cos\theta$ is an even function of $\theta$, we may write

\[
I = \frac{1}{2}\int_{-\pi}^{\pi} \frac{d\theta}{(a + \cos\theta)^2} \quad\text{where } a > 1.
\]

This integration is over a complete cycle around the origin, and we can make the usual
substitution:

\[
I = \frac{1}{2}\oint_C \frac{dz/iz}{[a + (z^2+1)/2z]^2} = \frac{2}{i}\oint_C \frac{z\,dz}{(z^2 + 2az + 1)^2}.
\]

The denominator has the roots $z_1 = -a + \sqrt{a^2-1}$ and $z_2 = -a - \sqrt{a^2-1}$, which are
both of order 2. The second root is outside the unit circle because $a > 1$. Also, it is easily
verified that for all $a > 1$, $z_1$ is inside the unit circle. Since $z_1$ is a pole of order 2, we have

\[
\operatorname{Res}[f(z_1)] = \lim_{z\to z_1}\frac{d}{dz}\left[(z-z_1)^2\,\frac{z}{(z-z_1)^2(z-z_2)^2}\right]
 = \lim_{z\to z_1}\frac{d}{dz}\left[\frac{z}{(z-z_2)^2}\right]
 = \frac{1}{(z_1-z_2)^2} - \frac{2z_1}{(z_1-z_2)^3}
 = \frac{a}{4(a^2-1)^{3/2}}.
\]

We thus obtain

\[
I = \frac{2}{i}\,2\pi i\,\operatorname{Res}[f(z_1)] = \frac{\pi a}{(a^2-1)^{3/2}}.
\]





10.3.4 Some Other Integrals 

The three types of definite integrals discussed above do not exhaust all possible 
applications of the residue theorem. There are other integrals that do not fit into 
any of the foregoing three categories but are still manageable. As the next two 
examples demonstrate, an ingenious choice of contours allows evaluation of other 
types of integrals. 

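10.3.8. Example. Let us evaluate the Gaussian integral

\[
I = \int_{-\infty}^{\infty} e^{iax - bx^2}\,dx \quad\text{where } a, b \in \mathbb{R},\ b > 0.
\]

Completing squares in the exponent, we have

\[
I = \int_{-\infty}^{\infty} e^{-b[x - ia/(2b)]^2 - a^2/(4b)}\,dx
  = e^{-a^2/(4b)} \lim_{R\to\infty}\int_{-R}^{R} e^{-b[x - ia/(2b)]^2}\,dx.
\]

If we change the variable of integration to $z = x - ia/(2b)$, we obtain

\[
I = e^{-a^2/(4b)} \lim_{R\to\infty}\int_{-R - ia/(2b)}^{R - ia/(2b)} e^{-bz^2}\,dz.
\]

Let us now define $I_R$:

\[
I_R \equiv \int_{-R - ia/(2b)}^{R - ia/(2b)} e^{-bz^2}\,dz.
\]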

This is an integral along a straight line $C_1$ that is parallel to the $x$-axis (see Figure 10.4).
We close the contour as shown and note that $e^{-bz^2}$ is analytic throughout the interior of
the closed contour (it is an entire function!). Thus, the contour integral must vanish by the
Cauchy-Goursat theorem. So we obtain

\[
I_R + \int_{C_3} e^{-bz^2}\,dz + \int_{R}^{-R} e^{-bx^2}\,dx + \int_{C_4} e^{-bz^2}\,dz = 0.
\]

Along $C_3$, $z = R + iy$ with $y$ running from $-a/(2b)$ to 0, and

\[
\int_{C_3} e^{-bz^2}\,dz = \int_{-a/(2b)}^{0} e^{-b(R+iy)^2}\,i\,dy
 = i e^{-bR^2}\int_{-a/(2b)}^{0} e^{by^2 - 2ibRy}\,dy,
\]

which clearly tends to zero as $R \to \infty$. We get a similar result for the integral along $C_4$.
Therefore, we have 


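\[
I_R = \int_{-R}^{R} e^{-bx^2}\,dx
\quad\Longrightarrow\quad
\lim_{R\to\infty} I_R = \int_{-\infty}^{\infty} e^{-bx^2}\,dx = \sqrt{\frac{\pi}{b}}.
\]

Finally, we get

\[
\int_{-\infty}^{\infty} e^{iax - bx^2}\,dx = \sqrt{\frac{\pi}{b}}\;e^{-a^2/(4b)}.
\]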
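A direct numerical evaluation gives a quick check of this closed form. The sketch below is illustrative only: the cutoff and step count are assumptions (the integrand is negligible beyond the cutoff for the sample values of $b$ used), and the function names are ours.

```python
import cmath
import math

def gaussian_fourier(a, b, cutoff=12.0, n=4000):
    """Trapezoid estimate of the integral of exp(i a x - b x^2) over [-cutoff, cutoff]."""
    h = 2 * cutoff / n
    total = 0j
    for k in range(n + 1):
        x = -cutoff + k * h
        weight = 0.5 if k in (0, n) else 1.0   # half weight at the two endpoints
        total += weight * cmath.exp(1j * a * x - b * x * x)
    return total * h

def closed_form(a, b):
    """sqrt(pi / b) * exp(-a^2 / (4 b)), the result derived above."""
    return math.sqrt(math.pi / b) * math.exp(-a * a / (4 * b))

for a, b in ((1.0, 1.0), (-2.0, 0.5), (3.0, 2.0)):
    print(a, b, gaussian_fourier(a, b), closed_form(a, b))
```

The imaginary part of the numerical value should vanish to rounding error, since the sine part of the integrand is odd.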







Figure 10.4 The contour for the evaluation of the Gaussian integral. 


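10.3.9. Example. Let us evaluate $I = \int_0^{\infty} dx/(x^3 + 1)$. If the integrand were even,
we could extend the lower limit of integration to $-\infty$ and close the contour in the UHP.
Since this is not the case, we need to use a different trick. To get a hint as to how to close
the contour, we study the singularities of the integrand. These are simply the roots of the
denominator: $z^3 = -1$ or $z_n = e^{i(2n+1)\pi/3}$ with $n = 0, 1, 2$. These, as well as a contour
that has only $z_0$ as an interior point, are shown in Figure 10.5. We thus have

\[
I + \int_{C_R} \frac{dz}{z^3+1} + \int_{C_2} \frac{dz}{z^3+1} = 2\pi i\,\operatorname{Res}[f(z_0)]. \tag{10.8}
\]

The $C_R$ integral vanishes, as usual. Along $C_2$, $z = re^{i\alpha}$, with constant $\alpha$, so that $dz = e^{i\alpha}\,dr$
and

\[
\int_{C_2} \frac{dz}{z^3+1} = \int_{\infty}^{0} \frac{e^{i\alpha}\,dr}{(re^{i\alpha})^3 + 1}
 = -e^{i\alpha}\int_0^{\infty} \frac{dr}{r^3 e^{3i\alpha} + 1}.
\]

In particular, if we choose $3\alpha = 2\pi$, we obtain

\[
\int_{C_2} \frac{dz}{z^3+1} = -e^{i2\pi/3}\int_0^{\infty} \frac{dr}{r^3+1} = -e^{i2\pi/3}\,I.
\]

Substituting this in Equation (10.8) gives

\[
\left(1 - e^{i2\pi/3}\right) I = 2\pi i\,\operatorname{Res}[f(z_0)]
\quad\Longrightarrow\quad
I = \frac{2\pi i}{1 - e^{i2\pi/3}}\,\operatorname{Res}[f(z_0)].
\]

On the other hand,

\[
\operatorname{Res}[f(z_0)] = \lim_{z\to z_0}\,(z - z_0)\,\frac{1}{(z-z_0)(z-z_1)(z-z_2)}
 = \frac{1}{(z_0 - z_1)(z_0 - z_2)}
 = \frac{1}{(e^{i\pi/3} - e^{i\pi})(e^{i\pi/3} - e^{i5\pi/3})}.
\]

These last two equations yield

\[
I = \frac{2\pi i}{(1 - e^{i2\pi/3})(e^{i\pi/3} - e^{i\pi})(e^{i\pi/3} - e^{i5\pi/3})} = \frac{2\pi}{3\sqrt{3}}.
\]
■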
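The last step, where a product of complex exponentials collapses to a real number, is easy to get wrong by hand. The following sketch evaluates the expression with complex arithmetic and compares it with a direct quadrature. The quadrature helper and its substitution x = tan t are our own assumptions, not part of the text.

```python
import cmath
import math

# Roots of z^3 = -1: z_n = exp(i (2n + 1) pi / 3) for n = 0, 1, 2.
roots = [cmath.exp(1j * (2 * n + 1) * math.pi / 3) for n in range(3)]

# Residue of 1 / (z^3 + 1) at the simple pole z_0.
res0 = 1 / ((roots[0] - roots[1]) * (roots[0] - roots[2]))

# I = 2 pi i Res[f(z_0)] / (1 - exp(2 pi i / 3)).
I_residue = 2j * math.pi * res0 / (1 - cmath.exp(2j * math.pi / 3))

def integrate_half_line(func, n=20_000):
    """Midpoint rule for the integral over [0, inf), using the substitution x = tan(t)."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += func(math.tan(t)) / math.cos(t) ** 2
    return total * h

I_direct = integrate_half_line(lambda x: 1.0 / (x**3 + 1.0))

print(I_residue, I_direct, 2 * math.pi / (3 * math.sqrt(3)))
```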






Figure 10.5 The contour is chosen so that only one of the poles lies inside. 

10.3.5 Principal Value of an Integral 

So far we have discussed only integrals of functions that have no singularities on 
the contour. Let us now investigate the consequences of the presence of singular 
points on the contour. Consider the integral 



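\[
\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\,dx, \tag{10.9}
\]

where $x_0$ is a real number and $f$ is analytic at $x_0$. To avoid $x_0$, which causes the
integrand to diverge, we bypass it by indenting the contour as shown in Figure
10.6 and denoting the new contour by $C_u$. The contour $C_0$ is simply a semicircle
of radius $\epsilon$. For the contour $C_u$, we have

\[
\int_{C_u} \frac{f(z)}{z - x_0}\,dz
 = \int_{-\infty}^{x_0-\epsilon} \frac{f(x)}{x - x_0}\,dx
 + \int_{x_0+\epsilon}^{\infty} \frac{f(x)}{x - x_0}\,dx
 + \int_{C_0} \frac{f(z)}{z - x_0}\,dz.
\]

In the limit $\epsilon \to 0$, the sum of the first two terms on the RHS, when it exists,
defines the principal value of the integral in Equation (10.9):

\[
P\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\,dx
 = \lim_{\epsilon\to 0}\left[\int_{-\infty}^{x_0-\epsilon} \frac{f(x)}{x - x_0}\,dx
 + \int_{x_0+\epsilon}^{\infty} \frac{f(x)}{x - x_0}\,dx\right].
\]

The integral over the semicircle is calculated by noting that $z - x_0 = \epsilon e^{i\theta}$ and
$dz = i\epsilon e^{i\theta}\,d\theta$: $\int_{C_0} f(z)\,dz/(z - x_0) = -i\pi f(x_0)$. Therefore,

\[
\int_{C_u} \frac{f(z)}{z - x_0}\,dz = P\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\,dx - i\pi f(x_0). \tag{10.10}
\]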






Figure 10.6 The contour $C_u$ avoids $x_0$.


On the other hand, if $C_0$ is taken below the singularity on a contour $C_d$, say, we
obtain

\[
\int_{C_d} \frac{f(z)}{z - x_0}\,dz = P\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\,dx + i\pi f(x_0).
\]


We see that the contour integral depends on how the singular point xo is avoided. 
However, the principal value, if it exists, is unique. To calculate this principal value 
we close the contour by adding a large semicircle to it as before, assuming that the 
contribution from this semicircle goes to zero by Jordan’s lemma. The contours 
C u and Cd are replaced by a closed contour, and the value of the integral will be 
given by the residue theorem. We therefore have 


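\[
P\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\,dx = \pm i\pi f(x_0) + 2\pi i \sum_j \operatorname{Res}\left[\frac{f(z_j)}{z_j - x_0}\right], \tag{10.11}
\]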


where the plus sign corresponds to placing the infinitesimal semicircle in the UHP, 
as shown in Figure 10.6, and the minus sign corresponds to the other choice. 


10.3.10. Example. Let us use the principal-value method to evaluate the integral

\[
I = \int_0^{\infty} \frac{\sin x}{x}\,dx = \frac{1}{2}\int_{-\infty}^{\infty} \frac{\sin x}{x}\,dx.
\]


It appears that $x = 0$ is a singular point of the integrand; in reality, however, it is only a
removable singularity, as can be verified by the Taylor expansion of $\sin x/x$. To make use
of the principal-value method, we write

\[
I = \frac{1}{2}\operatorname{Im}\left(P\int_{-\infty}^{\infty} \frac{e^{ix}}{x}\,dx\right).
\]

We now use Equation (10.11) with the small circle in the UHP, noting that there are no
singularities for $e^{ix}/x$ there. This yields

\[
P\int_{-\infty}^{\infty} \frac{e^{ix}}{x}\,dx = i\pi e^{i\cdot 0} = i\pi.
\]


Therefore, 



\[
\int_0^{\infty} \frac{\sin x}{x}\,dx = \frac{1}{2}\operatorname{Im}(i\pi) = \frac{\pi}{2}.
\]





Figure 10.7 The equivalent contour obtained by "stretching" the contour of Figure 10.6.


The principal value of an integral can be written more compactly if we deform
the contour $C_u$ by stretching it into that shown in Figure 10.7. For small enough $\epsilon$,
such a deformation will not change the number of singularities within the infinite
closed contour. Thus, the LHS of Equation (10.10) will have limits of integration
$-\infty + i\epsilon$ and $+\infty + i\epsilon$. If we change the variable of integration to $\xi = z - i\epsilon$,
this integral becomes

\[
\int_{-\infty}^{\infty} \frac{f(\xi + i\epsilon)}{\xi + i\epsilon - x_0}\,d\xi
 = \int_{-\infty}^{\infty} \frac{f(\xi)\,d\xi}{\xi - x_0 + i\epsilon}
 = \int_{-\infty}^{\infty} \frac{f(z)\,dz}{z - x_0 + i\epsilon}, \tag{10.12}
\]

where in the last step we changed the dummy integration variable back to $z$. Note
that since $f$ is assumed to be continuous at all points on the contour, $f(\xi + i\epsilon) \to f(\xi)$
for small $\epsilon$. The last integral of Equation (10.12) shows that there is no
singularity on the new $x$-axis; we have pushed the singularity down to $x_0 - i\epsilon$. In
other words, we have given the singularity on the $x$-axis a small negative imaginary
part. We can thus rewrite Equation (10.10) as

\[
P\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\,dx = i\pi f(x_0) + \int_{-\infty}^{\infty} \frac{f(x)\,dx}{x - x_0 + i\epsilon},
\]

where $x$ is used instead of $z$ in the last integral because we are indeed integrating
along the new $x$-axis, assuming that no other singularities are present in the UHP.
A similar argument, this time for the LHP, introduces a minus sign for the first
term on the RHS and for the $\epsilon$ term in the denominator. Therefore,

\[
P\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\,dx = \pm i\pi f(x_0) + \int_{-\infty}^{\infty} \frac{f(x)\,dx}{x - x_0 \pm i\epsilon}, \tag{10.13}
\]

where the plus (minus) sign refers to the UHP (LHP). This result is sometimes
abbreviated as

\[
\frac{1}{x - x_0 \pm i\epsilon} = P\,\frac{1}{x - x_0} \mp i\pi\,\delta(x - x_0). \tag{10.14}
\]
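Equation (10.14) can be illustrated numerically with a test function. The sketch below is an assumption-laden illustration rather than part of the text: we pick $f(x) = e^{-x^2}$ and $x_0 = 0$, for which the principal value vanishes by symmetry, so the integral with the $+i\epsilon$ prescription should approach $-i\pi f(0) = -i\pi$. The comparison value $\pi e^{\epsilon^2}\operatorname{erfc}(\epsilon)$ is a standard closed form for $\int e^{-x^2}\,\epsilon/(x^2+\epsilon^2)\,dx$, quoted here without derivation.

```python
import math

def shifted_pole_integral(eps, cutoff=10.0, h=1e-3):
    """Trapezoid estimate of the integral of exp(-x^2) / (x + i eps) over the real line."""
    n = int(round(2 * cutoff / h))
    total = 0j
    for k in range(n + 1):
        x = -cutoff + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-x * x) / complex(x, eps)
    return total * h

eps = 0.01
value = shifted_pole_integral(eps)

# Real part: the principal value, which is zero here because exp(-x^2)/x is odd.
# Imaginary part: tends to -pi * f(0) = -pi as eps -> 0.
exact_imag = -math.pi * math.exp(eps * eps) * math.erfc(eps)
print(value, exact_imag, -math.pi)
```

For $\epsilon = 0.01$ the imaginary part differs from $-\pi$ only by a correction of order $\epsilon$, consistent with the limit implied by (10.14).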




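10.3.11. Example. Let us use residues to evaluate the function

\[
f(k) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{ikx}\,dx}{x - i\epsilon}, \quad \epsilon > 0.
\]

We have to close the contour by adding a large semicircle. Whether we do this in the UHP
or the LHP is dictated by the sign of $k$: If $k > 0$, we close in the UHP. Thus,

\[
f(k) = \frac{1}{2\pi i}\oint_C \frac{e^{ikz}\,dz}{z - i\epsilon}
 = \operatorname{Res}\left[\frac{e^{ikz}}{z - i\epsilon}\right]_{z = i\epsilon}
 = \lim_{z\to i\epsilon}\left[(z - i\epsilon)\,\frac{e^{ikz}}{z - i\epsilon}\right]
 = e^{-k\epsilon} \xrightarrow{\ \epsilon\to 0\ } 1.
\]

On the other hand, if $k < 0$, we must close in the LHP, in which the integrand is analytic.
Thus, by the Cauchy-Goursat theorem, the integral vanishes. Therefore, we have

\[
f(k) = \begin{cases} 1 & \text{if } k > 0, \\ 0 & \text{if } k < 0. \end{cases}
\]

This is precisely the definition of the theta function (or step function). Thus, we have
obtained an integral representation of that function:

\[
\theta(x) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{itx}}{t - i\epsilon}\,dt.
\]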


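Now suppose that there are two singular points on the real axis, at $x_1$ and $x_2$. Let
us avoid $x_1$ and $x_2$ by making little semicircles, as before, letting both semicircles
be in the UHP (see Figure 10.8). Without writing the integrands, we can represent
the contour integral by

\[
\int_{-\infty}^{x_1-\epsilon} + \int_{C_1} + \int_{x_1+\epsilon}^{x_2-\epsilon} + \int_{C_2} + \int_{x_2+\epsilon}^{\infty} = 2\pi i \sum \operatorname{Res}.
\]

The principal value of the integral is naturally defined to be the sum of all integrals
having $\epsilon$ in their limits. The contribution from the small semicircle $C_1$ can be
calculated by substituting $z - x_1 = \epsilon e^{i\theta}$ in the integral:

\[
\int_{C_1} \frac{f(z)\,dz}{(z - x_1)(z - x_2)}
 = \int_{\pi}^{0} \frac{f(x_1 + \epsilon e^{i\theta})\,i\epsilon e^{i\theta}\,d\theta}{\epsilon e^{i\theta}(x_1 + \epsilon e^{i\theta} - x_2)}
 = -i\pi\,\frac{f(x_1)}{x_1 - x_2},
\]

with a similar result for $C_2$. Putting everything together, we get

\[
P\int_{-\infty}^{\infty} \frac{f(x)}{(x - x_1)(x - x_2)}\,dx - i\pi\,\frac{f(x_2) - f(x_1)}{x_2 - x_1} = 2\pi i \sum \operatorname{Res}.
\]

If we include the case where both $C_1$ and $C_2$ are in the LHP, we get

\[
P\int_{-\infty}^{\infty} \frac{f(x)}{(x - x_1)(x - x_2)}\,dx = \pm i\pi\,\frac{f(x_2) - f(x_1)}{x_2 - x_1} + 2\pi i \sum \operatorname{Res}, \tag{10.15}
\]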






Figure 10.8 One of the four choices of contours for evaluating the principal value of the 
integral when there are two poles on the real axis. 


where the plus sign is for the case where $C_1$ and $C_2$ are in the UHP and the minus
sign for the case where both are in the LHP. We can also obtain the result for the
case where the two singularities coincide by taking the limit $x_2 \to x_1$. Then the
RHS of the last equation becomes a derivative, and we obtain

\[
P\int_{-\infty}^{\infty} \frac{f(x)}{(x - x_0)^2}\,dx = \pm i\pi f'(x_0) + 2\pi i \sum \operatorname{Res}.
\]


10.3.12. Example. An expression encountered in the study of Green's functions or
propagators (which we shall discuss later in the book) is

\[
\int_{-\infty}^{\infty} \frac{e^{itx}\,dx}{x^2 - k^2},
\]

where $k$ and $t$ are real constants. We want to calculate the principal value of this integral.
We use Equation (10.15) and note that for $t > 0$, we need to close the contour in the UHP,
where there are no poles:

\[
P\int_{-\infty}^{\infty} \frac{e^{itx}\,dx}{x^2 - k^2}
 = P\int_{-\infty}^{\infty} \frac{e^{itx}\,dx}{(x - k)(x + k)}
 = i\pi\,\frac{e^{ikt} - e^{-ikt}}{2k} = -\pi\,\frac{\sin kt}{k}.
\]

When $t < 0$, we have to close the contour in the LHP, where again there are no poles:

\[
P\int_{-\infty}^{\infty} \frac{e^{itx}\,dx}{x^2 - k^2}
 = P\int_{-\infty}^{\infty} \frac{e^{itx}\,dx}{(x - k)(x + k)}
 = -i\pi\,\frac{e^{ikt} - e^{-ikt}}{2k} = \pi\,\frac{\sin kt}{k}.
\]

The two results above can be combined into a single relation:

\[
P\int_{-\infty}^{\infty} \frac{e^{itx}\,dx}{x^2 - k^2} = -\pi\,\frac{\sin k|t|}{k}.
\]
■







10.2. Let $h(z)$ be analytic and have a simple zero at $z = z_0$, and let $g(z)$ be analytic
there. Let $f(z) = g(z)/h(z)$, and show that

\[
\operatorname{Res}[f(z_0)] = \frac{g(z_0)}{h'(z_0)}.
\]

10.3. Find the residue of $f(z) = 1/\cos z$ at each of its poles.

10.4. Evaluate the integral $\int_0^{\infty} dx/[(x^2+1)(x^2+4)]$ by closing the contour (a)
in the UHP and (b) in the LHP.

10.5. Evaluate the following integrals, in which $a$ and $b$ are nonzero real constants.


cosxdx f 、 f ( 

I 

(^ + W+2) -⑻/ 

__ (k) r 

{x 2 +4x + 13) 2- W J 0 

)x cosxdx /m 、 f' 

^ 2 -2x+io* ⑷ y— 

x 2 dx r 

(;c 2 + 4) 2 (x 2 + 25). {q) Jo 


6 x 4 + 5 x 2 + 1 ' V Jo 

cosax 1 …广 

(/) /o 

2x 2 - 1 f a 

(0 i 

x simc dx 「 

x 2 - 2 x-{- 10 t ° Jq 
cos ax f° 

(r) / 0 


dx 

^TT 

3 dx 
(x 2 + l) 2 
x 2 dx 

\x 2 + a 2 ) 2 
x 2 + l 

-Ty - ~dX. 

x 2 +4 

dx 

dx 

(x 2 + 4) 2 * 


(c) f -^- dz. 
Jc z(z - 7t) 

(/) 


/ I 


⑷ £ 去 h. 

(s)<f ^ 

J C 之 


O') j) tmzdz^ 

㈣！盖? 


10.4 Problems 

10.1. Evaluate each of the following integrals, for all of which $C$ is the circle
$|z| = 3$.








Figure 10.9 The contour used in Problem 10.8.



10.6. Evaluate each of the following integrals by turning it into a contour integral 
around a unit circle. 


lo 5 + 4sin0* 

广 dO 

,0 1+sin 2 〆 
f 2jr cos 2 30 
, 0 5-4cos2^ d0 '' 

f n cos 2 3(j) d(t> 
lo 1 —2a cos <p-\-a 2 
f n cos 2(p d(j> 

Iq l —2a cos (f> + a 2 


f 2n d0 

(b) I -- where a > 1_ 

Jo a 4- cosO 

r 2jt d9 

(d) Jo (a + w 印 wherea ^ >0 - 

(/) [ l-latU + a^ Where fl ^ ±L 

where a ^ il. 


where a ^ 士 1. 


(0 / tan(x + ia)dx where a € R. 

Jo 

(j) I e C0S{ ^ cos(n(/) — sin^) d(j) where w G Z. 

Jo 


10.7. Evaluate the integral $I = \int_{-\infty}^{\infty} e^{ax}\,dx/(1 + e^{x})$ for $0 < a < 1$. Hint: Choose
a closed (long) rectangle that encloses only one of the zeros of the denominator.
Show that the contributions of the short sides of the rectangle are zero.

10.8. Derive the integration formula $\int_0^{\infty} e^{-x^2}\cos(2bx)\,dx = \frac{\sqrt{\pi}}{2}\,e^{-b^2}$ where
$b \neq 0$ by integrating the function $e^{-z^2}$ around the rectangular path shown in
Figure 10.9.

10.9. Use the result of Example 10.3.11 to show that $\theta'(k) = \delta(k)$.




10.10. Find the principal values of the following integrals. 


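(a) $\displaystyle\int_{-\infty}^{\infty} \frac{\sin x\,dx}{(x^2+4)(x-1)}$.

(b) $\displaystyle\int_{-\infty}^{\infty} \frac{\cos ax}{1 + x^3}\,dx$ where $a > 0$.

(c) $\displaystyle\int_{-\infty}^{\infty} \frac{x\cos x}{x^2 - 5x + 6}\,dx$.

(d) $\displaystyle\int_{-\infty}^{\infty} \frac{1 - \cos x}{x^2}\,dx$.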


10.11. Evaluate the following integrals. 


⑷ 

(c) 


X 


.2 


. - b 2 /sin ax\ T 

f 0 dx - 

sin ax 

h _ 2 +P ) 2 


(b) 


id) 


(/) 


smax 


dx. 


fo x(x 2 -hb 2 ) 
f°° cos 2ax — cos 2bx 

fo ^ 

sin 3 x dx 


dx. 


x 


3 


Additional Reading 

1. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row,
1967. Includes a detailed discussion of complex analysis encompassing
applications of conformal mappings and the residue theorem to physical
problems.

2. Mathews, J. and Walker, R. Mathematical Methods of Physics, 2nd ed.,
Benjamin, 1970. A "practical" guide to mathematical physics, including a
long discussion of "how to evaluate integrals" and the use of the residue
theorem.




Complex Analysis: Advanced Topics 


The subject of complex analysis is an extremely rich and powerful area of
mathematics. We have already seen some of this richness and power in the previous
chapter. This chapter concludes our discussion of complex analysis by introducing
some other topics with varying degrees of importance.


11.1 Meromorphic Functions 

Complex functions that have only simple poles as their singularities are numerous
in applications and are called meromorphic functions. In this section, we derive
an important result for such functions.

Assume that $f(z)$ has simple poles at $\{z_j\}_{j=1}^{N}$, where $N$ could be infinity. Then,
if $z \neq z_j$ for all $j$, the residue theorem yields [1]

\[
\frac{1}{2\pi i}\oint_{C_n} \frac{f(\xi)}{\xi - z}\,d\xi = f(z) + \sum_{j=1}^{n} \operatorname{Res}\left[\frac{f(\xi)}{\xi - z}\right]_{\xi = z_j},
\]

where $C_n$ is a circle containing the first $n$ poles, and it is assumed that the poles are
arranged in order of increasing absolute values. Since the poles of $f$ are assumed
to be simple, we have

\[
\operatorname{Res}\left[\frac{f(\xi)}{\xi - z}\right]_{\xi = z_j}
 = \lim_{\xi\to z_j}\,(\xi - z_j)\,\frac{f(\xi)}{\xi - z}
 = \frac{1}{z_j - z}\lim_{\xi\to z_j}\left[(\xi - z_j) f(\xi)\right]
 = \frac{1}{z_j - z}\operatorname{Res}[f(\xi)]_{\xi = z_j}
 \equiv \frac{r_j}{z_j - z},
\]

where $r_j$ is, by definition, the residue of $f(\xi)$ at $\xi = z_j$. Substituting in the
preceding equation gives

\[
f(z) = \frac{1}{2\pi i}\oint_{C_n} \frac{f(\xi)}{\xi - z}\,d\xi - \sum_{j=1}^{n} \frac{r_j}{z_j - z}.
\]

[1] Note that the residue of $f(\xi)/(\xi - z)$ at $\xi = z$ is simply $f(z)$.

Taking the difference between this and the same equation evaluated at $z = 0$
(assumed to be none of the poles), [2] we can write

\[
f(z) - f(0) = \frac{z}{2\pi i}\oint_{C_n} \frac{f(\xi)}{\xi(\xi - z)}\,d\xi + \sum_{j=1}^{n} r_j\left(\frac{1}{z - z_j} + \frac{1}{z_j}\right).
\]

If $|f(\xi)|$ approaches a finite value as $|\xi| \to \infty$, the integral vanishes for an infinite
circle (which includes all poles now), and we obtain what is called the
Mittag-Leffler expansion of the meromorphic function $f$:

\[
f(z) = f(0) + \sum_{j=1}^{N} r_j\left(\frac{1}{z - z_j} + \frac{1}{z_j}\right). \tag{11.1}
\]
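To see Equation (11.1) at work, consider a standard illustration that is not taken from the text above: $f(z) = \cot z - 1/z$. We assume the familiar facts that this function has simple poles at $z = n\pi$ ($n = \pm 1, \pm 2, \ldots$) with residue 1 at each, that $f(0) = 0$, and that it stays bounded on suitably chosen large circles. The sketch below sums the expansion over symmetric pairs of poles and compares it with the function itself; the number of terms is an arbitrary choice.

```python
import cmath
import math

def f_exact(z):
    """f(z) = cot(z) - 1/z, evaluated directly."""
    return cmath.cos(z) / cmath.sin(z) - 1 / z

def f_mittag_leffler(z, n_terms=100_000):
    """Partial sum of (11.1) with r_j = 1 and poles at +/- n*pi; f(0) = 0 here."""
    total = 0j
    for n in range(1, n_terms + 1):
        p = n * math.pi
        # Contributions of the poles at +p and -p, added as a pair.
        total += (1 / (z - p) + 1 / p) + (1 / (z + p) - 1 / p)
    return total

for z in (1.0, 0.5 + 0.3j):
    print(z, f_exact(z), f_mittag_leffler(z))
```

The partial sums converge slowly (the error falls off like one over the number of terms), so agreement to five or six digits is what should be expected here.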


Now we let $g$ be an entire function with simple zeros. We claim that (a)
$(dg/dz)/g(z)$ is a meromorphic function that is bounded for all values of $z$, and
(b) its residues are all unity. To see this, note that $g$ is of the form [3]

\[
g(z) = (z - z_1)(z - z_2)\cdots(z - z_N)\,f(z),
\]

where $z_1, \ldots, z_N$ are all the zeros of $g$, and $f$ is an analytic function that does not
vanish anywhere in the complex plane. It is now easy to see that

\[
\frac{g'(z)}{g(z)} = \sum_{j=1}^{N} \frac{1}{z - z_j} + \frac{f'(z)}{f(z)}.
\]

This expression has both properties (a) and (b) mentioned above. Furthermore, the
last term is an entire function that is bounded for all $\mathbb{C}$. Therefore, it must be a
constant by Proposition 9.5.5. This derivation also verifies Equation (11.1), which
in the case at hand can be written as

\[
\frac{d}{dz}\ln g(z) = \frac{g'(z)}{g(z)} = \frac{g'(0)}{g(0)} + \sum_{j=1}^{N}\left(\frac{1}{z - z_j} + \frac{1}{z_j}\right),
\]

whose solution is readily found to be

\[
g(z) = g(0)\,e^{cz}\prod_{j=1}^{N}\left(1 - \frac{z}{z_j}\right)e^{z/z_j}
\quad\text{where } c = \frac{(dg/dz)_{z=0}}{g(0)}, \tag{11.2}
\]

and it is assumed that $z_j \neq 0$ for all $j$.


[2] This is not a restrictive assumption because we can always move our coordinate system so that the origin avoids all poles.

[3] One can "prove" this by factoring the simple zeros one by one, writing $g(z) = (z - z_1) f_1(z)$ and noting that $g(z_2) = 0$,
with $z_2 \neq z_1$, implies that $f_1(z) = (z - z_2) f_2(z)$, etc.









Figure 11.1 (a) The angle $\theta_0$ changes by $2\pi$ as $z_0$ makes a complete circuit around $C$.
(b) The angle $\theta_0$ returns to its original value when $z_0$ completes the circuit.

11.2 Multivalued Functions 


The arbitrariness, up to a multiple of $2\pi$, of the angle $\theta = \arg(z)$ in $z = re^{i\theta}$ leads to
functions that can take different values at the same point. Consider, for example, the
function $f(z) = \sqrt{z}$. Writing $z$ in polar coordinates, we obtain $f(z) = f(r, \theta) =
(re^{i\theta})^{1/2} = \sqrt{r}\,e^{i\theta/2}$. This shows that for the same $z = (r, \theta) = (r, \theta + 2\pi)$, we
get two different values, $f(r, \theta)$ and $f(r, \theta + 2\pi) = -f(r, \theta)$.

This may be disturbing at first. After all, the definition of a function (mapping)
ensures that for any point in the domain a unique image is obtained. Here two
different images are obtained for the same $z$. Riemann found a cure for this complex
"double vision" by introducing what is now called Riemann sheets. We will discuss
these briefly below, but first let us take a closer look at a prototype of multivalued
functions. Consider the natural log function, $\ln z$. For $z = re^{i\theta}$ this is defined as
$\ln z = \ln r + i\theta = \ln|z| + i\arg(z)$ where $\arg(z)$ is defined only to within a multiple
of $2\pi$; that is, $\arg(z) = \theta + 2n\pi$, for $n = 0, \pm 1, \pm 2, \ldots$.

We can see the peculiar nature of the logarithmic function by considering a
closed path around the point $z = 0$, as shown in Figure 11.1(a). Starting at $z_0$, we
move counterclockwise, noticing the constant increase in the angle $\theta_0$ until we
reach the initial point in the $z$-plane. However, the angle is then $\theta_0 + 2\pi$. Thus, the
process of moving around the origin has changed the value of the log function by
$2\pi i$. Thus, $(\ln z_0)_{\text{final}} - (\ln z_0)_{\text{initial}} = 2\pi i$. Note that in this process $z_0$ does not
change, because

\[
(z_0)_{\text{final}} = r_0 e^{i(\theta_0 + 2\pi)} = r_0 e^{i\theta_0} e^{2\pi i} = r_0 e^{i\theta_0} = (z_0)_{\text{initial}}.
\]
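The circuit around the origin can be imitated numerically. In the sketch below (an illustration with our own names and step count, not part of the text), the logarithm is continued along the circle in many small steps, each step adding the principal logarithm of the ratio of successive points. After one full turn the point returns to $z_0$, the logarithm has gained $2\pi i$, and the square root built from it has changed sign.

```python
import cmath
import math

def continue_log_around_origin(z0, steps=1000):
    """Follow ln z continuously once counterclockwise around z = 0, starting at z0."""
    log_z = cmath.log(z0)                      # starting value (principal branch)
    z = z0
    rotation = cmath.exp(2j * math.pi / steps) # one small counterclockwise step
    for _ in range(steps):
        z_next = z * rotation
        # For a small step, the principal log of the ratio is the true increment.
        log_z += cmath.log(z_next / z)
        z = z_next
    return z, log_z

z0 = 2 * cmath.exp(0.3j)
z_final, log_final = continue_log_around_origin(z0)

jump = log_final - cmath.log(z0)               # change in ln z after one circuit
sqrt_initial = cmath.exp(0.5 * cmath.log(z0))  # sqrt(z) on the starting branch
sqrt_final = cmath.exp(0.5 * log_final)        # sqrt(z) after the circuit

print(z_final, z0)
print(jump)
print(sqrt_initial, sqrt_final)
```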

11.2.1. Definition. A branch point of a function $f : \mathbb{C} \to \mathbb{C}$ is a complex number
$z_0$ with the property that $f(r_0, \theta_0) \neq f(r_0, \theta_0 + 2\pi)$ for any closed curve $C$
encircling $z_0$. Here $(r_0, \theta_0)$ are the polar coordinates of $z_0$.


Victor-Alexandre Puiseux (1820-1883) was the first to take up the subject of multivalued
functions. In 1850 Puiseux published a celebrated paper on complex algebraic functions
given by $f(u, z) = 0$, $f$ a polynomial in $u$ and $z$. He first made clear the distinction between
poles and branch points that Cauchy had barely perceived, and introduced the notion of an
essential singular point, to which Weierstrass independently had called attention. Though
Cauchy, in the 1846 paper, did consider the variation of simple multivalued functions along
paths that enclosed branch points, Puiseux clarified this subject too.

Puiseux also showed that the development of a function of $z$ about a branch point $z = a$
must involve fractional powers of $z - a$. He then improved on Cauchy's theorem on the
expansion of a function in a Maclaurin series. By his significant investigations of
many-valued functions and their branch points in the complex plane, and by his initial work on
integrals of such functions, Puiseux brought Cauchy's pioneering work in function theory
to the end of what might be called the first stage. The difficulties in the theory of
multiple-valued functions and integrals of such functions were still to be overcome. Cauchy did write
other papers on the integrals of multiple-valued functions in which he attempted to follow
up on Puiseux's work; and though he introduced the notion of branch cuts (lignes d'arrêt),
he was still confused about the distinction between poles and branch points. This subject
of algebraic functions and their integrals was to be pursued by Riemann.

Puiseux was a keen mountaineer and was the first to scale the Alpine peak that is now 
named after him. 


Thus, $z = 0$ is a branch point of the logarithmic function. Studying the behavior
of $\ln(1/z) = -\ln z$ around $z = 0$ will reveal that the point "at infinity" is also
a branch point of $\ln z$. Figure 11.1(b) shows that any other point of the complex
plane, such as $z'$, cannot be a branch point because $\theta_0$ does not change when $C'$ is
traversed completely.

11.2.1 Riemann Surfaces 

The idea of a Riemann surface begins with the removal of all points that lie on the
line (or any other curve) joining two branch points. For $\ln z$ this means the removal
of all points lying on a curve that starts at $z = 0$ and extends all the way to infinity.
Such a curve is called a branch cut, or simply a cut.

Let us concentrate on $\ln z$ and take the cut to be along the negative half of the
real axis. Let us also define the functions

\[
f_n(z) = f_n(r, \theta) = \ln r + i(\theta + 2n\pi) \quad\text{for } -\pi < \theta < \pi;\ r > 0;\ n = 0, \pm 1, \ldots,
\]

so $f_n(z)$ takes on the same values for $-\pi < \theta < \pi$ that $\ln z$ takes in the range
$(2n - 1)\pi < \theta < (2n + 1)\pi$. We have replaced the multivalued logarithmic
function by a series of different functions that are analytic in the cut $z$-plane.

This process of cutting the $z$-plane and then defining a sequence of functions
eliminates the contradiction caused by the existence of branch points, since we are
no longer allowed to completely encircle a branch point. A complete circulation
involves crossing the cut, which, in turn, violates the domain of definition of $f_n(z)$.

We have made good progress. We have replaced the (nonanalytic) multivalued
function $\ln z$ with a series of analytic (in their domain of definition) functions
$f_n(z)$. However, there is a problem left: $f_n(z)$ has a discontinuity at the cut. In
fact, just above the cut $f_n(r, \pi - \epsilon) = \ln r + i(\pi - \epsilon + 2n\pi)$ with $\epsilon > 0$, and just
below it $f_n(r, -\pi + \epsilon) = \ln r + i(-\pi + \epsilon + 2n\pi)$, so that

\[
\lim_{\epsilon\to 0}\left[f_n(r, \pi - \epsilon) - f_n(r, -\pi + \epsilon)\right] = 2\pi i.
\]

To cure this we make the observation that the value of $f_n(z)$ just above the cut
is the same as the value of $f_{n+1}(z)$ just below the cut. This suggests the following
geometrical construction, due to Riemann: Superpose an infinite series of cut
complex planes one on top of the other, each plane corresponding to a different
value of $n$. The adjacent planes are connected along the cut such that the upper lip
of the cut in the $(n-1)$th plane is connected to the lower lip of the cut in the $n$th
plane. All planes contain the two branch points. That is, the branch points appear
as "hinges" at which all the planes are joined. With this geometrical construction,
if we cross the cut, we end up on a different plane adjacent to the previous one
(Figure 11.2).
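The two facts used in this construction, the $2\pi i$ jump of each branch across the cut and the matching of neighboring branches, can be checked with a few lines of code. This is only a sketch: it relies on the fact that Python's `cmath.log` returns the principal branch with its cut on the negative real axis, and the function name `f_n`, the radius, and the offset `eps` are our own choices.

```python
import cmath
import math

def f_n(z, n):
    """Branch n of ln z with the cut on the negative real axis: ln r + i(theta + 2 n pi)."""
    return cmath.log(z) + 2j * math.pi * n

r, eps = 1.5, 1e-8
above = cmath.rect(r, math.pi - eps)     # a point just above the cut
below = cmath.rect(r, -math.pi + eps)    # a point just below the cut

gap = f_n(above, 0) - f_n(below, 0)      # discontinuity of a single branch
match = f_n(above, 0) - f_n(below, 1)    # branch 0 above versus branch 1 below

print(gap)
print(match)
```

The first difference is close to $2\pi i$ and the second is close to zero, which is the gluing rule that joins sheet $n$ to sheet $n+1$.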

The geometric surface thus constructed is called a Riemann surface; each
plane is called a Riemann sheet and is denoted by $R_j$, for $j = 0, \pm 1, \pm 2, \ldots$.
A single-valued function defined on a Riemann sheet is called a branch of the
original multivalued function.

We have achieved the following: From a multivalued function we have
constructed a sequence of single-valued functions, each defined in a single complex
plane; from this sequence of functions we have constructed a single complex
function defined on a single Riemann surface. Thus, the logarithmic function is analytic
throughout the Riemann surface except at the branch points, which are simply the
function's singular points.

It is now easy to see the geometrical significance of branch points. A complete 
cycle around a branch point takes us to another Riemann sheet, where the function 
takes on a different form. On the other hand, a complete cycle around an ordinary 
point either never crosses the cut, or if it does, it will cross it back to the original 
sheet. 

Let us now briefly consider two of the more common multivalued functions 
and their Riemann surfaces. 

11.2.2. Example. The function $f(z) = z^{1/n}$

The only branch points for the function $f(z) = z^{1/n}$ are $z = 0$ and the point at infinity.





Figure 11.2 A few sheets of the Riemann surface of the logarithmic function. The path 
C encircling the origin O ends up on the lower sheet. 

Defining $f_k(z) = r^{1/n} e^{i(\theta + 2k\pi)/n}$ for $k = 0, 1, \ldots, n - 1$ and $0 < \theta < 2\pi$ and following
the same procedure as for the logarithmic function, we see that there must be $n$ Riemann
sheets, labeled $R_0, R_1, \ldots, R_{n-1}$, in the Riemann surface. The lower edge of $R_{n-1}$ is
pasted to the upper edge of $R_0$ along the cut, which is taken to be along the positive real
axis. The Riemann surface for $n = 2$ is shown in Figure 11.3.

It is clear that for any noninteger value of $\alpha$ the function $f(z) = z^{\alpha}$ has a branch point
at $z = 0$ and another at the point at infinity. For irrational $\alpha$ the number of Riemann sheets
is infinite. ■

11.2.3. Example. The function $f(z) = (z^2 - 1)^{1/2}$

The branch points for the function $f(z) = (z^2 - 1)^{1/2}$ are at $z_1 = +1$ and $z_2 = -1$ (see
Figure 11.4). Writing $z - 1 = r_1 e^{i\theta_1}$ and $z + 1 = r_2 e^{i\theta_2}$, we have

\[
f(z) = \left(r_1 e^{i\theta_1}\right)^{1/2}\left(r_2 e^{i\theta_2}\right)^{1/2} = \sqrt{r_1 r_2}\;e^{i(\theta_1 + \theta_2)/2}.
\]

The cut is along the real axis from $z = -1$ to $z = +1$. There are two Riemann sheets in
the Riemann surface. Clearly, only cycles of $2\pi$ involving one branch point will cross the
cut and therefore end up on a different sheet. Any closed curve that has both $z_1$ and $z_2$ as
interior points will remain entirely on the original sheet. ■

The notion of branch cuts can be used to evaluate certain integrals that do not 
fit into the three categories discussed in Chapter 10. The basic idea is to circumvent 
the cut by constructing a contour that is infinitesimally close to the cut and circles 
around branch points. 


Figure 11.3 The Riemann surface for $f(z) = z^{1/2}$.

Figure 11.4 The cut for the function $f(z) = (z^2 - 1)^{1/2}$ is from $z_1$ to $z_2$. Paths that
circle only one of the points cross the cut and end up on the other sheet.


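11.2.4. Example. To evaluate the integral $I = \int_0^\infty x^{\alpha}\,dx/(x^2 + 1)$ for $|\alpha| < 1$, consider
the complex integral $I' = \oint_C z^{\alpha}\,dz/(z^2 + 1)$, where $C$ is as shown in Figure 11.5 and the
cut is taken along the positive real axis. To evaluate the contribution from $C_R$ and $C_r$, we
let $\rho$ stand for either $r$ or $R$. Then we have

$$I_\rho = \int_{C_\rho} \frac{(\rho e^{i\theta})^{\alpha}}{(\rho e^{i\theta})^2 + 1}\, i\rho e^{i\theta}\,d\theta = i \int_0^{2\pi} \frac{\rho^{\alpha+1} e^{i(\alpha+1)\theta}}{\rho^2 e^{2i\theta} + 1}\,d\theta.$$

It is clear that since $|\alpha| < 1$, $I_\rho \to 0$ as $\rho \to 0$ or $\rho \to \infty$.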






Figure 11.5 The contour for the evaluation of the integrals of Examples 11.2.4 and 11.2.5. 


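The contributions from $L_1$ and $L_2$ do not cancel one another because the value of
the function changes above and below the cut. To evaluate these two integrals we have to
choose a branch of the function. Let us choose that branch on which $z^{\alpha} = |z|^{\alpha} e^{i\alpha\theta}$ for
$0 < \theta < 2\pi$. Along $L_1$, $\theta \approx 0$ or $z^{\alpha} = x^{\alpha}$, and along $L_2$, $\theta \approx 2\pi$ or $z^{\alpha} = (x e^{2\pi i})^{\alpha}$.
Thus,

$$\oint_C \frac{z^{\alpha}}{z^2 + 1}\,dz = \int_0^\infty \frac{x^{\alpha}}{x^2 + 1}\,dx + \int_\infty^0 \frac{x^{\alpha} e^{2\pi i\alpha}}{(x e^{2\pi i})^2 + 1}\,dx = \left(1 - e^{2\pi i\alpha}\right) \int_0^\infty \frac{x^{\alpha}}{x^2 + 1}\,dx. \tag{11.3}$$

The LHS of this equation can be obtained using the residue theorem. There are two simple
poles, at $z = +i$ and $z = -i$, with residues $\mathrm{Res}[f(i)] = (e^{i\pi/2})^{\alpha}/2i$ and
$\mathrm{Res}[f(-i)] = -(e^{i3\pi/2})^{\alpha}/2i$. Thus,

$$\oint_C \frac{z^{\alpha}}{z^2 + 1}\,dz = 2\pi i \left( \frac{e^{i\alpha\pi/2}}{2i} - \frac{e^{i3\alpha\pi/2}}{2i} \right) = \pi \left( e^{i\alpha\pi/2} - e^{i3\alpha\pi/2} \right).$$

Combining this with Equation (11.3), we obtain

$$\int_0^\infty \frac{x^{\alpha}}{x^2 + 1}\,dx = \frac{\pi \left( e^{i\alpha\pi/2} - e^{i3\alpha\pi/2} \right)}{1 - e^{2\pi i\alpha}} = \frac{\pi}{2} \sec\frac{\alpha\pi}{2}.$$

If we had chosen a different branch of the function, both the LHS and the RHS of Equation
(11.3) would have been different, but the final result would still have been the same. ■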
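The closed form of Example 11.2.4 can be checked numerically. The following is only a rough sketch, not part of the original derivation: the function names, the substitution $x = e^t$, and the step and cutoff values are illustrative assumptions. The substitution turns the integrand into $e^{\alpha t}/(2\cosh t)$, which decays exponentially in both directions, so a plain trapezoidal sum is adequate for moderate $|\alpha|$.

```python
import math

def integral_x_alpha(alpha, half_width=60.0, step=0.05):
    """Approximate int_0^inf x**alpha / (x**2 + 1) dx for |alpha| < 1.

    With x = exp(t) the integrand becomes exp(alpha*t) / (2*cosh(t)).
    The cutoff half_width is an assumed value; it is adequate only when
    |alpha| is not too close to 1 (the tail decays like exp(-(1-|alpha|)*|t|)).
    """
    n = int(round(half_width / step))
    total = 0.0
    for k in range(-n, n + 1):
        t = k * step
        total += math.exp(alpha * t) / (2.0 * math.cosh(t))
    return total * step

def closed_form_sec(alpha):
    """Right-hand side of the example: (pi/2) * sec(alpha*pi/2)."""
    return 0.5 * math.pi / math.cos(0.5 * math.pi * alpha)

if __name__ == "__main__":
    for alpha in (-0.5, 0.0, 0.25, 0.5):
        print(alpha, integral_x_alpha(alpha), closed_form_sec(alpha))
```

For $\alpha = 0$ both columns reduce to $\pi/2$, the familiar value of $\int_0^\infty dx/(x^2+1)$.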






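11.2.5. Example. Here is another integral involving a branch cut:

$$I = \int_0^\infty \frac{x^{-a}}{x + 1}\,dx \quad \text{for } 0 < a < 1.$$

To evaluate this integral we use the zeroth branch of the function and the contour of the
previous example (Figure 11.5). Thus, writing $z = \rho e^{i\theta}$, we have

$$2\pi i\,\mathrm{Res}[f(-1)] = \oint_C \frac{z^{-a}}{z + 1}\,dz = \int_0^\infty \frac{\rho^{-a}}{\rho + 1}\,d\rho + \oint_{C_R} \frac{z^{-a}}{z + 1}\,dz + \int_\infty^0 \frac{(\rho e^{2i\pi})^{-a}}{\rho e^{2i\pi} + 1}\, e^{2i\pi}\,d\rho + \oint_{C_r} \frac{z^{-a}}{z + 1}\,dz. \tag{11.4}$$

The contributions from both circles vanish by the same argument used in the previous
example. On the other hand, $\mathrm{Res}[f(-1)] = (-1)^{-a}$. For the branch we are using, $-1 = e^{i\pi}$.
Thus, $\mathrm{Res}[f(-1)] = e^{-ia\pi}$. The RHS of Equation (11.4) yields

$$\int_0^\infty \frac{\rho^{-a}}{\rho + 1}\,d\rho - e^{-2i\pi a} \int_0^\infty \frac{\rho^{-a}}{\rho + 1}\,d\rho = \left(1 - e^{-2i\pi a}\right) I.$$

It follows from (11.4) that $(1 - e^{-2i\pi a}) I = 2\pi i\, e^{-i\pi a}$, or

$$\int_0^\infty \frac{x^{-a}}{x + 1}\,dx = \frac{\pi}{\sin a\pi} \quad \text{for } 0 < a < 1.$$

■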


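11.2.6. Example. Let us evaluate $I = \int_0^\infty \ln x\,dx/(x^2 + a^2)$ with $a > 0$. We choose the
zeroth branch of the logarithmic function, in which $-\pi < \theta < \pi$, and use the contour of
Figure 11.6. For $L_1$, $z = \rho e^{i\pi}$ (note that $\rho > 0$), and for $L_2$, $z = \rho$. Thus, we have

$$2\pi i\,\mathrm{Res}[f(ia)] = \oint_C \frac{\ln z}{z^2 + a^2}\,dz = \int_\infty^\epsilon \frac{\ln(\rho e^{i\pi})}{(\rho e^{i\pi})^2 + a^2}\, e^{i\pi}\,d\rho + \int_{C_\epsilon} \frac{\ln z}{z^2 + a^2}\,dz + \int_\epsilon^\infty \frac{\ln \rho}{\rho^2 + a^2}\,d\rho + \int_{C_R} \frac{\ln z}{z^2 + a^2}\,dz, \tag{11.5}$$

where $z = ia$ is the only singularity, a simple pole, in the UHP. Now we note that

$$\int_\infty^\epsilon \frac{\ln(\rho e^{i\pi})}{(\rho e^{i\pi})^2 + a^2}\, e^{i\pi}\,d\rho = \int_\epsilon^\infty \frac{\ln \rho + i\pi}{\rho^2 + a^2}\,d\rho = \int_\epsilon^\infty \frac{\ln \rho}{\rho^2 + a^2}\,d\rho + i\pi \int_\epsilon^\infty \frac{d\rho}{\rho^2 + a^2}.$$

The contributions from the circles tend to zero. On the other hand,

$$\mathrm{Res}[f(ia)] = \lim_{z \to ia} (z - ia)\, \frac{\ln z}{(z - ia)(z + ia)} = \frac{\ln(ia)}{2ia} = \frac{1}{2ia} \left( \ln a + i\frac{\pi}{2} \right).$$

Substituting the last two results in Equation (11.5), we obtain

$$\frac{\pi}{a} \left( \ln a + i\frac{\pi}{2} \right) = 2 \int_\epsilon^\infty \frac{\ln \rho}{\rho^2 + a^2}\,d\rho + i\pi \int_\epsilon^\infty \frac{d\rho}{\rho^2 + a^2}.$$

It can also easily be shown that $\int_0^\infty d\rho/(\rho^2 + a^2) = \pi/(2a)$. Thus, in the limit $\epsilon \to 0$, we
get $I = \frac{\pi}{2a} \ln a$. The sign of $a$ is irrelevant because it appears as a square in the integral.
Thus, we can write

$$\int_0^\infty \frac{\ln x}{x^2 + a^2}\,dx = \frac{\pi}{2|a|} \ln|a|, \quad a \neq 0.$$

■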
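A quick numerical check of the result of Example 11.2.6 may be reassuring. The sketch below is an illustration only; the function names, the substitution $x = e^t$, and the numerical parameters are assumptions, not part of the text. After the substitution the integrand is $t/(e^{t} + a^2 e^{-t})$, which is smooth and exponentially decaying, so the trapezoidal rule converges quickly.

```python
import math

def log_integral(a, half_width=60.0, step=0.05):
    """Approximate int_0^inf ln(x) / (x**2 + a**2) dx for a > 0.

    With x = exp(t): ln(x) dx / (x**2 + a**2) = t dt / (exp(t) + a*a*exp(-t)).
    """
    n = int(round(half_width / step))
    total = 0.0
    for k in range(-n, n + 1):
        t = k * step
        total += t / (math.exp(t) + a * a * math.exp(-t))
    return total * step

def log_integral_closed_form(a):
    """Result of the example: (pi / (2a)) * ln(a)."""
    return math.pi / (2.0 * a) * math.log(a)

if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0, 5.0):
        print(a, log_integral(a), log_integral_closed_form(a))
```

For $a = 1$ the transformed integrand is odd in $t$, so the integral vanishes, in agreement with $\ln 1 = 0$.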


11.3 Analytic Continuation 

Analytic functions have certain unique properties, some of which we have already 
noted. For instance, the Cauchy integral formula gives the value of an analytic 
function inside a simple closed contour once its value on the contour is known. 
We have also seen that we can deform the contours of integration as long as we do 
not encounter any singularities of the function. 

Combining these two properties and assuming that $f : \mathbb{C} \to \mathbb{C}$ is analytic
within a region $S \subset \mathbb{C}$, we can ask the following question: Is it possible to extend
$f$ beyond $S$? We shall see in this section that the answer is yes in many cases of
interest.$^4$ First consider

11.3.1. Theorem. Let $f_1, f_2 : \mathbb{C} \to \mathbb{C}$ be analytic in a region $S$. If $f_1 = f_2$ in a
neighborhood of a point $z \in S$, or for a segment of a curve in $S$, then $f_1 = f_2$ for
all $z \in S$.

Proof. Let $g = f_1 - f_2$, and $U = \{z \in S \mid g(z) = 0\}$. Then $U$ is a subset of $S$ that
includes the neighborhood of $z$ (or the line segment) in which $f_1 = f_2$. If $U$ is the
entire region $S$, we are done. Otherwise, $U$ has a boundary beyond which $g(z) \neq 0$.
Since all points within the boundary satisfy $g(z) = 0$, and since $g$ is continuous
(more than that, it is analytic) on $S$, $g$ must vanish also on the boundary. But the
boundary points are not isolated: Any small circle around any one of them includes
points of $U$ as well as points outside $U$. Thus, $g$ must vanish on a neighborhood
of any boundary point, implying that $g$ vanishes for some points outside $U$. This
contradicts our assumption. Thus, $U$ must include the entire region $S$. □


$^4$ Provided that $S$ is not discrete (countable). (See [Lang 85, p. 91].)


A consequence of this theorem is the following corollary.

11.3.2. Corollary. The behavior of a function that is analytic in a region $S \subset \mathbb{C}$ is
completely determined by its behavior in a (small) neighborhood of an arbitrary
point in that region.

This process of determining the behavior of an analytic function outside the
region in which it was originally defined is called analytic continuation. Although
there are infinitely many ways of analytically continuing beyond regions of definition,
the values of all functions obtained as a result of diverse continuations are
the same at any given point. This follows from Theorem 11.3.1.

Let $f_1, f_2 : \mathbb{C} \to \mathbb{C}$ be analytic in regions $S_1$ and $S_2$, respectively. Suppose that
$f_1$ and $f_2$ have different functional forms in their respective regions of analyticity.
If there is an overlap between $S_1$ and $S_2$ and if $f_1 = f_2$ within that overlap, then
the (unique) analytic continuation of $f_1$ into $S_2$ must be $f_2$, and vice versa. In fact,
we may regard $f_1$ and $f_2$ as a single function $f : \mathbb{C} \to \mathbb{C}$ such that

$$f(z) = \begin{cases} f_1(z) & \text{when } z \in S_1, \\ f_2(z) & \text{when } z \in S_2. \end{cases}$$

Clearly, $f$ is analytic for the combined region $S = S_1 \cup S_2$. We then say that $f_1$
and $f_2$ are analytic continuations of one another.

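11.3.3. Example. Let us consider the function $f_1(z) = \sum_{n=0}^{\infty} z^n$, which is analytic
for $|z| < 1$. We have seen that it converges to $1/(1 - z)$ for $|z| < 1$. Thus, we have
$f_1(z) = 1/(1 - z)$ when $|z| < 1$, and $f_1$ is not defined for $|z| > 1$.

Now let us consider a second function, $f_2(z) = \sum_{n=0}^{\infty} \left(\frac{3}{5}\right)^{n+1} \left(z + \frac{2}{3}\right)^n$, which
converges for $|z + \frac{2}{3}| < \frac{5}{3}$. To see what it converges to, we note that

$$f_2(z) = \frac{3}{5} \sum_{n=0}^{\infty} \left[ \frac{3}{5} \left( z + \frac{2}{3} \right) \right]^n = \frac{\frac{3}{5}}{1 - \frac{3}{5}\left(z + \frac{2}{3}\right)} = \frac{1}{1 - z} \quad \text{when } \left| z + \tfrac{2}{3} \right| < \tfrac{5}{3}.$$

We observe that although $f_1(z)$ and $f_2(z)$ have different series representations in the two
overlapping regions (see Figure 11.7), they represent the same function, $f(z) = 1/(1 - z)$.
We can therefore write

$$f(z) = \begin{cases} f_1(z) & \text{when } |z| < 1, \\ f_2(z) & \text{when } \left| z + \tfrac{2}{3} \right| < \tfrac{5}{3}, \end{cases}$$

and $f_1$ and $f_2$ are analytic continuations of one another. In fact, $f(z) = 1/(1 - z)$ is the
analytic continuation of both $f_1$ and $f_2$ for all of $\mathbb{C}$ except $z = 1$. Figure 11.7 shows $S_i$,
the region of definition of $f_i$, for $i = 1, 2$. ■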
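The statement of Example 11.3.3 can be illustrated by summing the two series numerically. This is a minimal sketch under the assumption that a fixed number of terms (an arbitrary choice here) is enough at the sample points; the function names are illustrative. Inside the overlap both partial sums agree with $1/(1-z)$; outside the unit disk only the second series is usable.

```python
def series_f1(z, terms=400):
    """Partial sum of sum_n z**n, valid only for |z| < 1."""
    return sum(z ** n for n in range(terms))

def series_f2(z, terms=400):
    """Partial sum of sum_n (3/5)**(n+1) * (z + 2/3)**n, valid for |z + 2/3| < 5/3."""
    w = 0.6 * (z + 2.0 / 3.0)
    return 0.6 * sum(w ** n for n in range(terms))

if __name__ == "__main__":
    z_overlap = 0.3 + 0.2j   # |z| < 1 and |z + 2/3| < 5/3
    z_outside = -1.5         # |z| > 1 but |z + 2/3| < 5/3
    print(series_f1(z_overlap), series_f2(z_overlap), 1 / (1 - z_overlap))
    print(series_f2(z_outside), 1 / (1 - z_outside))
```

At $z = -1.5$ the geometric series $f_1$ diverges, while $f_2$ still reproduces $1/(1-z) = 0.4$, which is the sense in which $f_2$ continues $f_1$.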

11.3.4. Example. The function $f_1(z) = \int_0^\infty e^{-zt}\,dt$ exists only if $\mathrm{Re}(z) > 0$, in which
case $f_1(z) = 1/z$. Its region of definition $S_1$ is shown in Figure 11.8 and is simply the right
half-plane.

Figure 11.9 (a) Regions $S_1$ and $S_2$ separated by the boundary $B$ and the contour $C$. (b)
The contour $C$ splits up into $C_1$ and $C_2$.

Thus, we have

$$\frac{1}{z} = \begin{cases} f_1(z) & \text{when } z \in S_1, \\ f_2(z) & \text{when } z \in S_2. \end{cases}$$

The two functions are analytic continuations of one another, and $f(z) = 1/z$ is the analytic
continuation of both $f_1$ and $f_2$ for all $z \in \mathbb{C}$ except $z = 0$. ■

11.3.1 The Schwarz Reflection Principle

A result that is useful in some physical applications is referred to as a dispersion
relation. To derive such a relation we need to know the behavior of analytic functions
on either side of the real axis. This is found using the Schwarz reflection
principle, for which we need the following result.

11.3.5. Proposition. Let $f_i$ be analytic throughout $S_i$, where $i = 1, 2$. Let $B$ be
the boundary between $S_1$ and $S_2$ (Figure 11.9) and assume that $f_1$ and $f_2$ are continuous
on $B$ and coincide there. Then the two functions are analytic continuations
of one another and together they define a (unique) function

$$f(z) = \begin{cases} f_1(z) & \text{when } z \in S_1 \cup B, \\ f_2(z) & \text{when } z \in S_2 \cup B, \end{cases}$$

which is analytic throughout the entire region $S_1 \cup S_2 \cup B$.


Proof. The proof consists in showing that the function integrates to zero along any
closed curve in $S_1 \cup S_2 \cup B$. Once this is done, one can use Morera's theorem to
conclude analyticity. The case when the closed curve is entirely in either $S_1$ or $S_2$
is trivial. When the curve is partially in $S_1$ and partially in $S_2$ the proof becomes
only slightly more complicated, because one has to split up the contour $C$ into $C_1$
and $C_2$ of Figure 11.9(b). The details are left as an exercise. □

11.3.6. Theorem. (Schwarz reflection principle) Let $f$ be a function that is analytic
in a region $S$ that has a segment of the real axis as part of its boundary $B$. If
$f(z)$ is real whenever $z$ is real, then the analytic continuation $g$ of $f$ into $S^*$ (the
mirror image of $S$ with respect to the real axis) exists and is given by

$$g(z) = \left( f(z^*) \right)^* \equiv f^*(z^*), \quad \text{where } z \in S^*.$$

Proof. First, we show that $g$ is analytic in $S^*$. Let

$$f(z) \equiv u(x, y) + i v(x, y), \qquad g(z) \equiv U(x, y) + i V(x, y).$$

Then $f(z^*) = f(x, -y) = u(x, -y) + i v(x, -y)$ and $g(z) = f^*(z^*)$ imply that
$U(x, y) = u(x, -y)$ and $V(x, y) = -v(x, -y)$. Therefore, with $u$ and $v$ evaluated at
$(x, -y)$ and the Cauchy-Riemann conditions for $f$ applied in the variables $(x, -y)$,

$$\frac{\partial U}{\partial x} = \frac{\partial u}{\partial x} = \frac{\partial v}{\partial(-y)} = -\frac{\partial v}{\partial y} = \frac{\partial V}{\partial y},$$

$$\frac{\partial U}{\partial y} = \frac{\partial u}{\partial y} = \frac{\partial v}{\partial x} = -\frac{\partial V}{\partial x}.$$

These are the Cauchy-Riemann conditions for $g(z)$. Thus, $g$ is analytic.

Next, we note that $f(x, 0) = g(x, 0)$, implying that $f$ and $g$ agree on the real
axis. Proposition 11.3.5 then implies that $f$ and $g$ are analytic continuations of
one another. □

It follows from this theorem that there exists an analytic function $h$ such that

$$h(z) = \begin{cases} f(z) & \text{when } z \in S, \\ g(z) & \text{when } z \in S^*. \end{cases}$$

We note that $h(z^*) = g(z^*) = f^*(z) = h^*(z)$.
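The relation $h(z^*) = h^*(z)$ is easy to test on concrete functions. The sketch below is illustrative only; the helper name `schwarz_reflection` and both sample functions are assumptions chosen for the demonstration. It builds $g(z) = (f(z^*))^*$ and compares it with $f$ for a function that is real on the real axis and for one that is not.

```python
import cmath

def schwarz_reflection(f):
    """Return g with g(z) = conj(f(conj(z))), the reflected function of the theorem."""
    def g(z):
        return f(z.conjugate()).conjugate()
    return g

def f_real_on_axis(z):
    """Entire function that is real for real z, so the reflection reproduces it."""
    return cmath.exp(z) + z * z

def f_not_real_on_axis(z):
    """f(z) = i z is not real on the real axis, so the hypothesis fails."""
    return 1j * z

if __name__ == "__main__":
    z = 0.7 + 1.3j
    g = schwarz_reflection(f_real_on_axis)
    h = schwarz_reflection(f_not_real_on_axis)
    print(abs(g(z) - f_real_on_axis(z)))       # essentially zero
    print(abs(h(z) - f_not_real_on_axis(z)))   # equals 2|z|, clearly nonzero
```

The second case shows why the reality condition on the real axis matters: for $f(z) = iz$ the reflected function is $-iz$, which agrees with $f$ only at the origin.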

11.3.2 Dispersion Relations 

Let $f$ be analytic throughout the complex plane except at a cut along the real axis
extending from $x_0$ to infinity. For a point $z$ not on the $x$-axis, the Cauchy integral
formula gives $f(z) = (2\pi i)^{-1} \oint_C f(\xi)\,d\xi/(\xi - z)$, where $C$ is the contour shown
in Figure 11.10. We assume that $f$ drops to zero fast enough that the contribution
from the large circle tends to zero. The reader may show that the contribution from
the small half-circle around $x_0$ also vanishes. Then

Figure 11.10 The contour used for dispersion relations.

$$f(z) = \frac{1}{2\pi i} \left[ \int_{x_0 + i\epsilon}^{\infty + i\epsilon} \frac{f(\xi)}{\xi - z}\,d\xi - \int_{x_0 - i\epsilon}^{\infty - i\epsilon} \frac{f(\xi)}{\xi - z}\,d\xi \right] = \frac{1}{2\pi i} \left[ \int_{x_0}^{\infty} \frac{f(x + i\epsilon)}{x - z + i\epsilon}\,dx - \int_{x_0}^{\infty} \frac{f(x - i\epsilon)}{x - z - i\epsilon}\,dx \right].$$

Since $z$ is not on the real axis, we can ignore the $i\epsilon$ terms in the denominators,
so that $f(z) = (2\pi i)^{-1} \int_{x_0}^{\infty} [f(x + i\epsilon) - f(x - i\epsilon)]\,dx/(x - z)$. The Schwarz
reflection principle in the form $f^*(z) = f(z^*)$ can now be used to yield

$$f(x + i\epsilon) - f(x - i\epsilon) = f(x + i\epsilon) - f^*(x + i\epsilon) = 2i\,\mathrm{Im}[f(x + i\epsilon)].$$

The final result is

$$f(z) = \frac{1}{\pi} \int_{x_0}^{\infty} \frac{\mathrm{Im}[f(x + i\epsilon)]}{x - z}\,dx.$$

This is one form of a dispersion relation. It expresses the value of a function at
any point of the cut complex plane in terms of an integral of the imaginary part of
the function on the upper edge of the cut.

When there are no residues in the UHP, we can obtain other forms of dispersion
relations by equating the real and imaginary parts of Equation (10.11). The result
is

$$\mathrm{Re}[f(x_0)] = \pm \frac{1}{\pi}\, P\!\int_{-\infty}^{\infty} \frac{\mathrm{Im}[f(x)]}{x - x_0}\,dx, \qquad \mathrm{Im}[f(x_0)] = \mp \frac{1}{\pi}\, P\!\int_{-\infty}^{\infty} \frac{\mathrm{Re}[f(x)]}{x - x_0}\,dx, \tag{11.6}$$

where the upper (lower) sign corresponds to placing the small semicircle around
$x_0$ in the UHP (LHP). The real and imaginary parts of $f$, as related by Equation
(11.6), are sometimes said to be the Hilbert transform of one another.
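The first dispersion relation of this subsection, $f(z) = \pi^{-1} \int_{x_0}^{\infty} \mathrm{Im}[f(x + i\epsilon)]\,dx/(x - z)$, can be tried on a simple test function. The sketch below assumes the illustrative choice $f(z) = 1/\sqrt{-z}$ (principal branch), which is not taken from the text: it is real on the negative real axis, has a cut from $x_0 = 0$ to infinity, vanishes at infinity, and has $\mathrm{Im}[f(x + i\epsilon)] = 1/\sqrt{x}$ on the upper edge of the cut. The substitution $x = e^t$ and the numerical parameters are also assumptions.

```python
import cmath
import math

def f_exact(z):
    """Test function 1/sqrt(-z), principal branch; cut along the positive real axis."""
    return 1.0 / cmath.sqrt(-z)

def f_from_dispersion(z, half_width=80.0, step=0.02):
    """Evaluate (1/pi) * int_0^inf x**(-1/2) / (x - z) dx for z off the cut.

    With x = exp(t) the integrand becomes 1 / (exp(t/2) - z*exp(-t/2)),
    which decays like exp(-|t|/2), so a trapezoidal sum is sufficient.
    """
    n = int(round(half_width / step))
    total = 0j
    for k in range(-n, n + 1):
        t = k * step
        total += 1.0 / (math.exp(0.5 * t) - z * math.exp(-0.5 * t))
    return total * step / math.pi

if __name__ == "__main__":
    for z in (-4.0 + 0j, 1.0 + 2.0j, 3.0 - 1.0j):
        print(z, f_from_dispersion(z), f_exact(z))
```

For $z = -4$ the integral can also be done by hand and gives $1/2$, the value of $1/\sqrt{4}$. The trapezoidal sum loses accuracy as $z$ approaches the cut, where the integrand develops a nearly singular peak.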

In some applications the imaginary part of $f$ is an odd function of its argument.
Then the first equation in (11.6) can be written as

$$\mathrm{Re}[f(x_0)] = \pm \frac{2}{\pi}\, P\!\int_0^{\infty} \frac{x\,\mathrm{Im}[f(x)]}{x^2 - x_0^2}\,dx.$$

To arrive at dispersion relations, the following condition must hold:

$$\lim_{R \to \infty} R\,|f(R e^{i\theta})| = 0,$$

where $R$ is the radius of the large semicircle in the UHP (or LHP). If $f$ does not
satisfy this prerequisite, it is still possible to obtain a dispersion relation called
a dispersion relation with one subtraction. This can be done by introducing
an extra factor of $x$ in the denominator of the integrand. We start with Equation
(10.15), confining ourselves to the UHP and assuming that there are no poles there,
so that the sum over residues is dropped:

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} = \frac{1}{i\pi}\, P\!\int_{-\infty}^{\infty} \frac{f(x)}{(x - x_1)(x - x_2)}\,dx.$$

The reader may check that by equating the real and imaginary parts on both sides,
letting $x_1 = 0$ and $x_2 = x_0$, and changing $x$ to $-x$ in the first half of the interval
of integration, we obtain

$$\frac{\mathrm{Re}[f(x_0)]}{x_0} = \frac{\mathrm{Re}[f(0)]}{x_0} + \frac{1}{\pi} \left[ P\!\int_0^{\infty} \frac{\mathrm{Im}[f(-x)]}{x(x + x_0)}\,dx + P\!\int_0^{\infty} \frac{\mathrm{Im}[f(x)]}{x(x - x_0)}\,dx \right].$$

For the case where $\mathrm{Im}[f(-x)] = -\mathrm{Im}[f(x)]$, this equation yields

$$\mathrm{Re}[f(x_0)] = \mathrm{Re}[f(0)] + \frac{2x_0^2}{\pi}\, P\!\int_0^{\infty} \frac{\mathrm{Im}[f(x)]}{x(x^2 - x_0^2)}\,dx. \tag{11.7}$$


11.3.7. Example. In optics, it has been shown that the imaginary part of the forward-scattering
light amplitude with frequency $\omega$ is related, by the so-called optical theorem, to
the total cross section for the absorption of light of that frequency:

$$\mathrm{Im}[f(\omega)] = \frac{\omega}{4\pi}\, \sigma_{\mathrm{tot}}(\omega).$$

Substituting this in Equation (11.7) yields

$$\mathrm{Re}[f(\omega_0)] = \mathrm{Re}[f(0)] + \frac{\omega_0^2}{2\pi^2}\, P\!\int_0^{\infty} \frac{\sigma_{\mathrm{tot}}(\omega)}{\omega^2 - \omega_0^2}\,d\omega. \tag{11.8}$$

Thus, the real part of the (coherent) forward scattering of light, that is, the real part of the
index of refraction, can be computed from Equation (11.8) by either measuring or calculating
$\sigma_{\mathrm{tot}}(\omega)$, the simpler quantity describing the absorption of light in the medium. Equation
(11.8) is the original Kramers-Kronig relation. ■


11.4 The Gamma and Beta Functions 


We have already encountered the gamma function. In this section, we derive some
useful relations involving the gamma function and the closely related beta function.
The gamma function is a generalization of the factorial function, which is defined
only for positive integers, to the system of complex numbers. By differentiating
the integral $I(\alpha) \equiv \int_0^\infty e^{-\alpha t}\,dt = 1/\alpha$ with respect to $\alpha$ repeatedly and setting
$\alpha = 1$ at the end, we get $\int_0^\infty t^n e^{-t}\,dt = n!$. This fact motivates the generalization

$$\Gamma(z) \equiv \int_0^\infty t^{z-1} e^{-t}\,dt \quad \text{for } \mathrm{Re}(z) > 0, \tag{11.9}$$

where $\Gamma$ is called the gamma (or factorial) function. It is also called Euler's
integral of the second kind. It is clear from its definition that

$$\Gamma(n + 1) = n! \tag{11.10}$$

if $n$ is a positive integer. The restriction $\mathrm{Re}(z) > 0$ assures the convergence of the
integral.

An immediate consequence of Equation (11.9) is obtained by integrating it by
parts:

$$\Gamma(z + 1) = z\,\Gamma(z). \tag{11.11}$$

This also leads to Equation (11.10) by iteration.

Another consequence is the analyticity of $\Gamma(z)$. Differentiating Equation
(11.11) with respect to $z$, we obtain

$$\frac{d\Gamma(z + 1)}{dz} = \Gamma(z) + z\,\frac{d\Gamma(z)}{dz}.$$

Thus, $d\Gamma(z)/dz$ exists and is finite if and only if $d\Gamma(z + 1)/dz$ is finite (recall
that $z \neq 0$). The procedure of showing the latter is outlined in Problem 11.16.
Therefore, $\Gamma(z)$ is analytic whenever $\Gamma(z + 1)$ is. To see the singularities of $\Gamma(z)$,
we note that

$$\Gamma(z + n) = z(z + 1)(z + 2) \cdots (z + n - 1)\,\Gamma(z),$$

or

$$\Gamma(z) = \frac{\Gamma(z + n)}{z(z + 1)(z + 2) \cdots (z + n - 1)}. \tag{11.12}$$

The numerator is analytic as long as $\mathrm{Re}(z + n) > 0$, or $\mathrm{Re}(z) > -n$. Thus, for
$\mathrm{Re}(z) > -n$, the singularities of $\Gamma(z)$ are the poles at $z = 0, -1, -2, \ldots, -n + 1$.
Since $n$ is arbitrary, we conclude that


11.4.1. Box. $\Gamma(z)$ is analytic at all $z \in \mathbb{C}$ except at $z = 0, -1, -2, \ldots$,
where $\Gamma(z)$ has simple poles.


A useful result is obtained by setting $z = \frac{1}{2}$ in Equation (11.9):

$$\Gamma(\tfrac{1}{2}) = \sqrt{\pi}. \tag{11.13}$$

This can be obtained by making the substitution $u = \sqrt{t}$ in the integral.

We can derive an expression for the logarithmic derivative of the gamma function
that involves an infinite series. To do so, we use Equation (11.2) noting that
$1/\Gamma(z + 1)$ is an entire function with simple zeros at $\{-k\}_{k=1}^{\infty}$. Equation (11.2)
gives

$$\frac{1}{\Gamma(z + 1)} = e^{\gamma z} \prod_{k=1}^{\infty} \left( 1 + \frac{z}{k} \right) e^{-z/k},$$

where $\gamma$ is a constant to be determined. Using Equation (11.11), we obtain

$$\frac{1}{\Gamma(z)} = z\,e^{\gamma z} \prod_{k=1}^{\infty} \left( 1 + \frac{z}{k} \right) e^{-z/k}. \tag{11.14}$$

To determine $\gamma$, let $z = 1$ in Equation (11.14) and evaluate the resulting product
numerically. The result is $\gamma = 0.57721566\ldots$, the so-called Euler-Mascheroni
constant.
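The numerical evaluation mentioned above can be sketched as follows. Setting $z = 1$ in Equation (11.14) and using $\Gamma(1) = 1$ gives $\gamma = \sum_{k=1}^{\infty} [1/k - \ln(1 + 1/k)]$. The truncation point in the code is an arbitrary assumption and the function name is illustrative; the partial sum converges slowly, with an error of roughly $1/(2N)$ after $N$ terms.

```python
import math

def euler_mascheroni_from_product(terms=200000):
    """Estimate gamma from the z = 1 case of the infinite product (11.14).

    Taking logarithms of 1 = exp(gamma) * prod_k (1 + 1/k) * exp(-1/k)
    gives gamma = sum_k [1/k - ln(1 + 1/k)].
    """
    return math.fsum(1.0 / k - math.log1p(1.0 / k) for k in range(1, terms + 1))

if __name__ == "__main__":
    print(euler_mascheroni_from_product())
```

With the default truncation the estimate agrees with $0.57721566\ldots$ to about five decimal places; more terms, or a convergence-acceleration scheme, would be needed for higher accuracy.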

Differentiating the logarithm of both sides of Equation (11.14), we obtain

$$\frac{d}{dz} \ln[\Gamma(z)] = -\frac{1}{z} - \gamma + \sum_{k=1}^{\infty} \left( \frac{1}{k} - \frac{1}{z + k} \right). \tag{11.15}$$

Other properties of the gamma function are derivable from the results presented
here. Those derivations are left as problems. The beta function, or Euler's integral
of the first kind, is defined for complex numbers $a$ and $b$ as follows:

$$B(a, b) \equiv \int_0^1 t^{a-1} (1 - t)^{b-1}\,dt \quad \text{where } \mathrm{Re}(a), \mathrm{Re}(b) > 0. \tag{11.16}$$

By changing $t$ to $1/t$, we can also write

$$B(a, b) = \int_1^{\infty} t^{-a-b} (t - 1)^{b-1}\,dt. \tag{11.17}$$

Since $0 \le t \le 1$ in Equation (11.16), we can define $\theta$ by $t = \sin^2\theta$. This gives

$$B(a, b) = 2 \int_0^{\pi/2} \sin^{2a-1}\theta\, \cos^{2b-1}\theta\,d\theta. \tag{11.18}$$

This relation can be used to establish a connection between the gamma and beta
functions. We note that

$$\Gamma(a) = \int_0^\infty t^{a-1} e^{-t}\,dt = 2 \int_0^\infty x^{2a-1} e^{-x^2}\,dx,$$

where in the last step we changed the variable to $x = \sqrt{t}$. Multiply $\Gamma(a)$ by $\Gamma(b)$
and express the resulting double integral in terms of polar coordinates to obtain
$\Gamma(a)\Gamma(b) = \Gamma(a + b) B(a, b)$, or

$$B(a, b) = B(b, a) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}. \tag{11.19}$$
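Equations (11.18) and (11.19) can be compared numerically. The sketch below is an illustration only: the function names, the use of composite Simpson's rule, and the number of panels are assumptions, and the sample values of $a$ and $b$ are chosen so that the trigonometric integrand is smooth at both endpoints.

```python
import math

def beta_from_trig_integral(a, b, panels=2000):
    """Evaluate 2 * int_0^{pi/2} sin^(2a-1)(theta) cos^(2b-1)(theta) dtheta, Eq. (11.18).

    Composite Simpson's rule; panels must be even. Intended for a, b >= 1,
    where the integrand has no endpoint singularities.
    """
    h = (math.pi / 2.0) / panels

    def integrand(theta):
        return math.sin(theta) ** (2 * a - 1) * math.cos(theta) ** (2 * b - 1)

    total = integrand(0.0) + integrand(math.pi / 2.0)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return 2.0 * total * h / 3.0

def beta_from_gamma(a, b):
    """Right-hand side of Eq. (11.19)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

if __name__ == "__main__":
    for a, b in ((2.0, 3.0), (2.5, 1.5)):
        print(a, b, beta_from_trig_integral(a, b), beta_from_gamma(a, b))
```

For $a = 2$, $b = 3$ both expressions equal $1!\,2!/4! = 1/12$, and for $a = 5/2$, $b = 3/2$ both equal $\pi/16$.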


Let us now establish the following useful relation:

$$\Gamma(z)\Gamma(1 - z) = \frac{\pi}{\sin \pi z}. \tag{11.20}$$

With $a = z$ and $b = 1 - z$, and using $u = \tan\theta$, Equations (11.18) and (11.19)
give

$$\Gamma(z)\Gamma(1 - z) = B(z, 1 - z) = 2 \int_0^\infty \frac{u^{2z-1}}{u^2 + 1}\,du \quad \text{for } 0 < \mathrm{Re}(z) < 1.$$

Using the result obtained in Example 11.2.4, we immediately get Equation (11.20),
valid for $0 < \mathrm{Re}(z) < 1$. By analytic continuation we then generalize Equation
(11.20) to values of $z$ for which both sides are analytic.

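11.4.2. Example. As an illustration of the use of Equation (11.20), let us show that $1/\Gamma(z)$
can also be written as

$$\frac{1}{\Gamma(z)} = \frac{1}{2\pi i} \int_C \frac{e^t}{t^z}\,dt, \tag{11.21}$$

where $C$ is the contour shown in Figure 11.11. From Equations (11.9) and (11.20) it follows
that

$$\frac{1}{\Gamma(z)} = \frac{\sin \pi z}{\pi}\,\Gamma(1 - z) = \frac{\sin \pi z}{\pi} \int_0^\infty e^{-r} r^{-z}\,dr = \frac{e^{i\pi z} - e^{-i\pi z}}{2\pi i} \int_0^\infty \frac{e^{-r}}{r^z}\,dr.$$

Figure 11.11 The contour $C$ used in evaluating the reciprocal gamma function. (Axes: $\mathrm{Re}(t)$, $\mathrm{Im}(t)$.)

The contour integral of Equation (11.21) can be evaluated by noting that above the real
axis, $t = r e^{i\pi} = -r$, below it $t = r e^{-i\pi} = -r$, and, as the reader may check, that the
contribution from the small circle at the origin is zero; so

$$\int_C \frac{e^t}{t^z}\,dt = \int_0^\infty \frac{e^{-r}}{(r e^{i\pi})^z}\,(-dr) + \int_\infty^0 \frac{e^{-r}}{(r e^{-i\pi})^z}\,(-dr) = -e^{-i\pi z} \int_0^\infty \frac{e^{-r}}{r^z}\,dr + e^{i\pi z} \int_0^\infty \frac{e^{-r}}{r^z}\,dr.$$

Comparison with the last equation above yields the desired result. ■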



Another useful relation can be obtained by combining Equations (11.11) and
(11.20): $\Gamma(z)\Gamma(1 - z) = \Gamma(z)(-z)\Gamma(-z) = \pi/\sin \pi z$. Thus,

$$\Gamma(z)\Gamma(-z) = -\frac{\pi}{z \sin \pi z}. \tag{11.22}$$

Once we know $\Gamma(x)$ for positive values of real $x$, we can use Equation (11.22)
to find $\Gamma(x)$ for $x < 0$. Thus, for instance, $\Gamma(\frac{1}{2}) = \sqrt{\pi}$ gives $\Gamma(-\frac{1}{2}) = -2\sqrt{\pi}$.
Equation (11.22) also shows that the gamma function has simple poles wherever
$z$ is a negative integer.
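The identities (11.11), (11.13), (11.20), and (11.22) can be spot-checked with the gamma function of the Python standard library. This is a small sketch; the helper name and the sample point $x = 0.3$ are arbitrary assumptions, and the check covers only real arguments.

```python
import math

def gamma_identity_residuals(x):
    """Return left-hand side minus right-hand side for several gamma identities at real x."""
    g = math.gamma
    return {
        "recursion (11.11)": g(x + 1) - x * g(x),
        "reflection (11.20)": g(x) * g(1 - x) - math.pi / math.sin(math.pi * x),
        "product (11.22)": g(x) * g(-x) + math.pi / (x * math.sin(math.pi * x)),
        "half (11.13)": g(0.5) - math.sqrt(math.pi),
        "minus half": g(-0.5) + 2.0 * math.sqrt(math.pi),
    }

if __name__ == "__main__":
    for name, residual in gamma_identity_residuals(0.3).items():
        print(name, residual)
```

All residuals should be at the level of floating-point rounding. The sign in (11.22) is what makes $\Gamma(-\frac{1}{2})$ negative.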

11.5 Method of Steepest Descent 


It is shown in statistical mechanics ([Hill 87, pp. 150-152]) that the partition 
function, which generates all the thermodynamical quantities, can be written as 
a contour integral. Debye found a very elegant technique of approximating this 
contour integral, which we investigate in this section. Consider the integral 

$$I(\alpha) \equiv \int_C e^{\alpha f(z)} g(z)\,dz \tag{11.23}$$

where $|\alpha|$ is large and $f$ and $g$ are analytic in some region of $\mathbb{C}$ containing the
contour $C$. Since this integral occurs frequently in physical applications, it would
be helpful if we could find a general approximation for it that is applicable for
all $f$ and $g$. The fact that $|\alpha|$ is large will be of great help. By redefining $f(z)$, if
necessary, we can assume that $\alpha = |\alpha| e^{i \arg(\alpha)}$ is real and positive [absorb $e^{i \arg(\alpha)}$
into the function $f(z)$ if need be].

The exponent of the integrand can be written as

$$\alpha f(z) = \alpha u(x, y) + i\alpha v(x, y).$$

Since $\alpha$ is large and positive, we expect the exponential to be the largest at the
maximum of $u(x, y)$. Thus, if we deform the contour so that it passes through a
point $z_0$ at which $u(x, y)$ is maximum, the contribution to the integral may come
mostly from the neighborhood of $z_0$. This opens up the possibility of expanding
the exponent about $z_0$ and keeping the lowest terms in the expansion, which is
what we are after. There is one catch, however. Because of the largeness of $\alpha$, the
imaginary part of $\alpha f$ in the exponent will oscillate violently as $v(x, y)$ changes
even by a small amount. This oscillation can make the contribution of the real
part of $f(z_0)$ negligibly small and render the whole procedure useless. Thus, we
want to tame the variation of $\exp[iv(x, y)]$ by making $v(x, y)$ vary as slowly as
possible. A necessary condition is for the derivative of $v$ to vanish at $z_0$. This and
the fact that the real part is to have a maximum at $z_0$ lead to

$$\frac{\partial u}{\partial x} + i\,\frac{\partial v}{\partial x} = \left. \frac{df}{dz} \right|_{z_0} = 0. \tag{11.24}$$

However, we do not stop here but demand that the imaginary part of $f$ be constant
along the deformed contour: $\mathrm{Im}[f(z)] = \mathrm{Im}[f(z_0)]$ or $v(x, y) = v(x_0, y_0)$.

Equation (11.24) and the Cauchy-Riemann conditions imply that $\partial u/\partial x = 0 = \partial u/\partial y$
at $z_0$. Thus, it might appear that $z_0$ is a maximum (or minimum) of the
surface described by the function $u(x, y)$. This is not true: For the surface to have
a maximum (minimum), both second derivatives, $\partial^2 u/\partial x^2$ and $\partial^2 u/\partial y^2$, must be
negative (positive). But that is impossible because $u(x, y)$ is harmonic: the sum of
these two derivatives is zero. Recall that a point at which the derivatives vanish but
that is neither a maximum nor a minimum is called a saddle point. That is why the
procedure described below is sometimes called the saddle point approximation.


We are interested in values of $z$ close to $z_0$. So let us expand $f(z)$ in a Taylor
series about $z_0$, use Equation (11.24), and keep terms only up to the second, to
obtain

$$f(z) = f(z_0) + \tfrac{1}{2} (z - z_0)^2 f''(z_0). \tag{11.25}$$

Let us assume that $f''(z_0) \neq 0$, and define

$$z - z_0 = r_1 e^{i\theta_1} \quad \text{and} \quad \tfrac{1}{2} f''(z_0) = r_2 e^{i\theta_2} \tag{11.26}$$

and substitute in the above expansion to obtain

$$f(z) - f(z_0) = r_1^2 r_2\, e^{i(2\theta_1 + \theta_2)}, \tag{11.27}$$

or

$$\mathrm{Re}[f(z) - f(z_0)] = r_1^2 r_2 \cos(2\theta_1 + \theta_2), \qquad \mathrm{Im}[f(z) - f(z_0)] = r_1^2 r_2 \sin(2\theta_1 + \theta_2). \tag{11.28}$$

Figure 11.12 A segment of the contour $C_0$ in the vicinity of $z_0$. The lines mentioned in
the text are small segments of the contour $C_0$ centered at $z_0$.

The constancy of $\mathrm{Im}[f(z)]$ implies that $\sin(2\theta_1 + \theta_2) = 0$, or $2\theta_1 + \theta_2 = n\pi$.
Thus, for $\theta_1 = -\theta_2/2 + n\pi/2$ where $n = 0, 1, 2, 3$, the imaginary part of $f$ is
constant. The angle $\theta_2$ is determined by the second equation in (11.26). Once we
determine $n$, the path of saddle point integration will be specified.

To get insight into this specification, consider $z - z_0 = r_1 e^{i(-\theta_2/2 + n\pi/2)}$, and
eliminate $r_1$ from its real and imaginary parts to obtain

$$y - y_0 = \left[ \tan\left( \frac{n\pi}{2} - \frac{\theta_2}{2} \right) \right] (x - x_0).$$

This is the equation of a line passing through $z_0 \equiv (x_0, y_0)$ and making an angle
of $\theta_1 = (n\pi - \theta_2)/2$ with the real axis. For $n = 0, 2$ we get one line, and for
$n = 1, 3$ we get another that is perpendicular to the first (see Figure 11.12). It is
to be emphasized that along both these lines the imaginary part of $f(z)$ remains
constant. To choose the correct line, we need to look at the real part of the function.
Also note that these "lines" are small segments of (or tangents to) the deformed
contour at $z_0$.

We are looking for directions along which $\mathrm{Re}(f)$ goes through a relative maximum
at $z_0$. In fact, we are after a path on which the function decreases maximally.
This occurs when $\mathrm{Re}[f(z)] - \mathrm{Re}[f(z_0)]$ takes the largest negative value. Equation
(11.28) determines such a path: It is that path on which $\cos(2\theta_1 + \theta_2) = -1$, or
when $n = 1, 3$. There is only one such path in the region of interest, and the procedure
is uniquely determined.$^5$ Because the descent from the maximum value at
$z_0$ is maximum along such a path, this procedure is called the method of steepest
descent.

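Now that we have determined the contour, let us approximate the integral.
Substituting $2\theta_1 + \theta_2 = \pi, 3\pi$ in Equation (11.27), we get

$$f(z) - f(z_0) = -r_1^2 r_2 \equiv -t^2 = \tfrac{1}{2} (z - z_0)^2 f''(z_0). \tag{11.29}$$

Using this in Equation (11.23) yields

$$I(\alpha) \approx \int_{C_0} e^{\alpha[f(z_0) - t^2]} g(z)\,dz = e^{\alpha f(z_0)} \int_{C_0} e^{-\alpha t^2} g(z)\,dz, \tag{11.30}$$

where $C_0$ is the deformed contour passing through $z_0$.

To proceed, we need to solve for $z$ in terms of $t$. From Equation (11.29) we
have

$$(z - z_0)^2 = -\frac{2 t^2}{f''(z_0)} = -\frac{t^2}{r_2}\, e^{-i\theta_2}.$$

Therefore, $|z - z_0| = |t|/\sqrt{r_2}$, or $z - z_0 = (|t|/\sqrt{r_2})\, e^{i\theta_1}$, by the first equation of
(11.26). Let us agree that for $t > 0$, the point $z$ on the contour will move in the
direction that makes an angle of $0 \le \theta_1 < \pi$, and that $t < 0$ corresponds to the
opposite direction. This convention removes the remaining ambiguity of the angle
$\theta_1$, and gives

$$z = z_0 + \frac{t}{\sqrt{r_2}}\, e^{i\theta_1}, \quad 0 \le \theta_1 < \pi. \tag{11.31}$$

Using the Taylor expansion of $g(z)$ about $z_0$, we can write

$$g(z)\,dz = \sum_{n=0}^{\infty} \frac{t^n}{r_2^{n/2}\, n!}\, e^{in\theta_1} g^{(n)}(z_0)\, \frac{e^{i\theta_1}}{\sqrt{r_2}}\,dt = \sum_{n=0}^{\infty} \frac{t^n e^{i(n+1)\theta_1}}{r_2^{(n+1)/2}\, n!}\, g^{(n)}(z_0)\,dt,$$

and substituting this in Equation (11.30) yields

$$I(\alpha) \approx e^{\alpha f(z_0)} \sum_{n=0}^{\infty} \frac{e^{i(n+1)\theta_1}}{r_2^{(n+1)/2}\, n!}\, g^{(n)}(z_0) \int_{-\infty}^{\infty} e^{-\alpha t^2} t^n\,dt. \tag{11.32}$$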


$^5$ The angle $\theta_1$ is still ambiguous by $\pi$, because $n$ can be 1 or 3. However, by a suitable sign convention described below, we
can remove this ambiguity.


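The extension of the integral limits to infinity does not alter the result significantly
because $\alpha$ is assumed large and positive. The integral in the sum is zero for odd $n$.
When $n$ is even, we make the substitution $u = \alpha t^2$ and show that
$\int_{-\infty}^{\infty} e^{-\alpha t^2} t^n\,dt = \alpha^{-(n+1)/2}\, \Gamma[(n + 1)/2]$. With $n = 2k$, and using $r_2 = |f''(z_0)|/2$, the sum becomes

$$I(\alpha) \approx e^{\alpha f(z_0)} \sum_{k=0}^{\infty} \frac{2^{k+1/2}\, e^{i(2k+1)\theta_1}}{|f''(z_0)|^{k+1/2}\, (2k)!}\, g^{(2k)}(z_0)\, \Gamma(k + \tfrac{1}{2})\, \alpha^{-k-1/2}. \tag{11.33}$$

This is called the asymptotic expansion of $I(\alpha)$. In most applications, only the
first term of the above series is retained, giving

$$I(\alpha) \approx e^{\alpha f(z_0)} \sqrt{\frac{2\pi}{\alpha}}\, \frac{e^{i\theta_1} g(z_0)}{\sqrt{|f''(z_0)|}}. \tag{11.34}$$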

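11.5.1. Example. Let us approximate the integral

$$I(\alpha) \equiv \Gamma(\alpha + 1) = \int_0^\infty e^{-z} z^{\alpha}\,dz,$$

where $\alpha$ is a positive real number. First, we must rewrite the integral in the form of Equation
(11.23). We can do this by noting that $z^{\alpha} = e^{\alpha \ln z}$. Thus, we have

$$I(\alpha) = \int_0^\infty e^{\alpha \ln z - z}\,dz = \int_0^\infty e^{\alpha(\ln z - z/\alpha)}\,dz,$$

and we identify $f(z) = \ln z - z/\alpha$ and $g(z) = 1$. The saddle point is found from $f'(z) = 0$
or $z_0 = \alpha$. Furthermore, from

$$\tfrac{1}{2} f''(z_0) = \tfrac{1}{2} \left( -\frac{1}{\alpha^2} \right) = \frac{1}{2\alpha^2}\, e^{i\pi} \quad \Rightarrow \quad \theta_2 = \pi$$

and $2\theta_1 + \theta_2 = \pi, 3\pi$, as well as the condition $0 \le \theta_1 < \pi$, we conclude that $\theta_1 = 0$.
Substitution in Equation (11.34) yields

$$\Gamma(\alpha + 1) \approx e^{\alpha f(z_0)} \sqrt{\frac{2\pi}{\alpha}}\, \frac{1}{\sqrt{1/\alpha^2}} = \sqrt{2\pi\alpha}\, e^{\alpha(\ln \alpha - 1)} = \sqrt{2\pi}\, e^{-\alpha} \alpha^{\alpha + 1/2}, \tag{11.35}$$

called the Stirling approximation. ■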
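To get a feel for the quality of the Stirling approximation (11.35), one can compare it with the gamma function of the Python standard library. The sketch below is illustrative; the function names and the sample values of $\alpha$ are assumptions, and $\alpha$ is kept at or below 100 so that $\alpha^{\alpha + 1/2}$ stays within floating-point range.

```python
import math

def stirling(alpha):
    """Leading-order steepest-descent estimate of Gamma(alpha + 1), Eq. (11.35)."""
    return math.sqrt(2.0 * math.pi) * math.exp(-alpha) * alpha ** (alpha + 0.5)

def stirling_ratio(alpha):
    """Ratio of the exact Gamma(alpha + 1) to its Stirling estimate."""
    return math.gamma(alpha + 1.0) / stirling(alpha)

if __name__ == "__main__":
    for alpha in (1.0, 10.0, 100.0):
        print(alpha, stirling_ratio(alpha))
```

The ratio approaches 1 as $\alpha$ grows, and the deviation is close to $1/(12\alpha)$, the well-known first correction to Stirling's formula (a standard result quoted here, not derived in the example above).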


11.5.2. Example. The Hankel function of the first kind is defined as

$$H_\nu^{(1)}(\alpha) \equiv \frac{1}{i\pi} \int_C e^{(\alpha/2)(z - 1/z)}\, \frac{dz}{z^{\nu+1}},$$

where $C$ is the contour shown in Figure 11.13. We want to find the asymptotic expansion
of this function, choosing the branch of the function in which $-\pi < \theta < \pi$.

We identify $f(z) = \frac{1}{2}(z - 1/z)$ and $g(z) = z^{-\nu-1}$. Next, the stationary points of $f$
are calculated:

$$\frac{df}{dz} = \frac{1}{2} + \frac{1}{2z^2} = 0 \quad \Rightarrow \quad z_0 = \pm i.$$

Figure 11.13 The contour for the evaluation of the Hankel function of the first kind.

The contour of integration suggests the saddle point $z_0 = +i$. The second derivative evaluated
at the saddle point gives $f''(z_0) = -1/z_0^3 = -i = e^{-i\pi/2}$, or $\theta_2 = -\pi/2$. This, and
the convention $0 \le \theta_1 < \pi$, force us to choose $\theta_1 = 3\pi/4$. Substituting this in Equation
(11.34) and noting that $f(i) = i$ and $|f''(z_0)| = 1$, we obtain

$$H_\nu^{(1)}(\alpha) \equiv \frac{1}{i\pi}\, I(\alpha) \approx \frac{1}{i\pi}\, e^{\alpha i} \sqrt{\frac{2\pi}{\alpha}}\, e^{i3\pi/4}\, i^{-\nu-1} = \sqrt{\frac{2}{\alpha\pi}}\, e^{i(\alpha - \nu\pi/2 - \pi/4)},$$

where we have used $i^{-\nu-1} = e^{-i(\nu+1)\pi/2}$. ■

Although Equation (11.34) is adequate for most applications, we shall have
occasion to demand a better approximation. One may try to keep higher-order terms
of Equation (11.33), but that infinite sum is in reality inconsistent. The reason is
that in the product $g(z)\,dz$, we kept only the first power of $t$ in the expansion of $z$.
To restore consistency, let us expand $z(t)$ as well. Suppose

$$z - z_0 = \sum_{m=1}^{\infty} b_m t^m \quad \Rightarrow \quad dz = \sum_{m=0}^{\infty} (m + 1)\, b_{m+1} t^m\,dt,$$

so that

$$g(z)\,dz = \sum_{n=0}^{\infty} \frac{t^n}{r_2^{n/2}\, n!}\, e^{in\theta_1} g^{(n)}(z_0) \sum_{m=0}^{\infty} (m + 1)\, b_{m+1} t^m\,dt = \sum_{m,n=0}^{\infty} \frac{e^{in\theta_1}}{r_2^{n/2}\, n!}\, (m + 1)\, b_{m+1}\, g^{(n)}(z_0)\, t^{m+n}\,dt.$$

Now introduce $l = m + n$ and note that the summation over $n$ goes up to $l$. This
gives

$$g(z)\,dz = \sum_{l=0}^{\infty} \left[ \sum_{n=0}^{l} \frac{e^{in\theta_1}}{r_2^{n/2}\, n!}\, (l - n + 1)\, b_{l-n+1}\, g^{(n)}(z_0) \right] t^l\,dt \equiv \sum_{l=0}^{\infty} a_l t^l\,dt.$$

Substituting this in Equation (11.30) and changing the contour integration into the
integral from $-\infty$ to $\infty$ as before yields

$$I(\alpha) \approx e^{\alpha f(z_0)} \sum_{k=0}^{\infty} a_{2k}\, \alpha^{-k-1/2}\, \Gamma(k + \tfrac{1}{2}), \qquad a_{2k} = \sum_{n=0}^{2k} \frac{e^{in\theta_1}}{r_2^{n/2}\, n!}\, (2k - n + 1)\, b_{2k-n+1}\, g^{(n)}(z_0). \tag{11.36}$$


The only thing left to do is to evaluate $b_m$. We shall not give a general formula for these coefficients. Instead, we shall calculate the first three of them. This should reveal to the reader the general method of approximating them to any order. We have already calculated $b_1$ in Equation (11.31). To calculate $b_2$, keep the next-highest term in the expansion of both $z$ and $t^2$. Thus write

$$z - z_0 = b_1 t + b_2 t^2, \qquad t^2 = -\tfrac{1}{2} f''(z_0)(z - z_0)^2 - \tfrac{1}{6} f'''(z_0)(z - z_0)^3.$$

Now substitute the first equation in the second and equate the coefficients of equal powers of $t$ on both sides. The second power of $t$ gives nothing new: It merely reaffirms the value of $b_1$. The coefficient of the third power of $t$ is $-b_1 b_2 f''(z_0) - \tfrac{1}{6} b_1^3 f'''(z_0)$. Setting this equal to zero gives

$$b_2 = -\frac{b_1^2 f'''(z_0)}{6 f''(z_0)} = \frac{f'''(z_0)}{3\,|f''(z_0)|^2}\, e^{4i\theta_1}, \qquad (11.37)$$

where we substituted for $b_1$ from Equation (11.31) and used $2\theta_1 + \theta_2 = \pi$.

To calculate $b_3$, keep one more term in the expansion of both $z$ and $t^2$ to obtain

$$z - z_0 = b_1 t + b_2 t^2 + b_3 t^3$$

and

$$t^2 = -\tfrac{1}{2} f''(z_0)(z - z_0)^2 - \tfrac{1}{6} f'''(z_0)(z - z_0)^3 - \tfrac{1}{24} f^{(iv)}(z_0)(z - z_0)^4.$$

Once again substitute the first equation in the second and equate the coefficients of equal powers of $t$ on both sides. The second and third powers of $t$ give nothing new. Setting the coefficient of the fourth power of $t$ equal to zero yields

$$b_3 = b_1^3 \left[ \frac{5[f'''(z_0)]^2}{72[f''(z_0)]^2} - \frac{f^{(iv)}(z_0)}{24 f''(z_0)} \right] = \frac{\sqrt{2}\, e^{3i\theta_1}}{12\,|f''(z_0)|^{3/2}} \left[ \frac{5[f'''(z_0)]^2}{3[f''(z_0)]^2} - \frac{f^{(iv)}(z_0)}{f''(z_0)} \right]. \qquad (11.38)$$
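The coefficient matching described above can be checked mechanically. The following Python sketch is an illustration under stated assumptions, not the book's procedure: it takes arbitrary sample values for $f''(z_0)$, $f'''(z_0)$, $f^{(iv)}(z_0)$, builds $b_1$, $b_2$, $b_3$ from $b_1 = \sqrt{2}\,e^{i\theta_1}/|f''(z_0)|^{1/2}$ and the expressions for $b_2$ and $b_3$ obtained by the matching, substitutes $z - z_0 = b_1 t + b_2 t^2 + b_3 t^3$ into the truncated series for $t^2$, and returns the coefficients of $t^0, \ldots, t^4$. The helper names are hypothetical.

```python
import cmath


def saddle_coeffs(f2, f3, f4):
    """Return (b1, b2, b3) from f''(z0), f'''(z0), f''''(z0).

    theta1 is fixed by 2*theta1 + theta2 = pi, where theta2 = arg f''(z0);
    with theta2 in (-pi, pi] this gives 0 <= theta1 < pi.
    """
    theta2 = cmath.phase(f2)
    theta1 = (cmath.pi - theta2) / 2
    b1 = cmath.sqrt(2) * cmath.exp(1j * theta1) / abs(f2) ** 0.5
    b2 = -b1 ** 2 * f3 / (6 * f2)
    b3 = b1 ** 3 * (5 * f3 ** 2 / (72 * f2 ** 2) - f4 / (24 * f2))
    return b1, b2, b3


def poly_mul(p, q, order):
    """Multiply two polynomials in t (coefficient lists), truncated at t^order."""
    out = [0j] * (order + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= order:
                out[i + j] += a * b
    return out


def t_squared_series(f2, f3, f4, order=4):
    """Coefficients of t^0..t^order of -(f2 w^2/2 + f3 w^3/6 + f4 w^4/24), w = z - z0."""
    b1, b2, b3 = saddle_coeffs(f2, f3, f4)
    w = [0j, b1, b2, b3]
    w2 = poly_mul(w, w, order)
    w3 = poly_mul(w2, w, order)
    w4 = poly_mul(w3, w, order)
    return [-(f2 * w2[k] / 2 + f3 * w3[k] / 6 + f4 * w4[k] / 24)
            for k in range(order + 1)]


if __name__ == "__main__":
    # Sample derivative values (arbitrary, for illustration only).
    for c in t_squared_series(-1j, 0.4 + 0.2j, -0.3 + 0.1j):
        print(c)
```

If the coefficients are right, the list should reduce to $t^2$ alone through fourth order, that is, the $t^3$ and $t^4$ entries should vanish up to rounding.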


11.6 Problems

11.1. Derive Equation (11.2) from its logarithmic derivative.

11.2. Show that the point at infinity is not a branch point for $f(z) = (z^2 - 1)^{1/2}$.

11.3. Find the following integrals, for which $0 \neq a \in \mathbb{R}$.

(a) $\displaystyle\int_0^\infty \frac{\ln x}{(x^2 + a^2)^2}\, dx$. (b) $\displaystyle\int_0^\infty \frac{\ln x}{(x^2 + a^2)^2 \sqrt{x}}\, dx$.

11.4. Use the contour in Figure 11.14 to evaluate the following integrals. 


⑷ 



sin ax 
sinh^: 


dx 



jccosca , 


11.5. Show that Jq f(sm6) d6 =2 f^ 2 f{^m0)d9 for an arbitrary function / 
defined in the interval [— 1 ， +1] ■ 

11.6. Find the principal value of the integral x sinx dx/(x 2 —x^) and evaluate 


x srn^: t 

- -- ——- dx 

(A： — J：0 士 i€){x + A：0 士 ⑷ 

for the four possible choices of signs. 

11.7. Use analytic continuation, the analyticity of the exponential, hyperbolic, 
and trigonometric functions, and the analogous identities for real z to prove the 
following identities. 



(a) $e^z = \cosh z + \sinh z$.
(b) $\cosh^2 z - \sinh^2 z = 1$.
(c) $\sin 2z = 2 \sin z \cos z$.






11.8. Show that the function $1/z^2$ represents the analytic continuation into the domain $\mathbb{C} - \{0\}$ (all the complex plane minus the origin) of the function defined by $\sum_{n=0}^{\infty} (n+1)(z+1)^n$, where $|z + 1| < 1$.

11.9. Find the analytic continuation into $\mathbb{C} - \{i, -i\}$ (all the complex plane except $i$ and $-i$) of $f(z) = \int_0^\infty e^{-zt} \sin t\, dt$, where $\mathrm{Re}(z) > 0$.

11.10. Expand $f(z) = \sum_{n=0}^{\infty} z^n$ (defined in its circle of convergence) in a Taylor series about $z = a$. For what values of $a$ does this expansion permit the function $f(z)$ to be continued analytically?


11.11. The two power series

$$f_1(z) = \sum_{n=1}^{\infty} \frac{z^n}{n} \qquad\text{and}\qquad f_2(z) = i\pi + \sum_{n=1}^{\infty} (-1)^n\, \frac{(z - 2)^n}{n}$$

have no common domain of convergence. Show that they are nevertheless analytic continuations of one another.


11.12. Prove that the functions defined by the two series

$$1 + az + a^2 z^2 + \cdots \qquad\text{and}\qquad \frac{1}{1 - z} - \frac{(1 - a)z}{(1 - z)^2} + \frac{(1 - a)^2 z^2}{(1 - z)^3} - \cdots$$

are analytic continuations of one another.

11.13. Show that the function $f_1(z) = 1/(z^2 + 1)$, where $z \neq \pm i$, is the analytic continuation into $\mathbb{C} - \{i, -i\}$ of the function $f_2(z) = \sum_{n=0}^{\infty} (-1)^n z^{2n}$, where $|z| < 1$.

11.14. Find the analytic continuation into $\mathbb{C} - \{0\}$ of the function

$$f(z) = \int_0^\infty t e^{-zt}\, dt \qquad\text{where } \mathrm{Re}(z) > 0.$$

11.15. Show that the integral in Equation (11.9) converges. Hint: First show that $|\Gamma(z + 1)| \le \int_0^\infty t^x e^{-t}\, dt$ where $x = \mathrm{Re}(z)$. Now show that

$$\int_0^\infty t^x e^{-t}\, dt \le \int_0^1 t^x e^{-t}\, dt + \int_0^\infty t^n e^{-t}\, dt \qquad\text{for some integer } n > 0$$

and conclude that $\Gamma(z)$ is finite.

11.16. Show that $d\Gamma(z + 1)/dz$ exists and is finite by establishing the following:

(a) $|\ln t| < t + 1/t$ for $t > 0$. Hint: For $t \ge 1$, show that $t - \ln t$ is a monotonically increasing function. For $t < 1$, make the substitution $t = 1/s$.

(b) Use the result from part (a) in the integral for $d\Gamma(z + 1)/dz$ to show that $|d\Gamma(z + 1)/dz|$ is finite. Hint: Differentiate inside the integral.




11.17. Derive Equation (11.11) from Equation (11.9). 

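11.18. Show that $\Gamma(\tfrac12) = \sqrt{\pi}$ and that

$$(2k - 1)!! \equiv (2k - 1)(2k - 3) \cdots 5 \cdot 3 \cdot 1 = \frac{2^k}{\sqrt{\pi}}\, \Gamma\!\left(\frac{2k + 1}{2}\right).$$

11.19. Show that $\Gamma(z) = \int_0^1 [\ln(1/t)]^{z-1}\, dt$ with $\mathrm{Re}(z) > 0$.

11.20. Derive the identity $\int_0^\infty e^{-x^\alpha}\, dx = \Gamma[(\alpha + 1)/\alpha]$.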


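11.21. Consider the function $f(z) = (1 + z)^\alpha$.

(a) Show that $d^n f/dz^n|_{z=0} = \Gamma(\alpha + 1)/\Gamma(\alpha - n + 1)$, and use it to derive the relation

$$(1 + z)^\alpha = \sum_{n=0}^{\infty} \binom{\alpha}{n} z^n, \qquad\text{where}\qquad \binom{\alpha}{n} \equiv \frac{\alpha!}{n!\,(\alpha - n)!} \equiv \frac{\Gamma(\alpha + 1)}{n!\,\Gamma(\alpha - n + 1)}.$$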


(b) Show that for general complex numbers $a$ and $b$ we can formally write


(a + b) a = 



(c) Show that if $\alpha$ is a positive integer $m$, the series in part (b) truncates at $n = m$.

11.22. Prove that the residue of $\Gamma(z)$ at $z = -k$ is $r_k = (-1)^k/k!$. Hint: Use Equation (11.12).

11.23. Derive the following relation for z = x + 尔 


00 


ir ⑵ i = 

k=0 


1 + 


r 


(x-\-k) 2 j 


1/2 


11.24. Using the definition of $B(a, b)$, Equation (11.16), show that $B(a, b) = B(b, a)$.

11.25. Integrate Equation (11.21) by parts and derive Equation (11.11).

11.26. For positive integers $n$, show that $\Gamma(\tfrac12 - n)\,\Gamma(\tfrac12 + n) = (-1)^n \pi$.

11.27. Show that 


(a) $B(a, b) = B(a + 1, b) + B(a, b + 1)$.

⑻•，⑷ )=( •，扑 

(c) $B(a, b)\,B(a + b, c) = B(b, c)\,B(a, b + c)$.



[Figure 11.15: The contour for the evaluation of the Hankel function of the second kind, drawn in the complex $z$-plane (axes Re $z$ and Im $z$, origin $O$).]


11.28. Verify that $\int_{-1}^{1} (1 + t)^a (1 - t)^b\, dt = 2^{a+b+1} B(a + 1, b + 1)$.

11.29. Show that the volume of the solid formed by the surface z = x a y b , the 
xy-, yz- 9 and xz-planes, and the plane parallel to the z-axis and going through the 
points (0, : yo) and (xq, 0 ) is 


n 1 

o, b ^ 


+ 1，办 + 1). 


11.30. Derive this relation:

$$\int_0^\infty \frac{\sinh^a x}{\cosh^b x}\, dx = \frac{1}{2}\, B\!\left(\frac{a + 1}{2}, \frac{b - a}{2}\right), \qquad\text{where } -1 < a < b.$$

Hint: Let $t = \tanh^2 x$ in Equation (11.16).

11.31. The Hankel function of the second kind is defined as 


纪 2) ⑷ 4// 


(a/2Xz — l/z) 


dz 


z 


v+l 


where C is the contour shown in Figure 11.15. Find the asymptotic expansion of 
this function. 


11.32. Find the asymptotic dependence of the modified Bessel function of the first kind, defined as

$$I_\nu(\alpha) \equiv \frac{1}{2\pi i} \int_C e^{(\alpha/2)(z + 1/z)}\, \frac{dz}{z^{\nu+1}},$$

where $C$ starts at $-\infty$, approaches the origin and circles it, and goes back to $-\infty$. Thus the negative real axis is excluded from the domain of analyticity.







11.33. Find the asymptotic dependence of the modified Bessel function of the 
second kind: 


K v (a) 



-(a/2){z+l/z) 


dz 


z 


v+l ， 


where $C$ starts at $\infty$, approaches the origin and circles it, and goes back to $\infty$. Thus the positive real axis is excluded from the domain of analyticity.


Additional Reading 

1. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row, 1967.

2. Lang, S. Complex Analysis, 2nd ed., Springer-Verlag, 1985. Contains a very lucid discussion of analytic continuation.




Part IV

Differential Equations


12 Separation of Variables in Spherical Coordinates


The laws of physics are almost exclusively written in the form of differential equations (DEs). In (point) particle mechanics there is only one independent variable, leading to ordinary differential equations (ODEs). In other areas of physics in which extended objects such as fields are studied, variations with respect to position are also important. Partial derivatives with respect to coordinate variables show up in the differential equations, which are therefore called partial differential equations (PDEs). We list the most common PDEs of mathematical physics in the following.

12.1 PDEs of Mathematical Physics 

In electrostatics, where time-independent scalar fields such as potentials and vector fields such as electrostatic fields are studied, the law is described by Poisson's equation,

$$\nabla^2 \Phi(\mathbf{r}) = -4\pi\rho(\mathbf{r}). \qquad (12.1)$$

In vacuum, where $\rho(\mathbf{r}) = 0$, Equation (12.1) reduces to Laplace's equation,

$$\nabla^2 \Phi(\mathbf{r}) = 0. \qquad (12.2)$$

Many electrostatic problems involve conductors held at constant potentials and situated in vacuum. In the space between such conducting surfaces, the electrostatic potential obeys Equation (12.2).

The most simplified version of the heat equation is

$$\frac{\partial T}{\partial t} = a^2 \nabla^2 T(\mathbf{r}), \qquad (12.3)$$

where $T$ is the temperature and $a$ is a constant characterizing the medium in which heat is flowing.

One of the most frequently recurring PDEs encountered in mathematical physics is the wave equation,

$$\nabla^2 \Psi - \frac{1}{c^2} \frac{\partial^2 \Psi}{\partial t^2} = 0. \qquad (12.4)$$

This equation (or its simplification to lower dimensions) is applied to the vibration of strings and drums; the propagation of sound in gases, solids, and liquids; the propagation of disturbances in plasmas; and the propagation of electromagnetic waves.

The Schrödinger equation, describing nonrelativistic quantum phenomena, is

$$-\frac{\hbar^2}{2m} \nabla^2 \Psi + V(\mathbf{r})\Psi = i\hbar \frac{\partial \Psi}{\partial t}, \qquad (12.5)$$

where $m$ is the mass of a subatomic particle, $\hbar$ is Planck's constant (divided by $2\pi$), $V$ is the potential energy of the particle, and $|\Psi(\mathbf{r}, t)|^2$ is the probability density of finding the particle at $\mathbf{r}$ at time $t$.

A relativistic generalization of the Schrödinger equation for a free particle of mass $m$ is the Klein-Gordon equation, which, in terms of the natural units ($\hbar = 1 = c$), reduces to

$$\nabla^2 \phi - m^2 \phi = \frac{\partial^2 \phi}{\partial t^2}. \qquad (12.6)$$


Equations (12.3-12.6) have partial derivatives with respect to time. As a first step toward solving these PDEs and as an introduction to similar techniques used in the solution of PDEs not involving time,¹ let us separate the time variable. We will denote the functions in all four equations by the generic symbol $\Psi(\mathbf{r}, t)$. The basic idea is to separate the $\mathbf{r}$ and $t$ dependence into factors: $\Psi(\mathbf{r}, t) \equiv R(\mathbf{r})T(t)$. This factorization permits us to separate the two operations of space differentiation and time differentiation. Let $L$ stand for all spatial derivative operators and write all the relevant equations either as $L\Psi = \partial\Psi/\partial t$ or as $L\Psi = \partial^2\Psi/\partial t^2$. With this notation and the above separation, we have

$$L(RT) = T(LR) = \begin{cases} R\, dT/dt, \\ R\, d^2T/dt^2. \end{cases}$$

Dividing both sides by $RT$, we obtain

$$\frac{1}{R} LR = \begin{cases} \dfrac{1}{T} \dfrac{dT}{dt}, \\[2mm] \dfrac{1}{T} \dfrac{d^2T}{dt^2}. \end{cases} \qquad (12.7)$$

¹ See [Hass 99] for a thorough discussion of separation in Cartesian and cylindrical coordinates. Chapter 19 of this book also contains examples of solutions to some second-order linear DEs resulting from such separation.


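Now comes the crucial step in the process of separation of variables. The LHS of Equation (12.7) is a function of position alone, and the RHS is a function of time alone. Since $\mathbf{r}$ and $t$ are independent variables, the only way that (12.7) can hold is for both sides to be constant, say $\alpha$:

$$\frac{1}{R} LR = \alpha \quad\Longrightarrow\quad LR = \alpha R$$

and²

$$\frac{1}{T} \frac{dT}{dt} = \alpha \;\Longrightarrow\; \frac{dT}{dt} = \alpha T \qquad\text{or}\qquad \frac{1}{T} \frac{d^2T}{dt^2} = \alpha \;\Longrightarrow\; \frac{d^2T}{dt^2} = \alpha T.$$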

We have reduced the original time-dependent PDE to an ODE,

$$\frac{dT}{dt} = \alpha T \qquad\text{or}\qquad \frac{d^2T}{dt^2} = \alpha T, \qquad (12.8)$$

and a PDE involving only the position variables, $(L - \alpha)R = 0$. The most general form of $L - \alpha$ arising from Equations (12.3-12.6) is $L - \alpha \equiv \nabla^2 + f(\mathbf{r})$. Therefore, Equations (12.3-12.6) are equivalent to (12.8), and

$$\nabla^2 R + f(\mathbf{r})R = 0. \qquad (12.9)$$
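As a concrete instance of this separation, consider a one-dimensional version of the heat equation (12.3), $\partial u/\partial t = a^2\, \partial^2 u/\partial x^2$. Taking $R(x) = \sin kx$ gives $LR = -a^2 k^2 R$, so the separation constant is $-a^2 k^2$ and the time factor is $e^{-a^2 k^2 t}$. The Python sketch below is an illustration only: the reduction to one dimension, the sample values of `a` and `k`, and the function names are assumptions made here. It checks by central differences that the product $R(x)T(t)$ satisfies the PDE.

```python
import math


def product_solution(x, t, a=0.7, k=2.0):
    """R(x) * T(t) with R = sin(kx) and T = exp(sep * t), where sep = -(a k)^2."""
    sep = -(a * k) ** 2          # separation constant fixed by L R = sep * R
    return math.sin(k * x) * math.exp(sep * t)


def heat_residual(x, t, a=0.7, k=2.0, h=1e-4):
    """Finite-difference estimate of du/dt - a^2 d^2u/dx^2 at the point (x, t)."""
    def u(xx, tt):
        return product_solution(xx, tt, a, k)

    du_dt = (u(x, t + h) - u(x, t - h)) / (2 * h)
    d2u_dx2 = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
    return du_dt - a ** 2 * d2u_dx2


if __name__ == "__main__":
    print(heat_residual(0.4, 0.25))   # should be close to zero
```

The residual is not exactly zero because of the finite-difference approximation; it shrinks as the step `h` is refined, until rounding error takes over.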


To include Poisson's equation, we replace the zero on the RHS by $g(\mathbf{r}) \equiv -4\pi\rho(\mathbf{r})$, obtaining $\nabla^2 R + f(\mathbf{r})R = g(\mathbf{r})$. With the exception of Poisson's equation (an inhomogeneous PDE), in all the foregoing equations the term on the RHS is zero.³ We will restrict ourselves to this so-called homogeneous case and rewrite (12.9) as

$$\nabla^2 \Psi(\mathbf{r}) + f(\mathbf{r})\Psi(\mathbf{r}) = 0. \qquad (12.10)$$

Depending on the geometry of the problem, Equation (12.10) is further separated into ODEs each involving a single coordinate of a suitable coordinate system. We shall see examples of all major coordinate systems (Cartesian, cylindrical, and spherical) in Chapter 19. For the rest of this chapter, we shall concentrate on some general aspects of the spherical coordinates.

² In most cases, $\alpha$ is chosen to be real. In the case of the Schrödinger equation, it is more convenient to choose $\alpha$ to be purely imaginary so that the $i$ in the definition of $L$ can be compensated. In all cases, the precise nature of $\alpha$ is determined by boundary conditions.

³ Techniques for solving inhomogeneous PDEs are discussed in Chapters 21 and 22.


Jean Le Rond d'Alembert (1717-1783) was the illegitimate son of a famous salon hostess of eighteenth-century Paris and a cavalry officer. Abandoned by his mother, d'Alembert was raised by a foster family and later educated by the arrangement of his father at a nearby church-sponsored school, in which he received instruction in the classics and above-average instruction in mathematics. After studying law and medicine, he finally chose to pursue a career in mathematics. In the 1740s he joined the ranks of the philosophes, a growing group of deistic and materialistic thinkers and writers who actively questioned the social and intellectual standards of the day. He traveled little (he left France only once, to visit the court of Frederick the Great), preferring instead the company of his friends in the salons, among whom he was well known for his wit and laughter.

D'Alembert turned his mathematical and philosophical talents to many of the outstanding scientific problems of the day, with mixed success. Perhaps his most famous scientific work, entitled Traité de dynamique, shows his appreciation that a revolution was taking place in the science of mechanics: the formalization of the principles stated by Newton into a rigorous mathematical framework. The philosophy to which d'Alembert subscribed, however, refused to acknowledge the primacy of a concept as unclear and arbitrary as "force," introducing a certain awkwardness to his treatment and perhaps causing him to overlook the important principle of conservation of energy. Later, d'Alembert produced a treatise on fluid mechanics (the priority of which is still debated by historians), a paper dealing with vibrating strings (in which the wave equation makes its first appearance in physics), and a skillful treatment of celestial mechanics. D'Alembert is also credited with use of the first partial differential equation as well as the first solution to such an equation using separation of variables. (One should be careful interpreting "first": many of d'Alembert's predecessors and contemporaries gave similar, though less satisfactory, treatments of these milestones.) Perhaps his most well-known contribution to mathematics (at least among students) is the ratio test for the convergence of infinite series.

Much of the work for which d'Alembert is remembered occurred outside mathematical physics. He was chosen as the science editor of the Encyclopédie, and his lengthy Discours Préliminaire in that volume is considered one of the defining documents of the Enlightenment. Other works included writings on law, religion, and music.

Since d'Alembert's final years were not especially happy ones, perhaps this account of his life should end with a glimpse at the humanity his philosophy often gave his work. Like many of his contemporaries, he considered the problem of calculating the relative risk associated with the new practice of smallpox inoculation, which in rare cases caused the disease it was designed to prevent. Although not very successful in the mathematical sense, he was careful to point out that the probability of accidental infection, however slight or elegantly derived, would be small consolation to a father whose child died from the inoculation. It is greatly to his credit that d'Alembert did not believe such considerations irrelevant to the problem.


12.2 Separation of the Angular Part of the Laplacian


With Cartesian and cylindrical variables, the boundary conditions are important in determining the nature of the solutions of the ODE obtained from the PDE. In almost all applications, however, the angular part of the spherical variables can be separated and studied very generally. This is because the angular part of the Laplacian in the spherical coordinate system is closely related to the operation of rotation and the angular momentum, which are independent of any particular situation.

The separation of the angular part in spherical coordinates can be done in a fashion exactly analogous to the separation of time by writing $\Psi$ as a product of three functions, each depending on only one of the variables. However, we will follow an approach that is used in quantum mechanical treatments of angular momentum. This approach, which is based on the operator algebra of Chapter 2 and is extremely powerful and elegant, gives solutions for the angular part in closed form.

Define the vector operator $\mathbf{p}$ as $\mathbf{p} = -i\nabla$ so that its $j$th Cartesian component is $p_j = -i\,\partial/\partial x_j$, for $j = 1, 2, 3$. In quantum mechanics $\mathbf{p}$ (multiplied by $\hbar$) is the momentum operator. It is easy to verify that⁴ $[x_j, p_k] = i\delta_{jk}$ and $[x_j, x_k] = 0 = [p_j, p_k]$.

We can also define the angular momentum operator as $\mathbf{L} = \mathbf{r} \times \mathbf{p}$. This is expressed in components as $L_i = (\mathbf{r} \times \mathbf{p})_i = \epsilon_{ijk} x_j p_k$ for $i = 1, 2, 3$, where Einstein's summation convention (summing over repeated indices) is utilized.⁵ Using the commutation relations above, we obtain

$$[L_j, L_k] = i\epsilon_{jkl} L_l.$$

We will see shortly that $\mathbf{L}$ can be written solely in terms of the angles $\theta$ and $\varphi$. Moreover, there is one factor of $\mathbf{p}$ in the definition of $\mathbf{L}$, so if we square $\mathbf{L}$, we will get two factors of $\mathbf{p}$, and a Laplacian may emerge in the expression for $\mathbf{L} \cdot \mathbf{L}$. In this manner, we may be able to write $\nabla^2$ in terms of $L^2$, which depends only on angles. Let us try this:

$$\begin{aligned} L^2 = \mathbf{L} \cdot \mathbf{L} = \sum_{i=1}^{3} L_i L_i &= \epsilon_{ijk} x_j p_k\, \epsilon_{imn} x_m p_n = \epsilon_{ijk} \epsilon_{imn}\, x_j p_k x_m p_n \\ &= (\delta_{jm}\delta_{kn} - \delta_{jn}\delta_{km})\, x_j p_k x_m p_n = x_j p_k x_j p_k - x_j p_k x_k p_j. \end{aligned}$$

⁴ These operators act on the space of functions possessing enough "nice" properties to render the space suitable. The operator $x_j$ simply multiplies functions, while $p_j$ differentiates them.

⁵ It is assumed that the reader is familiar with vector algebra using indices and such objects as $\delta_{ij}$ and $\epsilon_{ijk}$. For an introductory treatment, sufficient for our present discussion, see [Hass 99]. A more advanced treatment of these objects (tensors) can be found in Part VII of this book.

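We need to write this expression in such a way that factors with the same index are next to each other, to give a dot product. We must also try, when possible, to keep the $p$ factors to the right so that they can operate on functions without intervention from the $x$ factors. We do this using the commutation relations between the $x$'s and the $p$'s:

$$\begin{aligned} L^2 &= x_j(x_j p_k - i\delta_{kj})p_k - (p_k x_j + i\delta_{kj})x_k p_j \\ &= x_j x_j p_k p_k - i x_j p_j - p_k x_k x_j p_j - i x_j p_j \\ &= x_j x_j p_k p_k - 2i x_j p_j - (x_k p_k - i\delta_{kk})\, x_j p_j. \end{aligned}$$

Recalling that $\delta_{kk} = \sum_{k=1}^{3} \delta_{kk} = 3$ and $x_j x_j = \sum_{j=1}^{3} x_j x_j = \mathbf{r} \cdot \mathbf{r} = r^2$, etc., we can write $L^2 = r^2\, \mathbf{p} \cdot \mathbf{p} + i\, \mathbf{r} \cdot \mathbf{p} - (\mathbf{r} \cdot \mathbf{p})(\mathbf{r} \cdot \mathbf{p})$, which, if we make the substitution $\mathbf{p} = -i\nabla$, yields

$$\nabla^2 = -r^{-2} L^2 + r^{-2} (\mathbf{r} \cdot \nabla)(\mathbf{r} \cdot \nabla) + r^{-2}\, \mathbf{r} \cdot \nabla.$$

Letting both sides act on the function $\Psi(r, \theta, \varphi)$, we get

$$\nabla^2 \Psi = -\frac{1}{r^2} L^2 \Psi + \frac{1}{r^2} (\mathbf{r} \cdot \nabla)(\mathbf{r} \cdot \nabla)\Psi + \frac{1}{r^2}\, \mathbf{r} \cdot \nabla \Psi. \qquad (12.11)$$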


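But we note that $\mathbf{r} \cdot \nabla = r\, \hat{\mathbf{e}}_r \cdot \nabla = r\, \partial/\partial r$. We thus get the final form of $\nabla^2 \Psi$ in spherical coordinates:

$$\nabla^2 \Psi = -\frac{1}{r^2} L^2 \Psi + \frac{1}{r} \frac{\partial}{\partial r}\left( r \frac{\partial \Psi}{\partial r} \right) + \frac{1}{r} \frac{\partial \Psi}{\partial r}. \qquad (12.12)$$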


It is important to note that Equation (12.11) is a general relation that holds in all coordinate systems. Although all the manipulations leading to it were done in Cartesian coordinates, since it is written in vector notation, there is no indication in the final form that it was derived using specific coordinates.

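Equation (12.12) is the spherical version of (12.11) and is the version we shall use. We will first make the simplifying assumption that in Equation (12.10), the master equation, $f(\mathbf{r})$ is a function of $r$ only. Equation (12.10) then becomes

$$-\frac{1}{r^2} L^2 \Psi + \frac{1}{r} \frac{\partial}{\partial r}\left( r \frac{\partial \Psi}{\partial r} \right) + \frac{1}{r} \frac{\partial \Psi}{\partial r} + f(r)\Psi = 0.$$

Assuming, for the time being, that $L^2$ depends only on $\theta$ and $\varphi$, and separating $\Psi$ into a product of two functions, $\Psi(r, \theta, \varphi) = R(r)Y(\theta, \varphi)$, we can rewrite this equation as

$$-\frac{1}{r^2} L^2(RY) + \frac{1}{r} \frac{\partial}{\partial r}\left[ r \frac{\partial}{\partial r}(RY) \right] + \frac{1}{r} \frac{\partial}{\partial r}(RY) + f(r)RY = 0.$$

Dividing by $RY$ and multiplying by $r^2$ yields

$$\underbrace{-\frac{1}{Y} L^2(Y)}_{-\alpha} + \underbrace{\frac{r}{R} \frac{d}{dr}\left( r \frac{dR}{dr} \right) + \frac{r}{R} \frac{dR}{dr} + r^2 f(r)}_{+\alpha} = 0,$$

or

$$L^2\, Y(\theta, \varphi) = \alpha\, Y(\theta, \varphi) \qquad (12.13)$$

and

$$\frac{d^2R}{dr^2} + \frac{2}{r} \frac{dR}{dr} + \left[ f(r) - \frac{\alpha}{r^2} \right] R = 0. \qquad (12.14)$$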
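To make the radial equation (12.14) concrete, take the constant choice $f(r) = k^2$. For $\alpha = 0$ the function $\sin(kr)/(kr)$ solves it, and for $\alpha = 2$ the function $\sin(kr)/(kr)^2 - \cos(kr)/(kr)$ does (these are the spherical Bessel functions of order 0 and 1). The Python sketch below is a hedged illustration, not part of the text: the choice of $f$, the sample values, and the helper names are assumptions. It evaluates the left-hand side of (12.14) by central differences.

```python
import math


def radial_residual(R, r, k, alpha, h=1e-4):
    """Finite-difference value of R'' + (2/r) R' + (k^2 - alpha/r^2) R at radius r."""
    d1 = (R(r + h) - R(r - h)) / (2 * h)
    d2 = (R(r + h) - 2 * R(r) + R(r - h)) / h ** 2
    return d2 + 2 * d1 / r + (k ** 2 - alpha / r ** 2) * R(r)


def bessel_j0(k):
    """Spherical Bessel function of order 0 evaluated at k*r, as a function of r."""
    return lambda r: math.sin(k * r) / (k * r)


def bessel_j1(k):
    """Spherical Bessel function of order 1 evaluated at k*r, as a function of r."""
    return lambda r: math.sin(k * r) / (k * r) ** 2 - math.cos(k * r) / (k * r)


if __name__ == "__main__":
    k = 1.5
    print(radial_residual(bessel_j0(k), 2.0, k, alpha=0.0))
    print(radial_residual(bessel_j1(k), 2.0, k, alpha=2.0))
```

Both printed residuals should be small; pairing either function with the wrong value of $\alpha$ leaves a residual of order one.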


We will concentrate on the angular part, Equation (12.13), leaving the radial part to the general discussion of ODEs. The rest of this subsection will focus on showing that $L_1 \equiv L_x$, $L_2 \equiv L_y$, and $L_3 \equiv L_z$ are independent of $r$.

Since $L_i$ is an operator, we can study its action on an arbitrary function $f$. Thus, $L_i f = -i\epsilon_{ijk} x_j \nabla_k f \equiv -i\epsilon_{ijk} x_j\, \partial f/\partial x_k$. We can express the Cartesian $x_j$ in terms of $r$, $\theta$, and $\varphi$, and use the chain rule to express $\partial f/\partial x_k$ in terms of spherical coordinates. This will give us $L_i f$ expressed in terms of $r$, $\theta$, and $\varphi$. It will then emerge that $r$ is absent in the final expression.

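Let us start with $x = r \sin\theta \cos\varphi$, $y = r \sin\theta \sin\varphi$, $z = r \cos\theta$, and their inverses, $r = (x^2 + y^2 + z^2)^{1/2}$, $\cos\theta = z/r$, $\tan\varphi = y/x$, and express the Cartesian derivatives in terms of spherical coordinates using the chain rule. The first such derivative is

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial r} \frac{\partial r}{\partial x} + \frac{\partial f}{\partial \theta} \frac{\partial \theta}{\partial x} + \frac{\partial f}{\partial \varphi} \frac{\partial \varphi}{\partial x}. \qquad (12.15)$$

The derivative of one coordinate system with respect to the other can be easily calculated. For example, $\partial r/\partial x = x/r = \sin\theta \cos\varphi$, and differentiating both sides of the equation $\cos\theta = z/r$, we obtain

$$-\sin\theta\, \frac{\partial \theta}{\partial x} = -\frac{z\, \partial r/\partial x}{r^2} = -\frac{zx}{r^3} = -\frac{\cos\theta \sin\theta \cos\varphi}{r} \quad\Longrightarrow\quad \frac{\partial \theta}{\partial x} = \frac{\cos\theta \cos\varphi}{r}.$$

Finally, differentiating both sides of $\tan\varphi = y/x$ with respect to $x$ yields $\partial\varphi/\partial x = -\sin\varphi/(r \sin\theta)$. Using these expressions in Equation (12.15), we get

$$\frac{\partial f}{\partial x} = \sin\theta \cos\varphi\, \frac{\partial f}{\partial r} + \frac{\cos\theta \cos\varphi}{r} \frac{\partial f}{\partial \theta} - \frac{\sin\varphi}{r \sin\theta} \frac{\partial f}{\partial \varphi}.$$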


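In exactly the same way, we obtain

$$\frac{\partial f}{\partial y} = \sin\theta \sin\varphi\, \frac{\partial f}{\partial r} + \frac{\cos\theta \sin\varphi}{r} \frac{\partial f}{\partial \theta} + \frac{\cos\varphi}{r \sin\theta} \frac{\partial f}{\partial \varphi}, \qquad \frac{\partial f}{\partial z} = \cos\theta\, \frac{\partial f}{\partial r} - \frac{\sin\theta}{r} \frac{\partial f}{\partial \theta}.$$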
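The three chain-rule formulas for $\partial f/\partial x$, $\partial f/\partial y$, and $\partial f/\partial z$ are easy to get wrong by a sign, so a numerical spot check can be reassuring. The Python sketch below is an illustration only: the test function `F`, the sample point, and all helper names are assumptions made for this example. It compares the gradient assembled from the spherical derivatives with a direct Cartesian finite-difference gradient.

```python
import math


def to_spherical(x, y, z):
    """Cartesian -> (r, theta, phi), with theta measured from the z-axis."""
    r = math.sqrt(x * x + y * y + z * z)
    return r, math.acos(z / r), math.atan2(y, x)


def F(r, theta, phi):
    """Arbitrary smooth test function of the spherical variables (illustrative)."""
    return r ** 2 * math.sin(theta) * math.cos(phi) + r * math.cos(theta) * math.sin(2 * phi)


def partial(func, args, index, h=1e-6):
    """Central-difference partial derivative of func with respect to args[index]."""
    up, down = list(args), list(args)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)


def grad_from_spherical(r, theta, phi):
    """Cartesian gradient assembled from spherical derivatives via the chain rule."""
    Fr = partial(F, (r, theta, phi), 0)
    Ft = partial(F, (r, theta, phi), 1)
    Fp = partial(F, (r, theta, phi), 2)
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    fx = st * cp * Fr + ct * cp / r * Ft - sp / (r * st) * Fp
    fy = st * sp * Fr + ct * sp / r * Ft + cp / (r * st) * Fp
    fz = ct * Fr - st / r * Ft
    return fx, fy, fz


def grad_cartesian(x, y, z):
    """Direct finite-difference gradient of the same function in Cartesian variables."""
    def g(a, b, c):
        return F(*to_spherical(a, b, c))

    return tuple(partial(g, (x, y, z), i) for i in range(3))


if __name__ == "__main__":
    point = (0.3, -0.4, 0.5)
    print(grad_cartesian(*point))
    print(grad_from_spherical(*to_spherical(*point)))
```

The two printed triples should agree to within the finite-difference error. The sample point is kept away from the $z$-axis, where $\sin\theta = 0$ and the spherical formulas are singular.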


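We can now calculate $L_x$ by letting it act on an arbitrary function and expressing all Cartesian coordinates and derivatives in terms of spherical coordinates. The result is

$$L_x f = -i\left( y \frac{\partial}{\partial z} - z \frac{\partial}{\partial y} \right) f = i\left( \sin\varphi\, \frac{\partial}{\partial \theta} + \cot\theta \cos\varphi\, \frac{\partial}{\partial \varphi} \right) f,$$

or

$$L_x = i\left( \sin\varphi\, \frac{\partial}{\partial \theta} + \cot\theta \cos\varphi\, \frac{\partial}{\partial \varphi} \right). \qquad (12.16)$$

Analogous arguments yield

$$L_y = i\left( -\cos\varphi\, \frac{\partial}{\partial \theta} + \cot\theta \sin\varphi\, \frac{\partial}{\partial \varphi} \right), \qquad L_z = -i \frac{\partial}{\partial \varphi}. \qquad (12.17)$$

It is left as a problem for the reader to show that by adding the squares of the components of the angular momentum operator, one obtains

$$L^2 = -\frac{1}{\sin\theta} \frac{\partial}{\partial \theta}\left( \sin\theta\, \frac{\partial}{\partial \theta} \right) - \frac{1}{\sin^2\theta} \frac{\partial^2}{\partial \varphi^2}. \qquad (12.18)$$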

Substitution in Equation (12.12) yields the familiar expression for the Laplacian 
in spherical coordinates. 
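As a quick plausibility check of the differential form of $L^2$ in (12.18), one can apply it numerically to a trial function. The Python sketch below is an illustration under stated assumptions: the trial function $(\sin\theta)^l e^{il\varphi}$ is chosen here because a short hand calculation gives $L^2$ acting on it as $l(l+1)$ times the function; the step size and helper names are likewise choices made for this example.

```python
import cmath
import math


def L2_apply(Y, theta, phi, h=1e-4):
    """Apply the operator of (12.18) to Y(theta, phi) using central differences."""
    def sin_dY(th):
        # sin(theta) * dY/dtheta, evaluated at polar angle th
        return math.sin(th) * (Y(th + h, phi) - Y(th - h, phi)) / (2 * h)

    theta_part = (sin_dY(theta + h) - sin_dY(theta - h)) / (2 * h) / math.sin(theta)
    phi_part = ((Y(theta, phi + h) - 2 * Y(theta, phi) + Y(theta, phi - h))
                / h ** 2 / math.sin(theta) ** 2)
    return -theta_part - phi_part


def Y_trial(l):
    """Unnormalized trial function (sin theta)^l * exp(i l phi)."""
    return lambda th, ph: math.sin(th) ** l * cmath.exp(1j * l * ph)


if __name__ == "__main__":
    for l in range(4):
        Y = Y_trial(l)
        print(l, L2_apply(Y, 1.0, 0.3) / Y(1.0, 0.3))   # expect roughly l*(l+1)
```

The ratio printed for each $l$ should be close to $l(l+1)$, with a negligible imaginary part.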


12.3 Construction of Eigenvalues of $L^2$

Now that we have $L^2$ in terms of $\theta$ and $\varphi$, we could substitute in Equation (12.13), separate the $\theta$ and $\varphi$ dependence, and solve the corresponding ODEs. However, there is a much more elegant way of solving this problem algebraically, because Equation (12.13) is simply an eigenvalue equation for $L^2$. In this section, we will find the eigenvalues of $L^2$. The next section will evaluate the eigenvectors of $L^2$.

Let us consider $L^2$ as an abstract operator and write (12.13) as

$$L^2 |Y\rangle = \alpha |Y\rangle,$$

where $|Y\rangle$ is an abstract vector whose $(\theta, \varphi)$th component can be calculated later. Since $L^2$ is a differential operator, it does not have a (finite-dimensional) matrix representation. Thus, the determinantal procedure for calculating eigenvalues and eigenfunctions will not work here, and we have to find another way.

The equation above specifies an eigenvalue, $\alpha$, and an eigenvector, $|Y\rangle$. There may be more than one $|Y\rangle$ corresponding to the same $\alpha$. To distinguish among these so-called degenerate eigenvectors, we choose a second operator, say $L_3 \in \{L_i\}$, that commutes with $L^2$. This allows us to select a basis in which both $L^2$ and $L_3$ are diagonal, or, equivalently, a basis whose vectors are simultaneous eigenvectors of both $L^2$ and $L_3$. This is possible by Theorem 4.4.15 and the fact that both $L^2$ and $L_3$ are hermitian operators in the space of square-integrable functions. (The proof is left as a problem.) In general, we would want to continue adding operators until we obtained a maximum set of commuting operators which could label the eigenvectors. In this case, $L^2$ and $L_3$ exhaust the set.⁶ Using the more common subscripts $x$, $y$, and $z$ instead of 1, 2, 3 and attaching labels to the eigenvectors, we have

$$L^2 |Y_{\alpha,\beta}\rangle = \alpha |Y_{\alpha,\beta}\rangle, \qquad L_z |Y_{\alpha,\beta}\rangle = \beta |Y_{\alpha,\beta}\rangle. \qquad (12.19)$$

The hermiticity of $L^2$ and $L_z$ implies the reality of $\alpha$ and $\beta$. Next we need to determine the possible values for $\alpha$ and $\beta$.

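Define two new operators $L_+ \equiv L_x + iL_y$ and $L_- \equiv L_x - iL_y$. It is then easily verified that

$$[L^2, L_\pm] = 0, \qquad [L_z, L_\pm] = \pm L_\pm, \qquad [L_+, L_-] = 2L_z. \qquad (12.20)$$

The first equation implies that $L_\pm$ are invariant operators when acting in the subspace corresponding to the eigenvalue $\alpha$; that is, $L_\pm |Y_{\alpha,\beta}\rangle$ are eigenvectors of $L^2$ with the same eigenvalue $\alpha$:

$$L^2 (L_\pm |Y_{\alpha,\beta}\rangle) = L_\pm (L^2 |Y_{\alpha,\beta}\rangle) = \alpha L_\pm |Y_{\alpha,\beta}\rangle.$$

The second equation in (12.20) yields

$$\begin{aligned} L_z (L_+ |Y_{\alpha,\beta}\rangle) &= (L_z L_+) |Y_{\alpha,\beta}\rangle = (L_+ L_z + L_+) |Y_{\alpha,\beta}\rangle \\ &= L_+ L_z |Y_{\alpha,\beta}\rangle + L_+ |Y_{\alpha,\beta}\rangle = \beta L_+ |Y_{\alpha,\beta}\rangle + L_+ |Y_{\alpha,\beta}\rangle \\ &= (\beta + 1) L_+ |Y_{\alpha,\beta}\rangle. \end{aligned}$$

This indicates that $L_+ |Y_{\alpha,\beta}\rangle$ has one more unit of the $L_z$ eigenvalue than $|Y_{\alpha,\beta}\rangle$ does. In other words, $L_+$ raises the eigenvalue of $L_z$ by one unit. That is why $L_+$ is called a raising operator. Similarly, $L_-$ is called a lowering operator because $L_z (L_- |Y_{\alpha,\beta}\rangle) = (\beta - 1) L_- |Y_{\alpha,\beta}\rangle$.

We can summarize the above discussion as

$$L_\pm |Y_{\alpha,\beta}\rangle = C_\pm |Y_{\alpha,\beta \pm 1}\rangle,$$

where $C_\pm$ are constants to be determined by a suitable normalization.

⁶ We could just as well have chosen $L^2$ and any other component as our maximal set. However, $L^2$ and $L_3$ is the universally accepted choice.

There are restrictions on (and relations between) $\alpha$ and $\beta$. First note that as $L^2$ is a sum of squares of hermitian operators, it must be a positive operator; that is, $\langle a| L^2 |a\rangle \ge 0$ for all $|a\rangle$. In particular,

$$0 \le \langle Y_{\alpha,\beta}| L^2 |Y_{\alpha,\beta}\rangle = \alpha \langle Y_{\alpha,\beta}|Y_{\alpha,\beta}\rangle = \alpha \|Y_{\alpha,\beta}\|^2.$$

Therefore, $\alpha \ge 0$. Next, one can readily show that

$$L^2 = L_+ L_- + L_z^2 - L_z = L_- L_+ + L_z^2 + L_z. \qquad (12.21)$$

Sandwiching both sides of the first equality between $|Y_{\alpha,\beta}\rangle$ and $\langle Y_{\alpha,\beta}|$ yields

$$\langle Y_{\alpha,\beta}| L^2 |Y_{\alpha,\beta}\rangle = \langle Y_{\alpha,\beta}| L_+ L_- |Y_{\alpha,\beta}\rangle + \langle Y_{\alpha,\beta}| L_z^2 |Y_{\alpha,\beta}\rangle - \langle Y_{\alpha,\beta}| L_z |Y_{\alpha,\beta}\rangle,$$

with an analogous expression involving $L_- L_+$. Using the fact that $L_+ = (L_-)^\dagger$, we get

$$\begin{aligned} \alpha \|Y_{\alpha,\beta}\|^2 &= \langle Y_{\alpha,\beta}| L_+ L_- |Y_{\alpha,\beta}\rangle + \beta^2 \|Y_{\alpha,\beta}\|^2 - \beta \|Y_{\alpha,\beta}\|^2 \\ &= \langle Y_{\alpha,\beta}| L_- L_+ |Y_{\alpha,\beta}\rangle + \beta^2 \|Y_{\alpha,\beta}\|^2 + \beta \|Y_{\alpha,\beta}\|^2 \\ &= \| L_\mp |Y_{\alpha,\beta}\rangle \|^2 + \beta^2 \|Y_{\alpha,\beta}\|^2 \mp \beta \|Y_{\alpha,\beta}\|^2. \end{aligned} \qquad (12.22)$$

Because of the positivity of norms, this yields $\alpha \ge \beta^2 - \beta$ and $\alpha \ge \beta^2 + \beta$. Adding these two inequalities gives $2\alpha \ge 2\beta^2 \Rightarrow -\sqrt{\alpha} \le \beta \le \sqrt{\alpha}$. It follows that the values of $\beta$ are bounded. That is, there exist a maximum $\beta$, denoted by $\beta_+$, and a minimum $\beta$, denoted by $\beta_-$, beyond which there are no more values of $\beta$. This can happen only if

$$L_+ |Y_{\alpha,\beta_+}\rangle = 0, \qquad L_- |Y_{\alpha,\beta_-}\rangle = 0,$$

because if $L_\pm |Y_{\alpha,\beta_\pm}\rangle$ are not zero, then they must have values of $\beta$ corresponding to $\beta_\pm \pm 1$, which are not allowed.

Using $\beta_+$ for $\beta$ in Equation (12.22) yields

$$(\alpha - \beta_+^2 - \beta_+) \|Y_{\alpha,\beta_+}\|^2 = 0.$$

By definition $|Y_{\alpha,\beta_+}\rangle \neq 0$ (otherwise $\beta_+ - 1$ would be the maximum). Thus, we obtain $\alpha = \beta_+^2 + \beta_+$. An analogous procedure using $\beta_-$ for $\beta$ yields $\alpha = \beta_-^2 - \beta_-$. We solve these two equations for $\beta_+$ and $\beta_-$:

$$\beta_+ = \tfrac{1}{2}\left( -1 \pm \sqrt{1 + 4\alpha} \right), \qquad \beta_- = \tfrac{1}{2}\left( 1 \pm \sqrt{1 + 4\alpha} \right).$$

Since $\beta_+ \ge \beta_-$ and $\sqrt{1 + 4\alpha} \ge 1$, we must choose

$$\beta_+ = \tfrac{1}{2}\left( -1 + \sqrt{1 + 4\alpha} \right) = -\beta_-.$$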


Starting with $|Y_{\alpha,\beta_+}\rangle$, we can apply $L_-$ to it repeatedly. In each step we decrease the value of $\beta$ by one unit. There must be a limit to the number of vectors obtained in this way, because $\beta$ has a minimum. Therefore, there must exist a nonnegative integer $k$ such that

$$(L_-)^{k+1} |Y_{\alpha,\beta_+}\rangle = L_- (L_-^k |Y_{\alpha,\beta_+}\rangle) = 0.$$

Thus, $L_-^k |Y_{\alpha,\beta_+}\rangle$ must be proportional to $|Y_{\alpha,\beta_-}\rangle$. In particular, since $L_-^k |Y_{\alpha,\beta_+}\rangle$ has a $\beta$ value equal to $\beta_+ - k$, we have $\beta_- = \beta_+ - k$. Now, using $\beta_- = -\beta_+$ (derived above) yields the important result

$$\beta_+ = \frac{k}{2} \equiv j \qquad\text{for } k \in \mathbb{N},$$

or $\alpha = j(j + 1)$, since $\alpha = \beta_+^2 + \beta_+$. This result is important enough to be stated as a theorem.

12.3.1. Theorem. The eigenvectors of $L^2$, denoted by $|Y_{jm}\rangle$, satisfy the eigenvalue relations

$$L^2 |Y_{jm}\rangle = j(j + 1) |Y_{jm}\rangle, \qquad L_z |Y_{jm}\rangle = m |Y_{jm}\rangle,$$

where $j$ is a nonnegative integer or half-integer, and $m$ can take a value in the set $\{-j, -j + 1, \ldots, j - 1, j\}$ of $2j + 1$ numbers.

Let us briefly consider the normalization of the eigenvectors. We already know that the $|Y_{jm}\rangle$, being eigenvectors of the hermitian operators $L^2$ and $L_z$, are orthogonal. We also demand that they be of unit norm; that is,

$$\langle Y_{jm}|Y_{j'm'}\rangle = \delta_{jj'}\delta_{mm'}. \qquad (12.23)$$

This will determine the constants $C_\pm$, introduced earlier. Let us consider $C_+$ first, which is defined by $L_+ |Y_{jm}\rangle = C_+ |Y_{j,m+1}\rangle$. The hermitian conjugate of this equation is $\langle Y_{jm}| L_- = C_+^* \langle Y_{j,m+1}|$. We contract these two equations to get $\langle Y_{jm}| L_- L_+ |Y_{jm}\rangle = |C_+|^2 \langle Y_{j,m+1}|Y_{j,m+1}\rangle$. Then we use the second relation in Equation (12.21), Theorem 12.3.1, and (12.23) to obtain

$$j(j + 1) - m(m + 1) = |C_+|^2 \quad\Longrightarrow\quad |C_+| = \sqrt{j(j + 1) - m(m + 1)}.$$

Adopting the convention that the argument (phase) of the complex number $C_+$ is zero (and therefore that $C_+$ is real), we get

$$C_+ = \sqrt{j(j + 1) - m(m + 1)}.$$

Similarly, $C_- = \sqrt{j(j + 1) - m(m - 1)}$. Thus, we get

$$\begin{aligned} L_+ |Y_{jm}\rangle &= \sqrt{j(j + 1) - m(m + 1)}\; |Y_{j,m+1}\rangle, \\ L_- |Y_{jm}\rangle &= \sqrt{j(j + 1) - m(m - 1)}\; |Y_{j,m-1}\rangle. \end{aligned} \qquad (12.24)$$
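For a fixed $j$, Equation (12.24) determines the matrices of $L_\pm$ in the basis $|Y_{jm}\rangle$, and one can confirm numerically that they reproduce $[L_+, L_-] = 2L_z$ and $L^2 = j(j+1)$. The Python sketch below is an illustration only: the basis ordering $m = j, j-1, \ldots, -j$, the list-of-lists matrix representation, and the function names are assumptions made for this example.

```python
import math
from fractions import Fraction


def ladder_matrices(j):
    """Matrices of L_z, L_+, L_- in the basis |j, m>, ordered m = j, j-1, ..., -j."""
    j = Fraction(j)
    ms = [j - i for i in range(int(2 * j) + 1)]
    n = len(ms)
    Lz = [[float(ms[r]) if r == c else 0.0 for c in range(n)] for r in range(n)]
    Lp = [[0.0] * n for _ in range(n)]
    Lm = [[0.0] * n for _ in range(n)]
    for c, m in enumerate(ms):
        if c > 0:        # L_+ |j,m> = C_+ |j,m+1>, and |j,m+1> sits at index c-1
            Lp[c - 1][c] = math.sqrt(j * (j + 1) - m * (m + 1))
        if c < n - 1:    # L_- |j,m> = C_- |j,m-1>, and |j,m-1> sits at index c+1
            Lm[c + 1][c] = math.sqrt(j * (j + 1) - m * (m - 1))
    return Lz, Lp, Lm


def matmul(A, B):
    """Plain square-matrix product for lists of lists."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def casimir(j):
    """L^2 built as L_- L_+ + L_z^2 + L_z, the second form in (12.21)."""
    Lz, Lp, Lm = ladder_matrices(j)
    LmLp, Lz2 = matmul(Lm, Lp), matmul(Lz, Lz)
    n = len(Lz)
    return [[LmLp[r][c] + Lz2[r][c] + Lz[r][c] for c in range(n)] for r in range(n)]


if __name__ == "__main__":
    for row in casimir(Fraction(3, 2)):
        print(row)   # diagonal entries should all equal j(j+1) = 3.75
```

Half-integer values of $j$ are passed as `Fraction(1, 2)`, `Fraction(3, 2)`, and so on, so that the $m$ values are generated exactly before the square roots are taken.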



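12.3.2. Example. Let us find an expression for $|Y_{lm}\rangle$ by repeatedly applying $L_-$ to $|Y_{ll}\rangle$. The action for $L_-$ is completely described by Equation (12.24). For the first power of $L_-$, we obtain

$$L_- |Y_{ll}\rangle = \sqrt{l(l + 1) - l(l - 1)}\; |Y_{l,l-1}\rangle = \sqrt{2l}\; |Y_{l,l-1}\rangle.$$

We apply $L_-$ once more:

$$\begin{aligned} (L_-)^2 |Y_{ll}\rangle &= \sqrt{2l}\, L_- |Y_{l,l-1}\rangle = \sqrt{2l}\, \sqrt{l(l + 1) - (l - 1)(l - 2)}\; |Y_{l,l-2}\rangle \\ &= \sqrt{2l}\, \sqrt{2(2l - 1)}\; |Y_{l,l-2}\rangle = \sqrt{2(2l)(2l - 1)}\; |Y_{l,l-2}\rangle. \end{aligned}$$

Applying $L_-$ a third time yields

$$\begin{aligned} (L_-)^3 |Y_{ll}\rangle &= \sqrt{2(2l)(2l - 1)}\, L_- |Y_{l,l-2}\rangle = \sqrt{2(2l)(2l - 1)}\, \sqrt{6(l - 1)}\; |Y_{l,l-3}\rangle \\ &= \sqrt{3!\,(2l)(2l - 1)(2l - 2)}\; |Y_{l,l-3}\rangle. \end{aligned}$$

The pattern suggests the following formula for a general power $k$:

$$L_-^k |Y_{ll}\rangle = \sqrt{k!\,(2l)(2l - 1) \cdots (2l - k + 1)}\; |Y_{l,l-k}\rangle,$$

or $L_-^k |Y_{ll}\rangle = \sqrt{k!\,(2l)!/(2l - k)!}\; |Y_{l,l-k}\rangle$. If we set $l - k = m$ and solve for $|Y_{lm}\rangle$, we get

$$|Y_{lm}\rangle = \sqrt{\frac{(l + m)!}{(l - m)!\,(2l)!}}\; L_-^{\,l-m} |Y_{ll}\rangle.$$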



The discussion in this section is the standard treatment of angular momentum in quantum mechanics. In the context of quantum mechanics, Theorem 12.3.1 states the far-reaching physical result that particles can have integer or half-integer spin. Such a conclusion is tied to the rotation group in three dimensions, which, in turn, is an example of a Lie group, or a continuous group of transformations. We shall come back to a study of groups later. It is worth noting that it was the study of differential equations that led the Norwegian mathematician Sophus Lie to the investigation of their symmetries and the development of the beautiful branch of mathematics and theoretical physics that bears his name. Thus, the existence of a connection between group theory (rotation, angular momentum) and the differential equation we are trying to solve should not come as a surprise.


12.4 Eigenvectors of L 2 : Spherical Harmonics 

The treatment in the preceding section took place in an abstract vector space. Let
us go back to the function space and represent the operators and vectors in terms
of $\theta$ and $\varphi$.

First, let us consider $L_z$ in the form of a differential operator, as given in
Equation (12.17). The eigenvalue equation for $L_z$ becomes

\[ -i\,\frac{\partial}{\partial\varphi}\, Y_{jm}(\theta,\varphi) = m\, Y_{jm}(\theta,\varphi). \]




We write $Y_{jm}(\theta,\varphi) = P_{jm}(\theta)\,Q_{jm}(\varphi)$ and substitute in the above equation to
obtain the ODE in $\varphi$, $dQ_{jm}/d\varphi = imQ_{jm}$, which has a solution of the form
$Q_{jm}(\varphi) = C_{jm} e^{im\varphi}$, where $C_{jm}$ is a constant. Absorbing this constant into $P_{jm}$,
we can write

\[ Y_{jm}(\theta,\varphi) = P_{jm}(\theta)\, e^{im\varphi}. \]

In classical physics the value of functions must be the same at $\varphi$ as at $\varphi + 2\pi$.
This condition restricts the values of $m$ to integers. In quantum mechanics, on the
other hand, it is the absolute values of functions that are physically measurable
quantities, and therefore $m$ can also be a half-integer.


12.4.1. Box. From now on, we shall assume that $m$ is an integer and denote
the eigenvectors of $L^2$ by $Y_{lm}(\theta,\varphi)$, in which $l$ is a nonnegative integer.


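Our task is to find an analytic expression for $Y_{lm}(\theta,\varphi)$. We need differential
expressions for $L_\pm$. These can easily be obtained from the expressions for $L_x$ and
$L_y$ given in Equations (12.16) and (12.17). (The straightforward manipulations are
left as a problem.) We thus have

\[ L_\pm = e^{\pm i\varphi}\left(\pm\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\varphi}\right). \tag{12.25} \]

Since $l$ is the highest value of $m$, when $L_+$ acts on $Y_{ll}(\theta,\varphi) = P_{ll}(\theta)e^{il\varphi}$ the result
must be zero. This leads to the differential equation

\[ \left(\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\varphi}\right)\left[P_{ll}(\theta)e^{il\varphi}\right] = 0 \quad\Rightarrow\quad \left(\frac{d}{d\theta} - l\cot\theta\right)P_{ll}(\theta) = 0. \]

The solution to this differential equation is readily found to be

\[ P_{ll}(\theta) = C_l (\sin\theta)^l. \]

The constant is subscripted because each $P_{ll}$ may lead to a different constant of
integration. We can now write $Y_{ll}(\theta,\varphi) = C_l(\sin\theta)^l e^{il\varphi}$.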

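With $Y_{ll}(\theta,\varphi)$ at our disposal, we can obtain any $Y_{lm}(\theta,\varphi)$ by repeated
application of $L_-$. In principle, the result of Example 12.3.2 gives all the (abstract)
eigenvectors. In practice, however, it is helpful to have a closed form (in terms
of derivatives) for just the $\theta$ part of $Y_{lm}(\theta,\varphi)$. So, let us apply $L_-$, as given in
Equation (12.25), to $Y_{ll}(\theta,\varphi)$:

\[
\begin{aligned}
L_- Y_{ll} &= e^{-i\varphi}\left(-\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\varphi}\right)\left[P_{ll}(\theta)e^{il\varphi}\right] \\
&= e^{-i\varphi}\left[-\frac{\partial}{\partial\theta} + i\cot\theta\,(il)\right]\left[P_{ll}(\theta)e^{il\varphi}\right] \\
&= (-1)\,e^{i(l-1)\varphi}\left(\frac{d}{d\theta} + l\cot\theta\right)P_{ll}(\theta).
\end{aligned}
\]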









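It can be shown that for a positive integer $n$,

\[ \left(\frac{d}{d\theta} + n\cot\theta\right) f(\theta) = \frac{1}{\sin^n\theta}\,\frac{d}{d\theta}\left[\sin^n\theta\, f(\theta)\right]. \tag{12.26} \]

Using this result yields

\[
\begin{aligned}
L_- Y_{ll} &= \sqrt{2l}\; Y_{l,l-1} = \sqrt{2l}\; e^{i(l-1)\varphi} P_{l,l-1}(\theta) \\
&= (-1)\, e^{i(l-1)\varphi}\,\frac{1}{\sin^l\theta}\,\frac{d}{d\theta}\left[\sin^l\theta\,(C_l\sin^l\theta)\right]
 = (-1)\, C_l\,\frac{e^{i(l-1)\varphi}}{\sin^l\theta}\,\frac{d}{d\theta}\left(\sin^{2l}\theta\right).
\end{aligned}
\tag{12.27}
\]

We apply $L_-$ to (12.27), and use Equation (12.26) with $n = l - 1$ to obtain

\[
\begin{aligned}
L_-^2 Y_{ll} &= (-1)^2\, C_l\, e^{i(l-2)\varphi}\,\frac{1}{\sin^{l-1}\theta}\,\frac{d}{d\theta}\left[\sin^{l-1}\theta\,\frac{1}{\sin^l\theta}\,\frac{d}{d\theta}\left(\sin^{2l}\theta\right)\right] \\
&= (-1)^2\, C_l\,\frac{e^{i(l-2)\varphi}}{\sin^{l-1}\theta}\,\frac{d}{d\theta}\left[\frac{1}{\sin\theta}\,\frac{d}{d\theta}\left(\sin^{2l}\theta\right)\right].
\end{aligned}
\]

Making the substitution $u = \cos\theta$ yields

\[ L_-^2 Y_{ll} = C_l\,\frac{e^{i(l-2)\varphi}}{(1-u^2)^{(l-2)/2}}\,\frac{d^2}{du^2}\left[(1-u^2)^l\right]. \]

With a little more effort one can detect a pattern and obtain

\[ L_-^k Y_{ll} = C_l\,\frac{e^{i(l-k)\varphi}}{(1-u^2)^{(l-k)/2}}\,\frac{d^k}{du^k}\left[(1-u^2)^l\right]. \]

If we let $k = l - m$ and make use of the result obtained in Example 12.3.2, we
obtain

\[ Y_{lm}(u,\varphi) = \sqrt{\frac{(l+m)!}{(l-m)!\,(2l)!}}\; C_l\,\frac{e^{im\varphi}}{(1-u^2)^{m/2}}\,\frac{d^{\,l-m}}{du^{\,l-m}}\left[(1-u^2)^l\right]. \]

To specify $Y_{lm}(\theta,\varphi)$ completely, we need to evaluate $C_l$. Since $C_l$ does not depend
on $m$, we set $m = 0$ in the above expression, obtaining

\[ Y_{l0}(u,\varphi) = \frac{1}{\sqrt{(2l)!}}\; C_l\,\frac{d^l}{du^l}\left[(1-u^2)^l\right]. \]

The RHS looks very much like the Legendre polynomials of Chapter 7. In fact,

\[ Y_{l0}(u,\varphi) = \frac{C_l}{\sqrt{(2l)!}}\,(-1)^l\, 2^l\, l!\; P_l(u) \equiv A_l P_l(u). \tag{12.28} \]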




Therefore, the normalization of $Y_{l0}$ and the Legendre polynomials $P_l$ determines
$C_l$.

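We now use Equation (6.9) to obtain the integral form of the orthonormality
relation for $Y_{lm}$:

\[
\begin{aligned}
\delta_{ll'}\delta_{mm'} = \langle Y_{l'm'} | Y_{lm}\rangle
&= \langle Y_{l'm'}| \left(\int_0^{2\pi} d\varphi \int_0^{\pi} \sin\theta\, d\theta\; |\theta,\varphi\rangle\langle\theta,\varphi|\right) |Y_{lm}\rangle \\
&= \int_0^{2\pi} d\varphi \int_0^{\pi} Y^*_{l'm'}(\theta,\varphi)\, Y_{lm}(\theta,\varphi)\, \sin\theta\, d\theta,
\end{aligned}
\tag{12.29}
\]

which in terms of $u = \cos\theta$ becomes

\[ \int_0^{2\pi} d\varphi \int_{-1}^{1} Y^*_{l'm'}(u,\varphi)\, Y_{lm}(u,\varphi)\, du = \delta_{ll'}\delta_{mm'}. \tag{12.30} \]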


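Problem 12.15 shows that using (12.29) one gets $A_l = \sqrt{(2l+1)/(4\pi)}$. Therefore,
Equation (12.28) yields not only the value of $C_l$, but also the useful relation

\[ Y_{l0}(u,\varphi) = \sqrt{\frac{2l+1}{4\pi}}\; P_l(u). \tag{12.31} \]

Substituting the value of $C_l$ thus obtained, we finally get

\[ Y_{lm}(u,\varphi) = (-1)^l \sqrt{\frac{2l+1}{4\pi}}\;\frac{e^{im\varphi}}{2^l\, l!}\,\sqrt{\frac{(l+m)!}{(l-m)!}}\;(1-u^2)^{-m/2}\,\frac{d^{\,l-m}}{du^{\,l-m}}\left[(1-u^2)^l\right], \tag{12.32} \]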


where $u = \cos\theta$. These functions, the eigenfunctions of $L^2$ and $L_z$, are called
spherical harmonics. They occur frequently in those physical applications for
which the Laplacian is expressed in terms of spherical coordinates.

One can immediately read off the $\theta$ part of the spherical harmonics:

\[ P_{lm}(u) = (-1)^l \sqrt{\frac{2l+1}{4\pi}}\;\frac{1}{2^l\, l!}\,\sqrt{\frac{(l+m)!}{(l-m)!}}\;(1-u^2)^{-m/2}\,\frac{d^{\,l-m}}{du^{\,l-m}}\left[(1-u^2)^l\right]. \]

However, this is not the version used in the literature. For historical reasons the
associated Legendre functions $P_l^m(u)$ are used. These are defined by

\[
\begin{aligned}
P_l^m(u) &= (-1)^m \sqrt{\frac{(l+m)!}{(l-m)!}}\,\sqrt{\frac{4\pi}{2l+1}}\; P_{lm}(u) \\
&= (-1)^{l+m}\,\frac{(l+m)!}{(l-m)!}\,\frac{(1-u^2)^{-m/2}}{2^l\, l!}\,\frac{d^{\,l-m}}{du^{\,l-m}}\left[(1-u^2)^l\right].
\end{aligned}
\tag{12.33}
\]

Thus,

\[ Y_{lm}(\theta,\varphi) = (-1)^m \left[\frac{2l+1}{4\pi}\,\frac{(l-m)!}{(l+m)!}\right]^{1/2} P_l^m(\cos\theta)\, e^{im\varphi}. \tag{12.34} \]





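We generated the spherical harmonics starting with $Y_{ll}(\theta,\varphi)$ and applying the
lowering operator $L_-$. We could have started with $Y_{l,-l}(\theta,\varphi)$ instead, and applied
the raising operator $L_+$. The latter procedure is identical to the former; nevertheless,
we outline it below because of some important relations that emerge along the way.
We first note that

\[ |Y_{l,-m}\rangle = \sqrt{\frac{(l+m)!}{(l-m)!\,(2l)!}}\; L_+^{\,l-m} |Y_{l,-l}\rangle. \tag{12.35} \]

(This can be obtained following the steps of Example 12.3.2.) Next, we use
$L_- |Y_{l,-l}\rangle = 0$ in differential form to obtain

\[ \left(\frac{d}{d\theta} - l\cot\theta\right) P_{l,-l}(\theta) = 0, \]

which has the same form as the differential equation for $P_{ll}$. Thus, the solution is
$P_{l,-l}(\theta) = C'_l(\sin\theta)^l$, and

\[ Y_{l,-l}(\theta,\varphi) = P_{l,-l}(\theta)\,e^{-il\varphi} = C'_l(\sin\theta)^l e^{-il\varphi}. \]

Applying $L_+$ repeatedly yields

\[ L_+^k Y_{l,-l}(u,\varphi) = C'_l\,\frac{(-1)^k e^{-i(l-k)\varphi}}{(1-u^2)^{(l-k)/2}}\,\frac{d^k}{du^k}\left[(1-u^2)^l\right], \]

where $u = \cos\theta$. Substituting $k = l - m$ and using Equation (12.35) gives

\[ Y_{l,-m}(u,\varphi) = \sqrt{\frac{(l+m)!}{(l-m)!\,(2l)!}}\; C'_l\,\frac{(-1)^{l-m} e^{-im\varphi}}{(1-u^2)^{m/2}}\,\frac{d^{\,l-m}}{du^{\,l-m}}\left[(1-u^2)^l\right]. \]

The constant $C'_l$ can be determined as before. In fact, for $m = 0$ we get exactly
the same result as before, so we expect $C'_l$ to be identical to $C_l$ (up to a sign). Thus,

\[ Y_{l,-m}(u,\varphi) = (-1)^{l+m} \sqrt{\frac{2l+1}{4\pi}}\;\frac{e^{-im\varphi}}{2^l\, l!}\,\sqrt{\frac{(l+m)!}{(l-m)!}}\;(1-u^2)^{-m/2}\,\frac{d^{\,l-m}}{du^{\,l-m}}\left[(1-u^2)^l\right]. \]

Comparison with Equation (12.32) yields

\[ Y_{l,-m}(\theta,\varphi) = (-1)^m\, Y^*_{lm}(\theta,\varphi), \tag{12.36} \]

and using the definition $Y_{l,-m}(\theta,\varphi) = P_{l,-m}(\theta)e^{-im\varphi}$ and the first part of Equation
(12.33), we obtain

\[ P_l^{-m}(\theta) = (-1)^m\,\frac{(l-m)!}{(l+m)!}\; P_l^m(\theta). \tag{12.37} \]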





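The first few spherical harmonics with positive $m$ are given below. Those with
negative $m$ can be obtained using Equation (12.36).

For $l = 0$:  $Y_{00} = \dfrac{1}{\sqrt{4\pi}}$.

For $l = 1$:  $Y_{10} = \sqrt{\dfrac{3}{4\pi}}\,\cos\theta$,  $Y_{11} = -\sqrt{\dfrac{3}{8\pi}}\; e^{i\varphi}\sin\theta$.

For $l = 2$:  $Y_{20} = \sqrt{\dfrac{5}{16\pi}}\,(3\cos^2\theta - 1)$,  $Y_{21} = -\sqrt{\dfrac{15}{8\pi}}\; e^{i\varphi}\sin\theta\cos\theta$,
  $Y_{22} = \sqrt{\dfrac{15}{32\pi}}\; e^{2i\varphi}\sin^2\theta$.

For $l = 3$:  $Y_{30} = \sqrt{\dfrac{7}{16\pi}}\,(5\cos^3\theta - 3\cos\theta)$,  $Y_{31} = -\sqrt{\dfrac{21}{64\pi}}\; e^{i\varphi}\sin\theta\,(5\cos^2\theta - 1)$,
  $Y_{32} = \sqrt{\dfrac{105}{32\pi}}\; e^{2i\varphi}\sin^2\theta\cos\theta$,  $Y_{33} = -\sqrt{\dfrac{35}{64\pi}}\; e^{3i\varphi}\sin^3\theta$.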
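The entries of this list follow directly from Equation (12.32), and the arithmetic is easy to delegate. The sketch below is an illustration rather than part of the original text: it represents $(1-u^2)^l$ as a list of polynomial coefficients, differentiates it $l - m$ times, and assembles $Y_{lm}$ for $0 \le m \le l$. All function names are assumptions made for this example, and $\theta$ is assumed to lie strictly between $0$ and $\pi$ when $m > 0$.

```python
import cmath
import math


def poly_one_minus_u2_pow(l):
    """Coefficients (lowest degree first) of the polynomial (1 - u^2)^l."""
    coeffs = [0] * (2 * l + 1)
    for j in range(l + 1):
        coeffs[2 * j] = math.comb(l, j) * (-1) ** j
    return coeffs


def poly_derivative(coeffs, times):
    """Differentiate a coefficient list (lowest degree first) `times` times."""
    for _ in range(times):
        coeffs = [k * c for k, c in enumerate(coeffs)][1:] or [0]
    return coeffs


def poly_eval(coeffs, u):
    """Evaluate the polynomial with the given coefficients at u."""
    return sum(c * u ** k for k, c in enumerate(coeffs))


def sph_harm_12_32(l, m, theta, phi):
    """Y_lm(theta, phi) for 0 <= m <= l, evaluated from Equation (12.32)."""
    u = math.cos(theta)
    deriv = poly_derivative(poly_one_minus_u2_pow(l), l - m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi))
    ratio = math.sqrt(math.factorial(l + m) / math.factorial(l - m))
    theta_part = ((-1) ** l * norm * ratio / (2 ** l * math.factorial(l))
                  * (1 - u * u) ** (-m / 2) * poly_eval(deriv, u))
    return theta_part * cmath.exp(1j * m * phi)


# Compare with two of the tabulated harmonics at an arbitrary point.
theta, phi = 0.7, 1.1
y11_table = -math.sqrt(3 / (8 * math.pi)) * cmath.exp(1j * phi) * math.sin(theta)
y20_table = math.sqrt(5 / (16 * math.pi)) * (3 * math.cos(theta) ** 2 - 1)
assert abs(sph_harm_12_32(1, 1, theta, phi) - y11_table) < 1e-12
assert abs(sph_harm_12_32(2, 0, theta, phi) - y20_table) < 1e-12
```

Harmonics with negative $m$ are deliberately left out of the function; they follow from Equation (12.36).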


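From Equations (12.13), (12.18), and (12.34) and the fact that $\alpha = l(l+1)$
for some nonnegative integer $l$, we obtain

\[ \frac{1}{\sin\theta}\,\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial}{\partial\theta}\right)\left[P_l^m e^{im\varphi}\right] + \frac{1}{\sin^2\theta}\,\frac{\partial^2}{\partial\varphi^2}\left[P_l^m e^{im\varphi}\right] + l(l+1)\,P_l^m e^{im\varphi} = 0, \]

which gives

\[ \frac{1}{\sin\theta}\,\frac{d}{d\theta}\left(\sin\theta\,\frac{dP_l^m}{d\theta}\right) - \frac{m^2}{\sin^2\theta}\,P_l^m + l(l+1)\,P_l^m = 0. \]

As before, we let $u = \cos\theta$ to obtain

\[ \frac{d}{du}\left[(1-u^2)\,\frac{dP_l^m}{du}\right] + \left[l(l+1) - \frac{m^2}{1-u^2}\right]P_l^m = 0. \tag{12.38} \]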


This is called the associated Legendre differential equation. Its solutions, the
associated Legendre functions, are given in closed form in Equation (12.33). For
$m = 0$, Equation (12.38) reduces to the Legendre differential equation whose
solutions, again given by Equation (12.33) with $m = 0$, are the Legendre polynomials
encountered in Chapter 7. When $m = 0$, the spherical harmonics become
$\varphi$-independent. This corresponds to a physical situation in which there is an explicit
azimuthal symmetry. In such cases (when it is obvious that the physical property in
question does not depend on $\varphi$) a Legendre polynomial, depending only on $\cos\theta$,
will multiply the radial function.
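As a quick plausibility check of Equation (12.38), one can take a single associated Legendre function and evaluate the left-hand side numerically. For $l = 2$, $m = 1$, Equation (12.33) gives $P_2^1(u) = 3u\sqrt{1-u^2}$ (worked out by hand for this sketch). The code below, which is an illustration and not part of the original text, approximates the derivatives by central differences; the step size and the function names are assumptions.

```python
import math


def assoc_legendre_2_1(u):
    """P_2^1(u) = 3 u sqrt(1 - u^2), obtained by hand from Equation (12.33)."""
    return 3.0 * u * math.sqrt(1.0 - u * u)


def residual_12_38(P, l, m, u, h=1e-4):
    """Left-hand side of Equation (12.38), with derivatives replaced by central differences."""
    def flux(x):
        # (1 - x^2) dP/dx, the quantity differentiated in (12.38)
        return (1.0 - x * x) * (P(x + h) - P(x - h)) / (2.0 * h)

    d_flux = (flux(u + h) - flux(u - h)) / (2.0 * h)
    return d_flux + (l * (l + 1) - m * m / (1.0 - u * u)) * P(u)


# The residual should vanish up to discretization error away from u = +1 and u = -1.
for u in (-0.6, 0.0, 0.3, 0.8):
    assert abs(residual_12_38(assoc_legendre_2_1, 2, 1, u)) < 1e-5
```

Feeding the same function with the wrong value of $l$ or $m$ leaves a residual of order one, which is a simple way to see that the equation really singles out the pair $(l, m)$.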





12.4.1 Expansion of Angular Functions


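The orthonormality of spherical harmonics can be utilized to expand functions
of $\theta$ and $\varphi$ in terms of them. The fact that these functions are complete will be
discussed in a general way in the context of Sturm-Liouville theory. Assuming
completeness for now, we write

\[
f(\theta,\varphi) =
\begin{cases}
\displaystyle\sum_{l=0}^{\infty}\sum_{m=-l}^{l} a_{lm}\,Y_{lm}(\theta,\varphi) & \text{if } l \text{ is not fixed}, \\[2ex]
\displaystyle\sum_{m=-l}^{l} a_{lm}\,Y_{lm}(\theta,\varphi) & \text{if } l \text{ is fixed},
\end{cases}
\tag{12.39}
\]

where we have included the case where it is known a priori that $f(\theta,\varphi)$ has a given
fixed $l$ value. To find $a_{lm}$, we multiply both sides by $Y^*_{lm}(\theta,\varphi)$ and integrate over
the solid angle. The result, obtained by using the orthonormality relation, is

\[ a_{lm} = \iint d\Omega\; f(\theta,\varphi)\,Y^*_{lm}(\theta,\varphi), \tag{12.40} \]

where $d\Omega \equiv \sin\theta\, d\theta\, d\varphi$ is the element of solid angle. A useful special case of
this formula is

\[ a_{l0}^{(f)} = \iint d\Omega\; f(\theta,\varphi)\,Y^*_{l0}(\theta,\varphi) = \sqrt{\frac{2l+1}{4\pi}} \iint d\Omega\; f(\theta,\varphi)\,P_l(\cos\theta), \tag{12.41} \]

where we have introduced an extra superscript to emphasize the relation of the
expansion coefficients with the function being expanded. Another useful relation
is obtained when we let $\theta = 0$ in Equation (12.39):

\[
f(\theta,\varphi)\big|_{\theta=0} =
\begin{cases}
\displaystyle\sum_{l=0}^{\infty}\sum_{m=-l}^{l} a_{lm}\,Y_{lm}(\theta,\varphi)\big|_{\theta=0} & \text{if } l \text{ is not fixed}, \\[2ex]
\displaystyle\sum_{m=-l}^{l} a_{lm}\,Y_{lm}(\theta,\varphi)\big|_{\theta=0} & \text{if } l \text{ is fixed}.
\end{cases}
\]

From Equations (12.33) and (12.34) one can show that

\[ Y_{lm}(\theta,\varphi)\big|_{\theta=0} = \delta_{m0}\,Y_{l0}(0,\varphi) = \delta_{m0}\sqrt{\frac{2l+1}{4\pi}}. \]

Therefore,

\[
f(\theta,\varphi)\big|_{\theta=0} =
\begin{cases}
\displaystyle\sum_{l=0}^{\infty} a_{l0}^{(f)}\sqrt{\frac{2l+1}{4\pi}} & \text{if } l \text{ is not fixed}, \\[2ex]
\displaystyle a_{l0}^{(f)}\sqrt{\frac{2l+1}{4\pi}} & \text{if } l \text{ is fixed}.
\end{cases}
\tag{12.42}
\]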
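To see Equations (12.41) and (12.42) at work, consider the test function $f(\theta,\varphi) = \cos^2\theta$, chosen here purely for illustration. Since $\cos^2\theta = \tfrac13 P_0 + \tfrac23 P_2(\cos\theta)$, the only nonzero coefficients should be $a_{00} = \sqrt{4\pi}/3$ and $a_{20} = \tfrac23\sqrt{4\pi/5}$. The sketch below estimates the integrals with a midpoint rule; the grid sizes and function names are assumptions, and the Legendre polynomials are generated from the standard three-term recurrence rather than from anything specific to this chapter.

```python
import math


def legendre(l, u):
    """P_l(u) from the recurrence (k+1) P_{k+1} = (2k+1) u P_k - k P_{k-1}."""
    p_prev, p = 1.0, u
    if l == 0:
        return p_prev
    for k in range(1, l):
        p_prev, p = p, ((2 * k + 1) * u * p - k * p_prev) / (k + 1)
    return p


def coeff_l0(f, l, n_theta=2000, n_phi=16):
    """Midpoint-rule estimate of the coefficient a_l0 in Equation (12.41)."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        weight = math.sin(theta) * legendre(l, math.cos(theta))
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += f(theta, phi) * weight
    return math.sqrt((2 * l + 1) / (4.0 * math.pi)) * total * d_theta * d_phi


def f(theta, phi):
    """Illustrative test function; it happens not to depend on phi."""
    return math.cos(theta) ** 2


a = [coeff_l0(f, l) for l in range(5)]

# Equation (12.42): the value at theta = 0 is recovered from the a_l0 alone.
value_at_pole = sum(a[l] * math.sqrt((2 * l + 1) / (4.0 * math.pi)) for l in range(5))
```

Within the accuracy of the quadrature, `value_at_pole` reproduces $f(0,\varphi) = 1$, with the $l = 0$ and $l = 2$ terms contributing $\tfrac13$ and $\tfrac23$.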







Figure 12.1 The unit vectors $\hat{\mathbf{e}}_r$ and $\hat{\mathbf{e}}_{r'}$ with their spherical angles and the angle $\gamma$
between them.

12.4.2 Addition Theorem for Spherical Harmonics 

An important consequence of the expansion in terms of $Y_{lm}$ is called the addition
theorem for spherical harmonics. Consider two unit vectors $\hat{\mathbf{e}}_r$ and $\hat{\mathbf{e}}_{r'}$ making
spherical angles $(\theta,\varphi)$ and $(\theta',\varphi')$, respectively, as shown in Figure 12.1. Let $\gamma$
be the angle between the two vectors. The addition theorem states that

\[ P_l(\cos\gamma) = \frac{4\pi}{2l+1}\sum_{m=-l}^{l} Y^*_{lm}(\theta',\varphi')\,Y_{lm}(\theta,\varphi). \tag{12.43} \]

We shall not give a proof of this theorem here and refer the reader to an
elegant proof on page 866 which uses the representation theory of groups. The
addition theorem is particularly useful in the expansion of the frequently occurring
expression $1/|\mathbf{r} - \mathbf{r}'|$. For definiteness we assume $|\mathbf{r}'| \equiv r' < |\mathbf{r}| \equiv r$. Then,
introducing $t \equiv r'/r$, we have

\[ \frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{(r^2 + r'^2 - 2rr'\cos\gamma)^{1/2}} = \frac{1}{r}\,(1 + t^2 - 2t\cos\gamma)^{-1/2}. \]

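Recalling the generating function for Legendre polynomials from Chapter 7 and
using the addition theorem, we get

\[
\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{r}\sum_{l=0}^{\infty} t^l P_l(\cos\gamma)
= \sum_{l=0}^{\infty}\frac{r'^{\,l}}{r^{\,l+1}}\,\frac{4\pi}{2l+1}\sum_{m=-l}^{l} Y^*_{lm}(\theta',\varphi')\,Y_{lm}(\theta,\varphi)
= 4\pi\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\frac{1}{2l+1}\,\frac{r'^{\,l}}{r^{\,l+1}}\,Y^*_{lm}(\theta',\varphi')\,Y_{lm}(\theta,\varphi).
\]

It is clear that if $r < r'$, we should expand in terms of the ratio $r/r'$. It is therefore
customary to use $r_<$ to denote the smaller and $r_>$ to denote the larger of the two
radii $r$ and $r'$. Then the above equation is written as

\[ \frac{1}{|\mathbf{r} - \mathbf{r}'|} = 4\pi\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\frac{1}{2l+1}\,\frac{r_<^{\,l}}{r_>^{\,l+1}}\,Y^*_{lm}(\theta',\varphi')\,Y_{lm}(\theta,\varphi). \]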

This equation is used frequently in the study of Coulomb-like potentials. 
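The convergence of this expansion is easy to probe numerically. The sketch below is an illustration only: it works with the intermediate form $\sum_l (r_<^l / r_>^{l+1}) P_l(\cos\gamma)$, which by the addition theorem (12.43) is equivalent to the double sum over $Y_{lm}$, and compares a truncated sum with the exact inverse distance. The truncation order `l_max`, the function names, and the use of the standard Legendre recurrence are assumptions of this example.

```python
import math


def legendre(l, u):
    """P_l(u) from the recurrence (k+1) P_{k+1} = (2k+1) u P_k - k P_{k-1}."""
    p_prev, p = 1.0, u
    if l == 0:
        return p_prev
    for k in range(1, l):
        p_prev, p = p, ((2 * k + 1) * u * p - k * p_prev) / (k + 1)
    return p


def cos_gamma(theta, phi, theta_p, phi_p):
    """Cosine of the angle between the directions (theta, phi) and (theta_p, phi_p)."""
    return (math.cos(theta) * math.cos(theta_p)
            + math.sin(theta) * math.sin(theta_p) * math.cos(phi - phi_p))


def inverse_distance_series(r, r_prime, cg, l_max=60):
    """Truncated sum of r_<^l / r_>^(l+1) * P_l(cos gamma) for l = 0, ..., l_max."""
    r_small, r_large = min(r, r_prime), max(r, r_prime)
    return sum(r_small ** l / r_large ** (l + 1) * legendre(l, cg)
               for l in range(l_max + 1))


def inverse_distance_exact(r, r_prime, cg):
    """1/|r - r'| computed directly from the law of cosines."""
    return 1.0 / math.sqrt(r * r + r_prime * r_prime - 2.0 * r * r_prime * cg)


cg = cos_gamma(0.4, 1.0, 1.2, 2.5)
assert abs(inverse_distance_series(2.0, 1.0, cg) - inverse_distance_exact(2.0, 1.0, cg)) < 1e-12
```

Because $|P_l| \le 1$ on $[-1, 1]$, the truncation error is bounded by a geometric tail in $r_</r_>$, so the series converges quickly when the two radii differ appreciably and slowly when they are nearly equal.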

12.5 Problems 

12.1. By applying the operator $[x_j, p_k]$ to an arbitrary function $f(\mathbf{r})$, show that
$[x_j, p_k] = i\delta_{jk}$.

12.2. Use the defining relation $L_i = \epsilon_{ijk}x_j p_k$ to show that $x_j p_k - x_k p_j = \epsilon_{ijk}L_i$.

In both of these expressions a sum over the repeated indices is understood.

12.3. For the angular momentum operator $L_i = \epsilon_{ijk}x_j p_k$, show that the commutation
relation $[L_j, L_k] = i\epsilon_{jkl}L_l$ holds.

12.4. Evaluate $\partial f/\partial y$ and $\partial f/\partial z$ in spherical coordinates and find $L_y$ and $L_z$ in
terms of spherical coordinates.

12.5. Obtain an expression for $L^2$ in terms of $\theta$ and $\varphi$, and substitute the result in
Equation (12.12) to obtain the Laplacian in spherical coordinates.

12.6. Show that $L^2 = L_+L_- + L_z^2 - L_z$ and $L^2 = L_-L_+ + L_z^2 + L_z$.

12.7. Show that $L^2$, $L_x$, $L_y$, and $L_z$ are hermitian operators in the space of
square-integrable functions.

12.8. Verify the following commutation relations:

\[ [L^2, L_\pm] = 0, \qquad [L_z, L_\pm] = \pm L_\pm, \qquad [L_+, L_-] = 2L_z. \]

12.9. Show that $L_-|Y_{\alpha\beta}\rangle$ has $\beta - 1$ as its eigenvalue for $L_z$, and that $|Y_{\alpha,\beta_\pm}\rangle$ cannot
be zero.

12.10. Show that if the $|Y_{jm}\rangle$ are normalized to unity, then with proper choice of
phase, $L_-|Y_{jm}\rangle = \sqrt{j(j+1) - m(m-1)}\;|Y_{j,m-1}\rangle$.

12.11. Derive Equation (12.35).




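12.12. Starting with $L_x$ and $L_y$, derive the following expression for $L_\pm$:

\[ L_\pm = e^{\pm i\varphi}\left(\pm\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\varphi}\right). \]

12.13. Integrate $dP/d\theta - l\cot\theta\,P = 0$ to find $P(\theta)$.

12.14. Verify the following differential identity:

\[ \left(\frac{d}{d\theta} + n\cot\theta\right)f(\theta) = \frac{1}{\sin^n\theta}\,\frac{d}{d\theta}\left[\sin^n\theta\,f(\theta)\right]. \]

12.15. Let $l = l'$ and $m = m' = 0$ in Equation (12.30), and substitute for $Y_{l0}$ from
Equation (12.28) to obtain $A_l = \sqrt{(2l+1)/4\pi}$.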

12.16. Show that 


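12.17. Derive the relations $Y_{l,-m}(\theta,\varphi) = (-1)^m Y^*_{lm}(\theta,\varphi)$ and

\[ P_l^{-m}(\theta) = (-1)^m\,\frac{(l-m)!}{(l+m)!}\,P_l^m(\theta). \]

12.18. Show that $\sum_{m=-l}^{l}|Y_{lm}(\theta,\varphi)|^2 = (2l+1)/(4\pi)$. Verify this explicitly for
$l = 1$ and $l = 2$.

12.19. Show that the addition theorem for spherical harmonics can be written as

\[ P_l(\cos\gamma) = P_l(\cos\theta)\,P_l(\cos\theta') + 2\sum_{m=1}^{l}\frac{(l-m)!}{(l+m)!}\,P_l^m(\cos\theta)\,P_l^m(\cos\theta')\cos[m(\varphi - \varphi')]. \]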


Additional Reading 

1. Morse, P. and Feshbach, H. Methods of Theoretical Physics, McGraw-Hill,
1953. A two-volume classic including a long discussion of the separation of
variables in many (sometimes exotic) coordinate systems.

2. The angular momentum eigenvalues and eigenfunctions are discussed in
most books on quantum mechanics. See, e.g., Messiah, A. Quantum
Mechanics, volume II, Wiley, 1966.


13 Second-Order Linear Differential Equations


The discussion of Chapter 12 has clearly singled out ODEs, especially those of 
second order, as objects requiring special attention because most common PDEs 
of mathematical physics can be separated into ODEs (of second order). This is 
really an oversimplification of the situation. Many PDEs of physics, both at the 
fundamental theoretical level (as in the general theory of relativity) and from 
a practical standpoint (weather forecast) are nonlinear, and the method of the 
separation of variables does not work. Since no general analytic solutions for 
such nonlinear systems have been found, we shall confine ourselves to the linear 
systems, especially those that admit a separated solution. 

With the exception of the infinite power series, no systematic method of solving 
DEs existed during the first half of the nineteenth century. The majority of solutions 
were completely ad hoc and obtained by trial and error, causing frustration and 
anxiety among mathematicians. It was to overcome this frustration that Sophus 
Lie, motivated by the newly developed concept of group, took up the systematic 
study of DEs in the second half of the nineteenth century. This study not only 
gave a handle on the disarrayed area of DEs, but also gave birth to one of the most 
beautiful and fundamental branches of mathematical physics, Lie group theory. 
We shall come back to a thorough treatment of this theory in Parts VII and VIII.

Our main task in this chapter is to study the second-order linear differential
equations (SOLDEs). However, to understand SOLDEs, we need some basic
understanding of differential equations in general. The next section outlines some
essential properties of general DEs. Section 2 is a very brief introduction to
first-order DEs, and the remainder of the chapter deals with SOLDEs.





13.1 General Properties of ODEs 

The most general ODE can be expressed as

\[ F\left(x, y, \frac{dy}{dx}, \frac{d^2y}{dx^2}, \ldots, \frac{d^ny}{dx^n}\right) = 0, \tag{13.1} \]

in which $F : \mathbb{R}^{n+2} \to \mathbb{R}$ is a real-valued function of $n + 2$ real variables. When
$F$ depends explicitly and nontrivially on $d^ny/dx^n$, Equation (13.1) is called an
$n$th-order ODE. An ODE is said to be linear if the part of the function $F$ that
includes $y$ and all its derivatives is linear in $y$. The most general $n$th-order linear
ODE is

\[ p_0(x)\,y + p_1(x)\,\frac{dy}{dx} + \cdots + p_n(x)\,\frac{d^ny}{dx^n} = q(x) \quad\text{for } p_n(x) \neq 0, \tag{13.2} \]

where $\{p_i\}_{i=0}^{n}$ and $q$ are functions of the independent variable $x$. Equation (13.2)
is said to be homogeneous if $q = 0$; otherwise, it is said to be inhomogeneous
and $q(x)$ is called the inhomogeneous term. It is customary, and convenient, to
define a linear differential operator $L$ by¹

\[ L \equiv p_0(x) + p_1(x)\,\frac{d}{dx} + \cdots + p_n(x)\,\frac{d^n}{dx^n}, \qquad p_n(x) \neq 0, \tag{13.3} \]

and write Equation (13.2) as

\[ L[y] = q(x). \tag{13.4} \]


A solution of Equation (13.1) or (13.4) is a single-variable function $f : \mathbb{R} \to \mathbb{R}$
such that $F(x, f(x), f'(x), \ldots, f^{(n)}(x)) = 0$, or $L[f] = q(x)$, for all $x$ in the
domain of definition of $f$. The solution of a differential equation may not exist if
we put too many restrictions on it. For instance, if we demand that $f : \mathbb{R} \to \mathbb{R}$
be differentiable too many times, we may not be able to find a solution, as the
following example shows.

13.1.1. Example. The most general solution of $dy/dx = |x|$ that vanishes at $x = 0$ is

\[
f(x) =
\begin{cases}
\tfrac12 x^2 & \text{if } x \ge 0, \\
-\tfrac12 x^2 & \text{if } x \le 0.
\end{cases}
\]

This function is continuous and has first derivative $f'(x) = |x|$, which is also continuous
at $x = 0$. However, if we demand that its second derivative also be continuous at $x = 0$,
we cannot find a solution, because

\[
f''(x) =
\begin{cases}
+1 & \text{if } x > 0, \\
-1 & \text{if } x < 0.
\end{cases}
\]


¹ Do not confuse this linear differential operator with the angular momentum (vector) operator $\mathbf{L}$.





If we want $f'''(x)$ to exist at $x = 0$, then we have to expand the notion of a function to
include distributions, or generalized functions. ■

Overrestricting a solution for a differential equation results in its absence, but
underrestricting it allows multiple solutions. To strike a balance between these two
extremes, we agree to make a solution as many times differentiable as plausible and
to satisfy certain initial conditions. For an $n$th-order DE such initial conditions
are commonly equivalent (but not restricted) to a specification of the function and
of its first $n - 1$ derivatives. This sort of specification is made feasible by the
following theorem.

13.1.2. Theorem. (implicit function theorem) Let $G : \mathbb{R}^{n+1} \to \mathbb{R}$, given by
$G(x_1, x_2, \ldots, x_{n+1}) \in \mathbb{R}$, have continuous partial derivatives up to the $k$th
order in some neighborhood of a point $P_0 = (r_1, r_2, \ldots, r_{n+1})$ in $\mathbb{R}^{n+1}$. Let
$(\partial G/\partial x_{n+1})|_{P_0} \neq 0$. Then there exists a unique function $F : \mathbb{R}^n \to \mathbb{R}$ that is
continuously differentiable $k$ times at (some smaller) neighborhood of $P_0$ such
that $x_{n+1} = F(x_1, x_2, \ldots, x_n)$ for all points $P = (x_1, x_2, \ldots, x_{n+1})$ in a
neighborhood of $P_0$ and

\[ G(x_1, x_2, \ldots, x_n, F(x_1, x_2, \ldots, x_n)) = 0. \]

Theorem 13.1.2 simply asserts that under certain (mild) conditions we can
“solve” for one of the independent variables in $G(x_1, x_2, \ldots, x_{n+1}) = 0$ in terms
of the others. A proof of this theorem is usually given in advanced calculus books.

Application of this theorem to Equation (13.1) leads to

\[ \frac{d^ny}{dx^n} = F\left(x, y, \frac{dy}{dx}, \frac{d^2y}{dx^2}, \ldots, \frac{d^{n-1}y}{dx^{n-1}}\right), \]

provided that $G$ satisfies the conditions of the theorem. If we know the solution
$y = f(x)$ and its derivatives up to order $n - 1$, we can evaluate its $n$th derivative
using this equation. In addition, we can calculate the derivatives of all orders
(assuming they exist) by differentiating this equation. This allows us to expand
the solution in a Taylor series. Thus, for solutions that have derivatives of all
orders, knowledge of the value of a solution and its first $n - 1$ derivatives at a
point $x_0$ determines that solution at a neighboring point $x$.

We shall not study the general ODE of Equation (13.1) or even its simpler 
linear version (13.2). We will only briefly study ODEs of the first order in the next 
section, and then concentrate on linear ODEs of the second order for the rest of 
this chapter. 

13.2 Existence and Uniqueness for First-Order DEs 

A general first-order DE (FODE) is of the form $G(x, y, y') = 0$. We can find $y'$
(the derivative of $y$) in terms of a function of $x$ and $y$ if the function $G(x_1, x_2, x_3)$
is differentiable with respect to its third argument and $\partial G/\partial x_3 \neq 0$. In that case
we have

\[ y' \equiv \frac{dy}{dx} = F(x, y), \tag{13.5} \]

which is said to be a normal FODE. If $F(x, y)$ is a linear function of $y$, then
Equation (13.5) becomes a first-order linear DE (FOLDE), which can generally
be written as

\[ p_1(x)\,\frac{dy}{dx} + p_0(x)\,y = q(x). \tag{13.6} \]

It can be shown that the general FOLDE has an explicit solution (see [Hass 99]):

13.2.1. Theorem. Any first order linear DE of the form $p_1(x)y' + p_0(x)y = q(x)$,
in which $p_0$, $p_1$, and $q$ are continuous functions in some interval $(a, b)$, has a
general solution

\[ y = f(x) = \frac{1}{\mu(x)\,p_1(x)}\left[C + \int_{x_1}^{x}\mu(t)\,q(t)\,dt\right], \tag{13.7} \]

where $C$ is an arbitrary constant and

\[ \mu(x) = \frac{1}{p_1(x)}\exp\left[\int_{x_0}^{x}\frac{p_0(t)}{p_1(t)}\,dt\right], \tag{13.8} \]

where $x_0$ and $x_1$ are arbitrary points in the interval $(a, b)$.
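Equations (13.7) and (13.8) translate almost literally into a numerical recipe. The following sketch is an illustration and not part of the original text: it evaluates both integrals with a composite Simpson rule and returns the solution as a function. The quadrature routine, the number of subintervals, and the test equation $y' + y = x$ (whose solution with $x_0 = x_1 = 0$ is $y = x - 1 + (C + 1)e^{-x}$) are all assumptions chosen for the example.

```python
import math


def simpson(g, a, b, n=200):
    """Composite Simpson rule with n (even) subintervals; also handles b < a."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


def folde_solution(p1, p0, q, C, x0, x1):
    """Return y(x) assembled from Equations (13.7) and (13.8) by numerical quadrature."""
    def mu(x):
        # Equation (13.8)
        return math.exp(simpson(lambda t: p0(t) / p1(t), x0, x)) / p1(x)

    def y(x):
        # Equation (13.7)
        return (C + simpson(lambda t: mu(t) * q(t), x1, x)) / (mu(x) * p1(x))

    return y


# Test equation y' + y = x with x0 = x1 = 0 and C = 2.
y = folde_solution(lambda x: 1.0, lambda x: 1.0, lambda x: x, C=2.0, x0=0.0, x1=0.0)
exact = 1.5 - 1.0 + 3.0 * math.exp(-1.5)
assert abs(y(1.5) - exact) < 1e-7
```

With $x_0 = x_1$, the constant $C$ is proportional to the value of the solution at that point (here $y(0) = C$ because $p_1 = 1$), which is how an initial condition enters the formula.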


No such explicit solution exists for nonlinear first-order DEs. Nevertheless, it 
is reassuring to know that a solution of such a DE always exists and under some 
mild conditions, this solution is unique. We summarize some of the ideas involved 
in the proof of the existence and uniqueness of the solutions to FODEs. (For proofs, 
see the excellent book by Birkhoff and Rota [Birk 78].) We first state an existence 
theorem due to Peano: 


13.2.2. Theorem. (Peano existence theorem) If the function $F(x, y)$ is continuous
for the points on and within the rectangle defined by $|y - c| \le K$ and $|x - a| \le N$,
and if $|F(x, y)| \le M$ there, then the differential equation $y' = F(x, y)$ has at
least one solution, $y = f(x)$, defined for $|x - a| \le \min(N, K/M)$ and satisfying
the initial condition $f(a) = c$.

This theorem guarantees only the existence of solutions. To ensure uniqueness, 
the function F needs to have some additional properties. An important property is 
stated in the following definition. 

13.2.3. Definition. A function $F(x, y)$ satisfies a Lipschitz condition in a domain
$D \subset \mathbb{R}^2$ if for some finite constant $L$ (Lipschitz constant), it satisfies the inequality

\[ |F(x, y_1) - F(x, y_2)| \le L\,|y_1 - y_2| \]

for all points $(x, y_1)$ and $(x, y_2)$ in $D$.





13.2.4. Theorem. (uniqueness theorem) Let $f(x)$ and $g(x)$ be any two solutions
of the FODE $y' = F(x, y)$ in a domain $D$, where $F$ satisfies a Lipschitz condition
with Lipschitz constant $L$. Then

\[ |f(x) - g(x)| \le e^{L|x-a|}\,|f(a) - g(a)|. \]

In particular, the FODE has at most one solution curve passing through the point
$(a, c) \in D$.

The final conclusion of this theorem is an easy consequence of the assumed
differentiability of $F$ and the requirement $f(a) = g(a) = c$. The theorem says
that if there is a solution $y = f(x)$ to the DE $y' = F(x, y)$ satisfying $f(a) = c$,
then it is the solution.

The requirements of the Peano existence theorem are too broad to yield so¬ 
lutions that have some nice properties. For instance, the interval of definition of 
the solutions may depend on their initial values. The following example illustrates 
this point. 

13.2.5. Example. Consider the DE $dy/dx = e^y$. The general solution of this DE can be
obtained by direct integration:

\[ e^{-y}\,dy = dx \quad\Rightarrow\quad -e^{-y} = x + C. \]

If $y = b$ when $x = 0$, then $C = -e^{-b}$, and

\[ e^{-y} = -x + e^{-b} \quad\Rightarrow\quad y = -\ln(e^{-b} - x). \]

Thus, the solution is defined for $-\infty < x < e^{-b}$, i.e., the interval of definition of a solution
changes with its initial value. ■
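The dependence of the interval of definition on the initial value can be made concrete with a few lines of code. The sketch below is an illustration, not part of the original text: it encodes the solution $y = -\ln(e^{-b} - x)$ found in the example, refuses arguments outside the interval $x < e^{-b}$, and checks the differential equation by a central difference. The function names and the step size are assumptions.

```python
import math


def solution(x, b):
    """y(x) = -ln(e^{-b} - x), the solution of dy/dx = e^y with y(0) = b."""
    arg = math.exp(-b) - x
    if arg <= 0.0:
        raise ValueError("x lies outside the interval of definition x < e^{-b}")
    return -math.log(arg)


def right_endpoint(b):
    """Right end e^{-b} of the interval on which the solution with y(0) = b exists."""
    return math.exp(-b)


# Check the ODE dy/dx = e^y at a sample point by a central difference.
x, b, h = 0.1, 0.5, 1e-5
slope = (solution(x + h, b) - solution(x - h, b)) / (2.0 * h)
assert abs(slope - math.exp(solution(x, b))) < 1e-6

# A larger initial value b shortens the interval of definition.
assert right_endpoint(2.0) < right_endpoint(0.0)
```

For $b = 0$ the solution exists only for $x < 1$, while for $b = 2$ it already ceases to exist at $x = e^{-2} \approx 0.135$.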

To avoid situations illustrated in the example above, one demands not just the
continuity of F — as does the Peano existence theorem — but a Lipschitz condition 
for it. Then one ensures not only the existence, but also the uniqueness: 

13.2.6. Theorem. (local existence and uniqueness theorem) Suppose that the function
$F(x, y)$ is defined and continuous in the rectangle $|y - c| \le K$, $|x - a| \le N$
and satisfies a Lipschitz condition there. Let $M = \max|F(x, y)|$ in this rectangle.
Then the differential equation $y' = F(x, y)$ has a unique solution $y = f(x)$
satisfying $f(a) = c$ and defined on the interval $|x - a| \le \min(N, K/M)$.

13.3 General Properties of SOLDEs 

The most general SOLDE is

\[ p_2(x)\,\frac{d^2y}{dx^2} + p_1(x)\,\frac{dy}{dx} + p_0(x)\,y = p_3(x). \tag{13.9} \]

Dividing by $p_2(x)$ and writing $p$ for $p_1/p_2$, $q$ for $p_0/p_2$, and $r$ for $p_3/p_2$ reduces
this to the normal form

\[ \frac{d^2y}{dx^2} + p(x)\,\frac{dy}{dx} + q(x)\,y = r(x). \tag{13.10} \]


Equation (13.10) is equivalent to (13.9) if $p_2(x) \neq 0$. The points at which $p_2(x)$
vanishes are called the singular points of the differential equation.

There is a crucial difference between the singular points of linear differential
equations and those of nonlinear differential equations. For a nonlinear differential
equation such as $(x^2 - y)y' = x^2 + y^2$, the curve $y = x^2$ is the collection of singular
points. This makes it impossible to construct solutions $y = f(x)$ that are defined
on an interval $I = [a, b]$ of the $x$-axis because for any $x \in I$, there is a $y$ for
which the differential equation is undefined. Linear differential equations do not
have this problem, because the coefficients of the derivatives are functions of $x$
only. Therefore, all the singular “curves” are vertical. Thus, we have the following
definition.


13.3.1. Definition. The normal form of a SOLDE, Equation (13.10), is regular on
an interval $[a, b]$ of the $x$-axis if $p(x)$, $q(x)$, and $r(x)$ are continuous on $[a, b]$.
A solution of a normal SOLDE is a twice-differentiable function $y = f(x)$ that
satisfies the SOLDE at every point of $[a, b]$.


It is clear that any function that satisfies Equation (13.10), or Equation
(13.9), must necessarily be twice differentiable, and that is all that is demanded
of the solutions. Any higher-order differentiability requirement may be too
restrictive, as was pointed out in Example 13.1.1. Most solutions to a normal SOLDE,
however, automatically have derivatives of order higher than two.

We write Equation (13.9) in the operator form as

\[ L[y] = p_3, \qquad\text{where}\quad L \equiv p_2\,\frac{d^2}{dx^2} + p_1\,\frac{d}{dx} + p_0. \tag{13.11} \]

It is clear that $L$ is a linear operator because $d/dx$ is linear, as are all powers of
it. Thus, for constants $\alpha$ and $\beta$, $L[\alpha y_1 + \beta y_2] = \alpha L[y_1] + \beta L[y_2]$. In particular,
if $y_1$ and $y_2$ are two solutions of Equation (13.11), then $L[y_1 - y_2] = 0$. That
is, the difference between any two solutions of a SOLDE is a solution of the
homogeneous equation obtained by setting $p_3 = 0$.²

An immediate consequence of the linearity of L is the following: 

13.3.2. Lemma. If $L[u] = r(x)$, $L[v] = s(x)$, $\alpha$ and $\beta$ are constants, and $w =
\alpha u + \beta v$, then $L[w] = \alpha r(x) + \beta s(x)$.


The proof of this lemma is trivial, but the result describes the fundamental
property of linear operators: When $r = s = 0$, that is, in dealing with homogeneous
equations, the lemma says that any linear combination of solutions of the
homogeneous SOLDE (HSOLDE) is also a solution. This is called the superposition
principle.

² This conclusion is, of course, not limited to the SOLDE; it holds for all linear DEs.

Based on physical intuition, we expect to be able to predict the behavior of 
a physical system if we know the differential equation obeyed by that system, 
and, equally importantly, the initial data. A prediction is not a prediction unless it 
is unique.³ This expectation for linear equations is borne out in the language of
mathematics in the form of an existence theorem and a uniqueness theorem. We 
consider the latter next. But first, we need a lemma. 

13.3.3. Lemma. The only solution $g(x)$ of the homogeneous equation $y'' + py' +
qy = 0$ defined on the interval $[a, b]$ that satisfies $g(a) = 0 = g'(a)$ is the trivial
solution $g = 0$.

Proof. Introduce the nonnegative function $u(x) \equiv [g(x)]^2 + [g'(x)]^2$ and
differentiate it to get

\[
\begin{aligned}
u'(x) &= 2g'g + 2g'g'' = 2g'(g + g'') = 2g'(g - pg' - qg) \\
&= -2p\,(g')^2 + 2(1 - q)\,gg'.
\end{aligned}
\]

Since $(g \pm g')^2 \ge 0$, it follows that $2|gg'| \le g^2 + g'^2$. Thus,

\[
\begin{aligned}
2(1 - q)\,gg' &\le 2\,|(1 - q)\,gg'| = 2\,|(1 - q)|\,|gg'| \\
&\le |(1 - q)|\,(g^2 + g'^2) \le (1 + |q|)\,(g^2 + g'^2),
\end{aligned}
\]

and therefore,

\[
\begin{aligned}
u'(x) \le |u'(x)| &= |-2p\,g'^2 + 2(1 - q)\,gg'| \\
&\le 2|p|\,g'^2 + (1 + |q|)\,(g^2 + g'^2) \\
&= [1 + |q(x)|]\,g^2 + [1 + |q(x)| + 2|p(x)|]\,g'^2.
\end{aligned}
\]

Now let $K = 1 + \max[|q(x)| + 2|p(x)|]$, where the maximum is taken over $[a, b]$.
Then we obtain

\[ u'(x) \le K(g^2 + g'^2) = K u(x) \qquad \forall\, x \in [a, b]. \]

Using the result of Problem 13.1 yields $u(x) \le u(a)e^{K(x-a)}$ for all $x \in [a, b]$.
This equation, plus $u(a) = 0$, as well as the fact that $u(x) \ge 0$ imply that $u(x) =
g^2(x) + g'^2(x) = 0$. It follows that $g(x) = 0 = g'(x)$ for all $x \in [a, b]$. □


13.3.4. Theorem. (Uniqueness theorem) If $p$ and $q$ are continuous on $[a, b]$, then
at most one solution $y = f(x)$ of Equation (13.10) can satisfy the initial conditions
$f(a) = c_1$ and $f'(a) = c_2$, where $c_1$ and $c_2$ are arbitrary constants.

³ Physical intuition also tells us that if the initial conditions are changed by an infinitesimal amount, then the solutions
will be changed infinitesimally. Thus, the solutions of linear differential equations are said to be continuous functions of the
initial conditions. Nonlinear differential equations can have completely different solutions for two initial conditions that are
infinitesimally close. Since initial conditions cannot be specified with mathematical precision in practice, nonlinear differential
equations lead to unpredictable solutions, or chaos. This subject has received much attention in recent years. For an elementary
discussion of chaos see [Hass 99, Chapter 15].

Proof. Let $f_1$ and $f_2$ be two solutions satisfying the given initial conditions. Then
their difference, $g \equiv f_1 - f_2$, satisfies the homogeneous equation [with $r(x) = 0$].
The initial condition that $g(x)$ satisfies is clearly $g(a) = 0 = g'(a)$. By Lemma
13.3.3, $g = 0$ or $f_1 = f_2$. □

Theorem 13.3.4 can be applied to any homogeneous SOLDE to find the latter’s
most general solution. In particular, let $f_1(x)$ and $f_2(x)$ be any two solutions of

\[ y'' + p(x)\,y' + q(x)\,y = 0 \tag{13.12} \]

defined on the interval $[a, b]$. Assume that the two vectors $\mathbf{v}_1 = (f_1(a), f_1'(a))$
and $\mathbf{v}_2 = (f_2(a), f_2'(a))$ in $\mathbb{R}^2$ are linearly independent.⁴ Let $g(x)$ be another
solution. The vector $(g(a), g'(a))$ can be written as a linear combination of $\mathbf{v}_1$ and
$\mathbf{v}_2$, giving the two equations

\[
\begin{aligned}
g(a) &= c_1 f_1(a) + c_2 f_2(a), \\
g'(a) &= c_1 f_1'(a) + c_2 f_2'(a).
\end{aligned}
\]

Now consider the function $u(x) \equiv g(x) - c_1 f_1(x) - c_2 f_2(x)$, which satisfies
Equation (13.12) and the initial conditions $u(a) = u'(a) = 0$. By Lemma 13.3.3,
we must have $u(x) = 0$ or $g(x) = c_1 f_1(x) + c_2 f_2(x)$. We have proved the
following:

13.3.5. Theorem. Let $f_1$ and $f_2$ be two solutions of the HSOLDE

\[ y'' + py' + qy = 0, \]

where $p$ and $q$ are continuous functions defined on the interval $[a, b]$. If

\[ (f_1(a), f_1'(a)) \quad\text{and}\quad (f_2(a), f_2'(a)) \]

are linearly independent vectors in $\mathbb{R}^2$, then every solution $g(x)$ of this HSOLDE
is equal to some linear combination $g(x) = c_1 f_1(x) + c_2 f_2(x)$ of $f_1$ and $f_2$ with
constant coefficients $c_1$ and $c_2$.


13.4 The Wronskian 

The two solutions $f_1(x)$ and $f_2(x)$ in Theorem 13.3.5 have the property that any
other solution $g(x)$ can be expressed as a linear combination of them. We call
$f_1$ and $f_2$ a basis of solutions of the HSOLDE. To form a basis of solutions, $f_1$


4 If they are not, then one must choose a different initial point for the interval. 


356 13. SECOND-ORDER LINEAR DIFFERENTIAL EQUATIONS 


Wronskian defined 


and $f_2$ must be linearly independent. The linear dependence or independence of
a number of functions $\{f_i\}_{i=1}^n : [a, b] \to \mathbb{R}$ is a concept that must hold for all
$x \in [a, b]$. Thus, if $\{\alpha_i\}_{i=1}^n \in \mathbb{R}$ can be found such that

$$\alpha_1 f_1(x_0) + \alpha_2 f_2(x_0) + \cdots + \alpha_n f_n(x_0) = 0$$

for some $x_0 \in [a, b]$, it does not mean that the $f$'s are linearly dependent. Linear
dependence requires that the equality hold for all $x \in [a, b]$. In fact, we must write

$$\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n = \mathbf{0},$$

where $\mathbf{0}$ is the zero function.

13.4.1. Definition. The Wronskian of any two differentiable functions $f_1(x)$ and
$f_2(x)$ is

$$W(f_1, f_2; x) = f_1(x) f_2'(x) - f_2(x) f_1'(x) = \det \begin{pmatrix} f_1(x) & f_1'(x) \\ f_2(x) & f_2'(x) \end{pmatrix}.$$

13.4.2. Proposition. The Wronskian of any two solutions of Equation (13.12) satisfies

$$W(f_1, f_2; x) = W(f_1, f_2; c)\, e^{-\int_c^x p(t)\,dt},$$

where $c$ is any number in the interval $[a, b]$.

Proof. Differentiating both sides of the definition of Wronskian and substituting
from Equation (13.12) yields a FOLDE for $W(f_1, f_2; x)$, which can be easily
solved. The details are left as a problem. □
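Proposition 13.4.2 can be checked numerically on a simple case. The following is a minimal sketch, not part of the original text: it assumes the equation $y'' + 2y' + y = 0$ (so $p(x) = 2$) with the known solutions $e^{-x}$ and $xe^{-x}$, and the helper names `wronskian` and `abel_prediction` are purely illustrative.

```python
import math

# Two known solutions of y'' + 2y' + y = 0 (assumed test case, p(x) = 2).
f1 = lambda x: math.exp(-x)
df1 = lambda x: -math.exp(-x)
f2 = lambda x: x * math.exp(-x)
df2 = lambda x: (1.0 - x) * math.exp(-x)


def wronskian(x):
    """W(f1, f2; x) = f1 f2' - f2 f1', as in Definition 13.4.1."""
    return f1(x) * df2(x) - f2(x) * df1(x)


def abel_prediction(x, c):
    """Right-hand side of Proposition 13.4.2 with p = 2: W(c) exp(-2 (x - c))."""
    return wronskian(c) * math.exp(-2.0 * (x - c))


for x in (0.0, 0.5, 1.0, 2.0):
    print(x, wronskian(x), abel_prediction(x, c=0.3))
```

The two printed columns should agree to rounding error, and neither changes sign, which is the behavior the proposition predicts.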

An important consequence of Proposition 13.4.2 is that the Wronskian of any 
two solutions of Equation (13.12) does not change sign in [a, b]. In particular, if 
the Wronskian vanishes at one point in [a, b], it vanishes at all points in [a, b]. 
The real importance of the Wronskian is contained in the following theorem. 

13.4.3. Theorem. Two differentiable functions $f_1$ and $f_2$, which are nonzero in
the interval $[a, b]$, are linearly dependent if and only if their Wronskian vanishes.

Proof. If $f_1$ and $f_2$ are linearly dependent, then one is a multiple of the other, and
the Wronskian is readily seen to vanish. Conversely, assume that the Wronskian
is zero. Then

$$f_1(x) f_2'(x) - f_2(x) f_1'(x) = 0 \;\Rightarrow\; f_1\,df_2 = f_2\,df_1 \;\Rightarrow\; f_2 = C f_1,$$

and the two functions are linearly dependent. □




Josef Hoene de Wronski (1778-1853) was born Josef Hoene, but he adopted the name
Wronski around 1810 just after he married. He had moved to France and become a French
citizen in 1800 and moved to Paris in 1810, the same year he published his first memoir on
the foundations of mathematics, which received less than favorable reviews from Lacroix
and Lagrange. His other interests included the design of caterpillar vehicles to compete with
the railways. However, they were never manufactured.

Wronski was interested mainly in applying philosophy to
mathematics, the philosophy taking precedence over rigorous
mathematical proofs. He criticised Lagrange's use of infinite
series and introduced his own ideas for series expansions of a
function. The coefficients in this series are determinants now
known as Wronskians [so named by Thomas Muir (1844-1934),
a Glasgow High School science master who became
an authority on determinants by devoting most of his life to
writing a five-volume treatise on the history of determinants].

For many years Wronski's work was dismissed as rubbish.
However, a closer examination of the work in more recent
times shows that although some is wrong and he has an incredibly high opinion of himself
and his ideas, there are also some mathematical insights of great depth and brilliance hidden
within the papers.



13.4.4. Example. Let $f_1(x) = x$ and $f_2(x) = |x|$ for $x \in [-1, 1]$. These two functions
are linearly independent in the given interval, because $\alpha_1 x + \alpha_2 |x| = 0$ for all $x$ if and only
if $\alpha_1 = \alpha_2 = 0$. The Wronskian, on the other hand, vanishes for all $x \in [-1, +1]$:

$$W(f_1, f_2; x) = x\frac{d|x|}{dx} - |x|\frac{dx}{dx} = x\frac{d|x|}{dx} - |x|
= x\frac{d}{dx}\begin{cases} x & \text{if } x > 0 \\ -x & \text{if } x < 0 \end{cases} \;-\; \begin{cases} x & \text{if } x > 0 \\ -x & \text{if } x < 0 \end{cases}
= \begin{cases} x - x = 0 & \text{if } x > 0, \\ -x - (-x) = 0 & \text{if } x < 0. \end{cases}$$

Thus, it is possible for two functions to have a vanishing Wronskian without being linearly
dependent. However, as we showed in the proof of the theorem above, if the functions
are differentiable in their interval of definition, then they are linearly dependent if their
Wronskian vanishes. ■

13.4.5. Example. The Wronskian can be generalized to $n$ functions. The Wronskian of
the functions $f_1, f_2, \ldots, f_n$ is

$$W(f_1, f_2, \ldots, f_n; x) = \det \begin{pmatrix}
f_1(x) & f_1'(x) & \cdots & f_1^{(n-1)}(x) \\
f_2(x) & f_2'(x) & \cdots & f_2^{(n-1)}(x) \\
\vdots & \vdots & & \vdots \\
f_n(x) & f_n'(x) & \cdots & f_n^{(n-1)}(x)
\end{pmatrix}.$$

If the functions are linearly dependent, then $W(f_1, f_2, \ldots, f_n; x) = 0$.

For instance, it is clear that $e^x$, $e^{-x}$, and $\sinh x$ are linearly dependent. Thus, we expect

$$W(e^x, e^{-x}, \sinh x; x) = \det \begin{pmatrix}
e^x & e^x & e^x \\
e^{-x} & -e^{-x} & e^{-x} \\
\sinh x & \cosh x & \sinh x
\end{pmatrix}$$

to vanish, as is easily seen (the first and last columns are the same). ■

13.4.1 A Second Solution to the HSOLDE 

If we know one solution to Equation (13.12), say $f_1$, then by taking both sides of

$$f_1(x) f_2'(x) - f_2(x) f_1'(x) = W(x) = W(c)\, e^{-\int_c^x p(t)\,dt},$$

dividing by $f_1^2$, and noting that the LHS then becomes the derivative of $f_2/f_1$,
we can solve for $f_2$ in terms of $f_1$. The result is

$$f_2(x) = f_1(x)\left\{ C + K \int_\alpha^x \frac{1}{f_1^2(s)} \exp\left[-\int_c^s p(t)\,dt\right] ds \right\}, \tag{13.13}$$

where $K \equiv W(c)$ is another arbitrary (nonzero) constant; we do not have to know
$W(x)$ (this would require knowledge of $f_2$, which we are trying to calculate!)
to obtain $W(c)$. In fact, the reader is urged to check directly that $f_2(x)$ satisfies
the DE of (13.12) for arbitrary $C$ and $K$. Whenever possible, and convenient,
it is customary to set $C = 0$, because its presence simply gives a term that is
proportional to the known solution $f_1(x)$.
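Equation (13.13) can also be evaluated numerically when the integrals are not available in closed form. The sketch below is an illustration under stated assumptions rather than a method from the text: it uses a composite Simpson rule (a choice made here), and the function names `simpson` and `second_solution` are hypothetical. The test case $y'' - k^2 y = 0$ with $f_1 = e^{kx}$, $p = 0$, $\alpha = c = 0$ has the closed form $f_2 = \sinh(kx)/k$ for comparison.

```python
import math


def simpson(func, lo, hi, n=200):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


def second_solution(f1, p, x, alpha=0.0, c=0.0, C=0.0, K=1.0):
    """Numerically evaluate Equation (13.13) at the point x."""
    def integrand(s):
        # exp[-int_c^s p(t) dt] / f1(s)^2
        return math.exp(-simpson(p, c, s)) / f1(s) ** 2
    return f1(x) * (C + K * simpson(integrand, alpha, x))


k = 1.5
approx = second_solution(lambda s: math.exp(k * s), lambda t: 0.0, x=1.0)
exact = math.sinh(k * 1.0) / k  # closed form for this assumed test case
print(approx, exact)
```

The nested quadrature is simple but costs on the order of `n * n` function evaluations; for production work one would integrate the inner integral once and reuse it.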

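13.4.6. Example. (a) A solution to the SOLDE $y'' - k^2 y = 0$ is $e^{kx}$. To find a second
solution, we let $C = 0$ and $K = 1$ in Equation (13.13). Since $p(x) = 0$, we have

$$f_2(x) = e^{kx}\left(0 + \int_\alpha^x \frac{ds}{e^{2ks}}\right) = -\frac{1}{2k}e^{-kx} + \frac{e^{-2k\alpha}}{2k}e^{kx},$$

which, ignoring the second term (which is proportional to the first solution), leads directly
to the choice of $e^{-kx}$ as a second solution.

(b) The differential equation $y'' + k^2 y = 0$ has $\sin kx$ as a solution. With $C = 0$, $\alpha = \pi/(2k)$,
and $K = 1$, we get

$$f_2(x) = \sin kx\left(0 + \int_{\pi/2k}^x \frac{ds}{\sin^2 ks}\right) = -\frac{1}{k}\sin kx \cot ks\,\Big|_{\pi/2k}^{x} = -\frac{1}{k}\cos kx,$$

which is a constant multiple of $\cos kx$, so we may take $\cos kx$ as the second solution.

(c) For the solutions in part (a),

$$W(x) = \det\begin{pmatrix} e^{kx} & ke^{kx} \\ e^{-kx} & -ke^{-kx} \end{pmatrix} = -2k,$$

and for those in part (b),

$$W(x) = \det\begin{pmatrix} \sin kx & k\cos kx \\ \cos kx & -k\sin kx \end{pmatrix} = -k.$$

Both Wronskians are constant. In general, the Wronskian of any two linearly independent
solutions of $y'' + q(x)y = 0$ is constant. ■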

Most special functions used in mathematical physics are solutions of SOLDEs.
The behavior of these functions at certain special points is determined by the
physics of the particular problem. In most situations physical expectation leads to
a preference for one particular solution over the other. For example, although there
are two linearly independent solutions to the Legendre DE

$$\frac{d}{dx}\left[(1 - x^2)\frac{dy}{dx}\right] + n(n+1)y = 0,$$

the solution that is most frequently encountered is the Legendre polynomial $P_n(x)$
discussed in Chapter 7. The other solution can be obtained by solving the Legendre
equation or by using Equation (13.13), as done in the following example.

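13.4.7. Example. The Legendre equation can be reexpressed as

$$\frac{d^2y}{dx^2} - \frac{2x}{1 - x^2}\frac{dy}{dx} + \frac{n(n+1)}{1 - x^2}\,y = 0.$$

This is an HSOLDE with

$$p(x) = -\frac{2x}{1 - x^2} \quad \text{and} \quad q(x) = \frac{n(n+1)}{1 - x^2}.$$

One solution of this HSOLDE is the well-known Legendre polynomial $P_n(x)$. Using this
as our input and employing Equation (13.13), we can generate another set of solutions.

Let $Q_n(x)$ stand for the linearly independent "partner" of $P_n(x)$. Then, setting $C = 0 = c$
in Equation (13.13) yields

$$Q_n(x) = K P_n(x)\int_\alpha^x \frac{1}{P_n^2(s)}\exp\left[\int_0^s \frac{2t}{1 - t^2}\,dt\right] ds
= K P_n(x)\int_\alpha^x \frac{1}{P_n^2(s)}\left[\frac{1}{1 - s^2}\right] ds
= A_n P_n(x)\int_\alpha^x \frac{ds}{(1 - s^2)P_n^2(s)},$$

where $A_n$ is an arbitrary constant determined by standardization, and $\alpha$ is an arbitrary point
in the interval $[-1, +1]$. For instance, for $n = 0$, we have $P_0 = 1$, and we obtain

$$Q_0(x) = A_0\int_\alpha^x \frac{ds}{1 - s^2} = A_0\left[\frac{1}{2}\ln\left|\frac{1 + x}{1 - x}\right| - \frac{1}{2}\ln\left|\frac{1 + \alpha}{1 - \alpha}\right|\right].$$

The standard form of $Q_0(x)$ is obtained by setting $A_0 = 1$ and $\alpha = 0$:

$$Q_0(x) = \frac{1}{2}\ln\frac{1 + x}{1 - x} \quad \text{for } |x| < 1.$$

Similarly, since $P_1(x) = x$,

$$Q_1(x) = A_1 x\int_\alpha^x \frac{ds}{s^2(1 - s^2)} = Ax + Bx\ln\left|\frac{1 + x}{1 - x}\right| + C \quad \text{for } |x| < 1.$$

Here standardization is $A = 0$, $B = \frac{1}{2}$, and $C = -1$. Thus,

$$Q_1(x) = \frac{1}{2}x\ln\left|\frac{1 + x}{1 - x}\right| - 1.$$

■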
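As a sanity check on the standardized forms $Q_0(x) = \frac{1}{2}\ln\frac{1+x}{1-x}$ and $Q_1(x) = \frac{1}{2}x\ln\frac{1+x}{1-x} - 1$, one can substitute them into the Legendre equation numerically. The sketch below is only an illustration: it approximates derivatives with central differences (an assumption of this sketch, with a hypothetical step `h`), and the names `q0`, `q1`, and `legendre_residual` are not from the text.

```python
import math


def q0(x):
    """Standardized second Legendre solution for n = 0, valid for |x| < 1."""
    return 0.5 * math.log((1.0 + x) / (1.0 - x))


def q1(x):
    """Standardized second Legendre solution for n = 1, valid for |x| < 1."""
    return x * q0(x) - 1.0


def legendre_residual(y, n, x, h=1e-3):
    """(1 - x^2) y'' - 2x y' + n(n+1) y, with central-difference derivatives."""
    d1 = (y(x + h) - y(x - h)) / (2.0 * h)
    d2 = (y(x + h) - 2.0 * y(x) + y(x - h)) / (h * h)
    return (1.0 - x * x) * d2 - 2.0 * x * d1 + n * (n + 1) * y(x)


for x in (-0.6, 0.1, 0.3, 0.7):
    print(x, legendre_residual(q0, 0, x), legendre_residual(q1, 1, x))
```

The residuals are zero only up to the finite-difference error, so small values (rather than exact zeros) are what to look for.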





13.4.2 The General Solution to an ISOLDE 

Inhomogeneous SOLDEs (ISOLDES) can be most elegantly discussed in terms of 
Green’s functions, the subject of Chapter 20, which automatically incorporate the 
boundary conditions. However, the most general solution of an ISOLDE, with no 
boundary specification, can be discussed at this point. 

Let $g(x)$ be a particular solution of

$$L[y] = y'' + py' + qy = r(x) \tag{13.14}$$

and let $h(x)$ be any other solution of this equation. Then $h(x) - g(x)$ satisfies
Equation (13.12) and can be written as a linear combination of a basis of solutions
$f_1(x)$ and $f_2(x)$, leading to the following equation:

$$h(x) = c_1 f_1(x) + c_2 f_2(x) + g(x). \tag{13.15}$$

Thus, if we have a particular solution of the ISOLDE of Equation (13.14) and two 
basis solutions of the HSOLDE, then the most general solution of (13.14) can be 
expressed as the sum of a linear combination of the two basis solutions and the 
particular solution. 

We know how to find a second solution to the HSOLDE once we know one
solution. We now show that knowing one such solution will also allow us to find a
particular solution to the ISOLDE. The method we use is called the method of variation
of constants. This method can also be used to find a second solution to the HSOLDE.

Let $f_1$ and $f_2$ be the two (known) solutions of the HSOLDE and $g(x)$ the
sought-after solution to Equation (13.14). Write $g$ as $g(x) = f_1(x)v(x)$ and
substitute it in (13.14) to get a SOLDE for $v(x)$:

$$v'' + \left(p + \frac{2f_1'}{f_1}\right)v' = \frac{r}{f_1}.$$

This is a first-order linear DE in $v'$, which has a solution of the form

$$v' = \frac{W(x)}{f_1^2(x)}\left[C + \int_\alpha^x \frac{f_1(t)r(t)}{W(t)}\,dt\right],$$

where $W(x)$ is the (known) Wronskian of Equation (13.14). Substituting

$$\frac{W(x)}{f_1^2(x)} = \frac{f_1(x)f_2'(x) - f_2(x)f_1'(x)}{f_1^2(x)} = \frac{d}{dx}\left(\frac{f_2}{f_1}\right)$$

in the above expression for $v'$ and setting $C = 0$ (we are interested in a particular
solution), we get

$$\frac{dv}{dx} = \frac{d}{dx}\left(\frac{f_2}{f_1}\right)\int_\alpha^x \frac{f_1(t)r(t)}{W(t)}\,dt
= \frac{d}{dx}\left[\frac{f_2(x)}{f_1(x)}\int_\alpha^x \frac{f_1(t)r(t)}{W(t)}\,dt\right] - \frac{f_2(x)}{f_1(x)}\underbrace{\frac{d}{dx}\int_\alpha^x \frac{f_1(t)r(t)}{W(t)}\,dt}_{=\,f_1(x)r(x)/W(x)}$$

and

$$v(x) = \frac{f_2(x)}{f_1(x)}\int_\alpha^x \frac{f_1(t)r(t)}{W(t)}\,dt - \int_\alpha^x \frac{f_2(t)r(t)}{W(t)}\,dt.$$

This leads to the particular solution

$$g(x) = f_1(x)v(x) = f_2(x)\int_\alpha^x \frac{f_1(t)r(t)}{W(t)}\,dt - f_1(x)\int_\alpha^x \frac{f_2(t)r(t)}{W(t)}\,dt. \tag{13.16}$$

We have just proved the following result.


13.4.8. Proposition. Given a single solution $f_1(x)$ of the homogeneous equation
corresponding to an ISOLDE, one can use Equation (13.13) to find a second solution
$f_2(x)$ of the homogeneous equation and Equation (13.16) to find a particular
solution $g(x)$. The most general solution $h$ will then be

$$h(x) = c_1 f_1(x) + c_2 f_2(x) + g(x).$$
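The particular solution of Equation (13.16), $g(x) = f_2(x)\int_\alpha^x f_1 r/W\,dt - f_1(x)\int_\alpha^x f_2 r/W\,dt$ with $W = f_1 f_2' - f_2 f_1'$, lends itself to a short numerical sketch. The code below is illustrative only: the Simpson quadrature and the names `simpson` and `particular_solution` are assumptions of this sketch. The test case is $y'' + y = 1$ with $f_1 = \sin x$, $f_2 = \cos x$, and $\alpha = 0$, for which the formula gives $g(x) = 1 - \cos x$.

```python
import math


def simpson(func, lo, hi, n=200):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


def particular_solution(f1, df1, f2, df2, r, x, alpha=0.0):
    """Evaluate Equation (13.16) at x by numerical quadrature."""
    W = lambda t: f1(t) * df2(t) - f2(t) * df1(t)  # Wronskian of f1, f2
    i1 = simpson(lambda t: f1(t) * r(t) / W(t), alpha, x)
    i2 = simpson(lambda t: f2(t) * r(t) / W(t), alpha, x)
    return f2(x) * i1 - f1(x) * i2


# Assumed test case: y'' + y = 1 with basis sin x, cos x.
g_at_2 = particular_solution(
    math.sin, math.cos,
    math.cos, lambda t: -math.sin(t),
    lambda t: 1.0, x=2.0,
)
print(g_at_2, 1.0 - math.cos(2.0))
```

Adding any combination $c_1 \sin x + c_2 \cos x$ to this $g$ gives the general solution, in line with the proposition.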


13.4.3 Separation and Comparison Theorems 

The Wronskian can be used to derive some properties of the graphs of solutions of 
HSOLDEs. One such property concerns the relative position of the zeros of two 
linearly independent solutions of an HSOLDE. 

13.4.9. Theorem. (the separation theorem) The zeros of two linearly independent
solutions of an HSOLDE occur alternately.

Proof. Let $f_1(x)$ and $f_2(x)$ be two independent solutions of Equation (13.12).
We have to show that a zero of $f_1$ exists between any two zeros of $f_2$. The linear
independence of $f_1$ and $f_2$ implies that $W(f_1, f_2; x) \ne 0$ for any $x \in [a, b]$. Let
$x_i \in [a, b]$ be a zero of $f_2$. Then

$$0 \ne W(f_1, f_2; x_i) = f_1(x_i)f_2'(x_i) - f_2(x_i)f_1'(x_i) = f_1(x_i)f_2'(x_i).$$

Thus, $f_1(x_i) \ne 0$ and $f_2'(x_i) \ne 0$. Suppose that $x_1$ and $x_2$, where $x_2 > x_1$, are
two successive zeros of $f_2$. Since $f_2$ is continuous in $[a, b]$ and $f_2'(x_1) \ne 0$, $f_2$ has
to be either increasing [$f_2'(x_1) > 0$] or decreasing [$f_2'(x_1) < 0$] at $x_1$. For $f_2$ to
be zero at $x_2$, the next point, $f_2'(x_2)$ must have the opposite sign from $f_2'(x_1)$ (see
Figure 13.1). We proved earlier that the sign of the Wronskian does not change
in $[a, b]$ (see Proposition 13.4.2 and comments after it). The above equation then
says that $f_1(x_1)$ and $f_1(x_2)$ also have opposite signs. The continuity of $f_1$ then
implies that $f_1$ must cross the $x$-axis somewhere between $x_1$ and $x_2$. A similar
argument shows that there exists one zero of $f_2$ between any two zeros of $f_1$. □





Figure 13.1 If $f_2'(x_1) > 0 > f_2'(x_2)$, then (assuming that the Wronskian is positive)
$f_1(x_1) > 0 > f_1(x_2)$.


13.4.10. Example. Two linearly independent solutions of $y'' + y = 0$ are $\sin x$ and
$\cos x$. The separation theorem suggests that the zeros of $\sin x$ and $\cos x$ must alternate, a
fact known from elementary trigonometry: The zeros of $\cos x$ occur at odd multiples of
$\pi/2$, and those of $\sin x$ occur at even multiples of $\pi/2$. ■

A second useful result is known as the comparison theorem (for a proof, see
[Birk 78, p. 38]).

13.4.11. Theorem. (the comparison theorem) Let $f$ and $g$ be nontrivial solutions
of $u'' + p(x)u = 0$ and $v'' + q(x)v = 0$, respectively, where $p(x) \ge q(x)$ for
all $x \in [a, b]$. Then $f$ vanishes at least once between any two zeros of $g$, unless
$p = q$ and $f$ is a constant multiple of $g$.

The form of the differential equations used in the comparison theorem is not
restrictive because any HSOLDE can be cast in this form, as the following example
shows.

13.4.12. Example. We show that $y'' + p(x)y' + q(x)y = 0$ can be cast in the form
$u'' + S(x)u = 0$ by an appropriate functional transformation. Define $w(x)$ by $y = wu$, and
substitute in the HSOLDE to obtain

$$(u'w + w'u)' + p(u'w + w'u) + quw = 0,$$

or

$$wu'' + (2w' + pw)u' + (qw + pw' + w'')u = 0. \tag{13.17}$$

If we demand that the coefficient of $u'$ be zero, we obtain the DE $2w' + pw = 0$, whose
solution is

$$w(x) = C\exp\left[-\frac{1}{2}\int_\alpha^x p(t)\,dt\right].$$

Dividing (13.17) by this $w$ and substituting for $w$ yields

$$u'' + S(x)u = 0, \quad \text{where} \quad S(x) = q + p\frac{w'}{w} + \frac{w''}{w} = q - \frac{1}{4}p^2 - \frac{1}{2}p'.$$

■
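A short worked case may help fix the formula $S = q - \frac{1}{4}p^2 - \frac{1}{2}p'$, which follows from $w'/w = -p/2$ and $w''/w = -p'/2 + p^2/4$. The choice of the Bessel equation, with $p = 1/x$ and $q = 1 - n^2/x^2$ on $x > 0$, is made here purely for illustration:

```latex
\begin{aligned}
w(x) &= C\exp\!\left[-\tfrac12\int^x \frac{dt}{t}\right] \propto x^{-1/2},\\[4pt]
S(x) &= q - \tfrac14 p^2 - \tfrac12 p'
      = \left(1 - \frac{n^2}{x^2}\right) - \frac{1}{4x^2} + \frac{1}{2x^2}
      = 1 - \frac{4n^2 - 1}{4x^2}.
\end{aligned}
```

So the substitution $y = u/\sqrt{x}$ removes the first-derivative term and leaves $u'' + \left[1 - (4n^2 - 1)/(4x^2)\right]u = 0$.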

A useful special case of the comparison theorem is given as the following
corollary, whose straightforward but instructive proof is left as a problem.

13.4.13. Corollary. If $q(x) < 0$ for all $x \in [a, b]$, then no nontrivial solution of
the differential equation $v'' + q(x)v = 0$ can have more than one zero.

13.4.14. Example. It should be clear from the preceding discussion that the oscillations
of the solutions of $v'' + q(x)v = 0$ are mostly determined by the sign and magnitude of
$q(x)$. For $q(x) < 0$ there is no oscillation; that is, there is no solution that changes sign
more than once. Now suppose that $q(x) > k^2 > 0$ for some real $k$. Then, by Theorem
13.4.11, any solution of $v'' + q(x)v = 0$ must have at least one zero between any two
successive zeros of the solution $\sin kx$ of $u'' + k^2 u = 0$. This means that any solution of
$v'' + q(x)v = 0$ has a zero in any interval of length $\pi/k$ if $q(x) > k^2 > 0$.

Let us apply this to the Bessel DE,

$$y'' + \frac{1}{x}y' + \left(1 - \frac{n^2}{x^2}\right)y = 0.$$

We can eliminate the $y'$ term by substituting $v/\sqrt{x}$ for $y$.$^5$ This transforms the Bessel DE
into

$$v'' + \left(1 - \frac{4n^2 - 1}{4x^2}\right)v = 0.$$

We compare this, for $n = 0$, with $u'' + u = 0$, which has a solution $u = \sin x$, and conclude
that each interval of length $\pi$ of the positive $x$-axis contains at least one zero of any solution
of order zero ($n = 0$) of the Bessel equation. Thus, in particular, the zeroth Bessel function,
denoted by $J_0(x)$, has a zero in each interval of length $\pi$ of the $x$-axis.

On the other hand, for $4n^2 - 1 > 0$, or $n > \frac{1}{2}$, we have $1 > [1 - (4n^2 - 1)/4x^2]$.
This implies that $\sin x$ has at least one zero between any two successive zeros of the Bessel
functions of order greater than $\frac{1}{2}$. It follows that such a Bessel function can have at most
one zero between any two successive zeros of $\sin x$ (or in each interval of length $\pi$ on the
positive $x$-axis). ■
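The claim that the zeroth Bessel function has a zero in each interval of length $\pi$ can be illustrated numerically. The sketch below makes several assumptions not found in the text: it evaluates $J_0$ from its standard power series $\sum_k (-1)^k (x/2)^{2k}/(k!)^2$ (adequate in double precision only for moderate $x$, here up to 20), locates sign changes on a fixed grid, and refines them by bisection. All function names are illustrative.

```python
import math


def bessel_j0(x, terms=60):
    """J_0(x) from its power series; reliable for moderate |x| only."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / ((k + 1) ** 2)
    return total


def bisect_zero(f, lo, hi, tol=1e-12):
    """Refine a bracketed sign change of f on [lo, hi] by bisection."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def zeros_in(f, lo, hi, step=0.01):
    """Scan [lo, hi] on a uniform grid and return the refined zeros of f."""
    found = []
    n = int(round((hi - lo) / step))
    for i in range(n):
        x = lo + i * step
        if f(x) * f(x + step) < 0:
            found.append(bisect_zero(f, x, x + step))
    return found


j0_zeros = zeros_in(bessel_j0, 0.0, 20.0)
gaps = [b - a for a, b in zip(j0_zeros, j0_zeros[1:])]
print(j0_zeros)
print(gaps, math.pi)
```

Every gap between successive zeros comes out slightly smaller than $\pi$, which is what the comparison with $\sin x$ requires.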


13.4.15. Example. Let us apply Corollary 13.4.13 to $v'' - v = 0$ in which $q(x) = -1 < 0$.
According to the corollary, the most general solution, $c_1 e^x + c_2 e^{-x}$, can have at most
one zero. Indeed,

$$c_1 e^x + c_2 e^{-x} = 0 \;\Rightarrow\; x = \frac{1}{2}\ln\left(-\frac{c_2}{c_1}\right),$$

and this (real) $x$ (if it exists) is the only possible solution, as predicted by the corollary. ■


5 Because of the square root in the denominator, the range of $x$ will have to be restricted to positive values.






13.5 Adjoint Differential Operators 

We discussed adjoint operators in detail in the context of finite-dimensional vector
spaces in Chapter 2. In particular, the importance of self-adjoint, or hermitian,
operators was clearly spelled out by the spectral decomposition theorem of Chapter
4. A consequence of that theorem is the completeness of the eigenvectors of a
hermitian operator, the fact that an arbitrary vector can be expressed as a linear
combination of the (orthonormal) eigenvectors of a hermitian operator.

Self-adjoint differential operators are equally important because their "eigenfunctions"
also form complete orthogonal sets, as we shall see later. This section
will generalize the concept of the adjoint to the case of a differential operator (of
second degree).

13.5.1. Definition. The HSOLDE

$$L[y] \equiv p_2(x)y'' + p_1(x)y' + p_0(x)y = 0 \tag{13.18}$$

is said to be exact if

$$L[f] \equiv p_2(x)f'' + p_1(x)f' + p_0(x)f = \frac{d}{dx}\left[A(x)f' + B(x)f\right] \tag{13.19}$$

for all $f \in C^2[a, b]$ and for some $A, B \in C^1[a, b]$. An integrating factor for $L[y]$
is a function $\mu(x)$ such that $\mu(x)L[y]$ is exact.

If an integrating factor exists, then Equation (13.18) reduces to

$$\frac{d}{dx}\left[A(x)y' + B(x)y\right] = 0 \;\Rightarrow\; A(x)y' + B(x)y = C,$$

a FOLDE with a constant inhomogeneous term. Even the ISOLDE corresponding
to Equation (13.18) can be solved, because

$$\mu(x)L[y] = \mu(x)r(x) \;\Rightarrow\; \frac{d}{dx}\left[A(x)y' + B(x)y\right] = \mu(x)r(x)
\;\Rightarrow\; A(x)y' + B(x)y = \int_\alpha^x \mu(t)r(t)\,dt,$$

which is a general FOLDE. Thus, the existence of an integrating factor completely
solves a SOLDE. It is therefore important to know whether or not a SOLDE admits
an integrating factor. First let us give a criterion for the exactness of a SOLDE.

13.5.2. Proposition. The SOLDE of Equation (13.18) is exact if and only if
$p_2'' - p_1' + p_0 = 0$.

Proof. If the SOLDE is exact, then Equation (13.19) holds for all $f$, implying that
$p_2 = A$, $p_1 = A' + B$, and $p_0 = B'$. It follows that $p_2'' = A''$, $p_1' = A'' + B'$, and
$p_0 = B'$, which in turn give $p_2'' - p_1' + p_0 = 0$.





13.5 ADJOINT DIFFERENTIAL OPERATORS 365


adjoint of a 
second-order linear 
differential operator 


Conversely, if $p_2'' - p_1' + p_0 = 0$, then, substituting $p_0 = -p_2'' + p_1'$ in the
LHS of Equation (13.18), we obtain

$$p_2 y'' + p_1 y' + p_0 y = p_2 y'' + p_1 y' + (-p_2'' + p_1')y
= p_2 y'' - p_2'' y + (p_1 y)' = (p_2 y' - p_2' y)' + (p_1 y)'
= \frac{d}{dx}\left(p_2 y' - p_2' y + p_1 y\right),$$

and the DE is exact. □


A general SOLDE is clearly not exact. Can we make it exact by multiplying 
it by an integrating factor as we did with a FOLDE? The following proposition 
contains the answer. 

13.5.3. Proposition. A function $\mu$ is an integrating factor of the SOLDE of Equation
(13.18) if and only if it is a solution of the HSOLDE

$$M[\mu] \equiv (p_2\mu)'' - (p_1\mu)' + p_0\mu = 0. \tag{13.20}$$

Proof. This is an immediate consequence of Proposition 13.5.2. □

We can expand Equation (13.20) to obtain the equivalent equation

$$p_2\mu'' + (2p_2' - p_1)\mu' + (p_2'' - p_1' + p_0)\mu = 0. \tag{13.21}$$

The operator $M$ given by

$$M \equiv p_2\frac{d^2}{dx^2} + (2p_2' - p_1)\frac{d}{dx} + (p_2'' - p_1' + p_0) \tag{13.22}$$


is called the adjoint of the operator $L$ and denoted by $M \equiv L^\dagger$. The reason for the
use of the word "adjoint" will be made clear below.

Proposition 13.5.3 confirms the existence of an integrating factor. However,
the latter can be obtained only by solving Equation (13.21), which is at least as
difficult as solving the original differential equation! In contrast, the integrating
factor for a FOLDE can be obtained by a mere integration [see Equation (13.8)].

Although integrating factors for SOLDEs are not as useful as their counterparts
for FOLDEs, they can facilitate the study of SOLDEs. Let us first note that the
adjoint of the adjoint of a differential operator is the original operator: $(L^\dagger)^\dagger = L$
(see Problem 13.11). This suggests that if $v$ is an integrating factor of $L[u]$, then $u$
will be an integrating factor of $M[v] \equiv L^\dagger[v]$. In particular, multiplying the first one
by $v$ and the second one by $u$ and subtracting the results, we obtain [see Equations
(13.18) and (13.20)] $vL[u] - uM[v] = (vp_2)u'' - u(p_2 v)'' + (vp_1)u' + u(p_1 v)'$,
which can be simplified to

$$vL[u] - uM[v] = \frac{d}{dx}\left[p_2 v u' - (p_2 v)'u + p_1 u v\right]. \tag{13.23}$$






Lagrange identities 


all SOLDEs can be 
made self-adjoint 


Integrating this from $a$ to $b$ yields

$$\int_a^b \left(vL[u] - uM[v]\right)dx = \left[p_2 v u' - (p_2 v)'u + p_1 u v\right]\Big|_a^b. \tag{13.24}$$

Equations (13.23) and (13.24) are called the Lagrange identities. Equation (13.24)
embodies the reason for calling $M$ the adjoint of $L$: If we consider $u$ and $v$ as abstract
vectors $|u\rangle$ and $|v\rangle$, $L$ and $M$ as operators in a Hilbert space with the inner product
$\langle u|v\rangle = \int_a^b u^*(x)v(x)\,dx$, then Equation (13.24) can be written as

$$\langle v|L|u\rangle - \langle u|M|v\rangle = \langle u|L^\dagger|v\rangle^* - \langle u|M|v\rangle = \left[p_2 v u' - (p_2 v)'u + p_1 u v\right]\Big|_a^b.$$

If the RHS is zero, then $\langle u|L^\dagger|v\rangle^* = \langle u|M|v\rangle$ for all $|u\rangle$, $|v\rangle$, and since all these
operators and functions are real, $L^\dagger = M$.

As in the case of finite-dimensional vector spaces, a self-adjoint differential
operator merits special consideration. For $M[v] \equiv L^\dagger[v]$ to be equal to $L$, we must
have [see Equations (13.18) and (13.21)] $2p_2' - p_1 = p_1$ and $p_2'' - p_1' + p_0 = p_0$.
The first equation gives $p_2' = p_1$, which also solves the second equation. If this
condition holds, then we can write Equation (13.18) as $L[y] = p_2 y'' + p_2' y' + p_0 y$,
or

$$L[y] = \frac{d}{dx}\left[p_2(x)\frac{dy}{dx}\right] + p_0(x)y = 0.$$

Can we make all SOLDEs self-adjoint? Let us multiply both sides of Equation
(13.18) by a function $h(x)$, to be determined later. We get the new DE

$$h(x)p_2(x)y'' + h(x)p_1(x)y' + h(x)p_0(x)y = 0,$$

which we desire to be self-adjoint. This will be accomplished if we choose $h(x)$
such that $hp_1 = (hp_2)'$, or $p_2 h' + h(p_2' - p_1) = 0$, which can be readily integrated
to give

$$h(x) = \frac{1}{p_2}\exp\left[\int^x \frac{p_1(t)}{p_2(t)}\,dt\right].$$

We have just proved the following:

13.5.4. Theorem. The SOLDE of Equation (13.18) is self-adjoint if and only if
$p_2' = p_1$, in which case the DE has the form

$$\frac{d}{dx}\left[p_2(x)\frac{dy}{dx}\right] + p_0(x)y = 0.$$

If it is not self-adjoint, it can be made so by multiplying it through by

$$h(x) = \frac{1}{p_2}\exp\left[\int^x \frac{p_1(t)}{p_2(t)}\,dt\right].$$
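The multiplier $h(x) = \frac{1}{p_2}\exp\left[\int^x p_1/p_2\,dt\right]$ of Theorem 13.5.4 can be evaluated by quadrature when no closed form is at hand. The sketch below is a hedged illustration: the lower limit `x_ref` is an arbitrary choice made here (changing it only rescales $h$ by a constant), Simpson's rule is an assumption of the sketch, and the function names are hypothetical. For the Legendre equation in normal form ($p_2 = 1$, $p_1 = -2x/(1 - x^2)$) with `x_ref = 0`, the result should reproduce $h(x) = 1 - x^2$.

```python
import math


def simpson(func, lo, hi, n=200):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


def self_adjoint_factor(p1, p2, x, x_ref=0.0):
    """h(x) = exp(int_{x_ref}^x p1/p2 dt) / p2(x); x_ref is an arbitrary reference point."""
    exponent = simpson(lambda t: p1(t) / p2(t), x_ref, x)
    return math.exp(exponent) / p2(x)


# Legendre equation in normal form (assumed test case).
p2 = lambda t: 1.0
p1 = lambda t: -2.0 * t / (1.0 - t * t)

for x in (0.2, 0.5, 0.8):
    print(x, self_adjoint_factor(p1, p2, x), 1.0 - x * x)
```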


13.6 POWER-SERIES SOLUTIONS OF SOLDES 367 


13.5.5. Example. (a) The Legendre equation in normal form,

$$y'' - \frac{2x}{1 - x^2}y' + \frac{\lambda}{1 - x^2}y = 0,$$

is not self-adjoint. However, we get a self-adjoint version if we multiply through by
$h(x) = 1 - x^2$:

$$(1 - x^2)y'' - 2xy' + \lambda y = 0,$$

or $[(1 - x^2)y']' + \lambda y = 0$.

(b) Similarly, the normal form of the Bessel equation

$$y'' + \frac{1}{x}y' + \left(1 - \frac{n^2}{x^2}\right)y = 0$$

is not self-adjoint, but multiplying through by $h(x) = x$ yields

$$\frac{d}{dx}\left(x\frac{dy}{dx}\right) + \left(x - \frac{n^2}{x}\right)y = 0,$$

which is clearly self-adjoint. ■

13.6 Power-Series Solutions of SOLDEs 


Analysis is one of the richest branches of mathematics, focusing on the endless
variety of objects we call functions. The simplest kind of function is a polynomial,
which is obtained by performing the simple algebraic operations of addition
and multiplication on the independent variable $x$. The next in complexity are the
trigonometric functions, which are obtained by taking ratios of geometric objects.
If we demand a simplistic, intuitive approach to functions, the list ends there. It
was only with the advent of derivatives, integrals, and differential equations that a
vastly rich variety of functions exploded into existence in the eighteenth and nineteenth
centuries. For instance, $e^x$, nonexistent before the invention of calculus, can
be thought of as the function that solves $dy/dx = y$.

Although the definition of a function in terms of DEs and integrals seems a bit
artificial, for most applications it is the only way to define a function. For instance,
the error function, used in statistics, is defined as

$$\operatorname{erf}(x) \equiv \frac{1}{\sqrt{\pi}}\int_{-\infty}^x e^{-t^2}\,dt.$$

Such a function cannot be expressed in terms of elementary functions. Similarly,
functions (of $x$) such as

$$\int_x^\infty \frac{\sin t}{t}\,dt, \qquad \int_0^{\pi/2}\sqrt{1 - x^2\sin^2 t}\,dt, \qquad \int_0^{\pi/2}\frac{dt}{\sqrt{1 - x^2\sin^2 t}},$$





and so on are encountered frequently in applications. None of these functions can
be expressed in terms of other well-known functions.

An effective way of studying such functions is to study the differential equations
they satisfy. In fact, the majority of functions encountered in mathematical
physics obey the HSOLDE of Equation (13.18) in which the $p_i(x)$ are elementary
functions, mostly ratios of polynomials (of degree at most 2). Of course, to specify
functions completely, appropriate boundary conditions are necessary. For instance,
the error function mentioned above satisfies the HSOLDE $y'' + 2xy' = 0$ with the
boundary conditions $y(0) = \frac{1}{2}$ and $y'(0) = 1/\sqrt{\pi}$.

The natural tendency to resist the idea of a function as a solution of a SOLDE 
is mostly due to the abstract nature of differential equations. After all, it is easier to 
imagine constructing functions by simple multiplications or with simple geometric 
figures that have been around for centuries. The following beautiful example (see 
[Birk78, pp. 85-87]) should overcome this resistance and convince the skeptic 
that differential equations contain all the information about a function. 

13.6.1. Example. We can show that the solutions to $y'' + y = 0$ have all the properties
we expect of $\sin x$ and $\cos x$. Let us denote the two linearly independent solutions of this
equation by $C(x)$ and $S(x)$. To specify these functions completely, we set $C(0) = S'(0) = 1$,
and $C'(0) = S(0) = 0$. We claim that this information is enough to identify $C(x)$ and $S(x)$
as $\cos x$ and $\sin x$, respectively.

First, let us show that the solutions exist and are well-behaved functions. With $C(0)$
and $C'(0)$ given, the equation $y'' + y = 0$ can generate all derivatives of $C(x)$ at zero:
$C''(0) = -C(0) = -1$, $C'''(0) = -C'(0) = 0$, $C^{(4)}(0) = -C''(0) = +1$, and, in general,


Example illustrates 
that all information 
about sine and cosine 
is hidden in their 
differential equation 


$$C^{(n)}(0) = \begin{cases} 0 & \text{if } n \text{ is odd,} \\ (-1)^k & \text{if } n = 2k \text{ where } k = 0, 1, 2, \ldots. \end{cases}$$

Thus, the Taylor expansion of $C(x)$ is

$$C(x) = \sum_{k=0}^\infty (-1)^k\frac{x^{2k}}{(2k)!}. \tag{13.25}$$

Similarly,

$$S(x) = \sum_{k=0}^\infty (-1)^k\frac{x^{2k+1}}{(2k+1)!}. \tag{13.26}$$

A simple ratio test on the series representation of $C(x)$ yields

$$\lim_{k\to\infty}\left|\frac{a_{k+1}}{a_k}\right| = \lim_{k\to\infty}\left|\frac{(-1)^{k+1}x^{2(k+1)}/(2k+2)!}{(-1)^k x^{2k}/(2k)!}\right| = \lim_{k\to\infty}\frac{x^2}{(2k+2)(2k+1)} = 0,$$

which shows that the series for $C(x)$ converges for all values of $x$. Similarly, the series for
$S(x)$ is also convergent. Thus, we are dealing with well-defined finite-valued functions.
Let us now enumerate and prove some properties of $C(x)$ and $S(x)$.

(a) $C'(x) = -S(x)$.

We prove this relation by differentiating $C''(x) + C(x) = 0$ and writing the result as
$[C'(x)]'' + C'(x) = 0$ to make evident the fact that $C'(x)$ is also a solution. Since $C'(0) = 0$
and $[C'(0)]' = C''(0) = -1$, and since $-S(x)$ satisfies the same initial conditions, the
uniqueness theorem implies that $C'(x) = -S(x)$. Similarly, $S'(x) = C(x)$.

(b) $C^2(x) + S^2(x) = 1$.

Since the $p(x)$ term is absent from the SOLDE, Proposition 13.4.2 implies that the Wronskian
of $C(x)$ and $S(x)$ is constant. On the other hand,

$$W(C, S; x) = C(x)S'(x) - C'(x)S(x) = C^2(x) + S^2(x) = W(C, S; 0) = C^2(0) + S^2(0) = 1.$$

(c) $S(a + x) = S(a)C(x) + C(a)S(x)$.

The use of the chain rule easily shows that $S(a + x)$ is a solution of the equation $y'' + y = 0$.
Thus, it can be written as a linear combination of $C(x)$ and $S(x)$ [which are linearly
independent because their Wronskian is nonzero by (b)]:

$$S(a + x) = AS(x) + BC(x). \tag{13.27}$$

This is a functional identity, which for $x = 0$ gives $S(a) = BC(0) = B$. If we differentiate
both sides of Equation (13.27), we get

$$C(a + x) = AS'(x) + BC'(x) = AC(x) - BS(x),$$

which for $x = 0$ gives $C(a) = A$. Substituting the values of $A$ and $B$ in Equation (13.27)
yields the desired identity. A similar argument leads to

$$C(a + x) = C(a)C(x) - S(a)S(x).$$


(d) Periodicity of $C(x)$ and $S(x)$.

Let $x_0$ be the smallest positive real number such that $S(x_0) = C(x_0)$. Then property (b)
implies that $C(x_0) = S(x_0) = 1/\sqrt{2}$. On the other hand,

$$S(x_0 + x) = S(x_0)C(x) + C(x_0)S(x) = C(x_0)C(x) + S(x_0)S(x)
= C(x_0)C(x) - S(x_0)S(-x) = C(x_0 - x).$$

The third equality follows because by Equation (13.26), $S(x)$ is an odd function of $x$. This
is true for all $x$; in particular, for $x = x_0$ it yields $S(2x_0) = C(0) = 1$, and by property (b),
$C(2x_0) = 0$. Using property (c) once more, we get

$$S(2x_0 + x) = S(2x_0)C(x) + C(2x_0)S(x) = C(x),$$

$$C(2x_0 + x) = C(2x_0)C(x) - S(2x_0)S(x) = -S(x).$$

Substituting $x = 2x_0$ yields $S(4x_0) = C(2x_0) = 0$ and $C(4x_0) = -S(2x_0) = -1$.
Continuing in this manner, we can easily obtain

$$S(8x_0 + x) = S(x), \qquad C(8x_0 + x) = C(x),$$

which prove the periodicity of $S(x)$ and $C(x)$ and show that their period is $8x_0$. It is
possible to determine $x_0$. This determination is left as a problem, but the result is

$$x_0 = \int_0^{1/\sqrt{2}}\frac{dt}{\sqrt{1 - t^2}}.$$

A numerical calculation will show that this is $\pi/4$. ■
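The numerical claim at the end of the example can be checked using nothing but the Taylor series (13.25) and (13.26), without calling any built-in trigonometric function to define $C$ and $S$. The following is a rough sketch under stated assumptions: the truncation at 40 terms, the scan step, and the function names are all choices made here for illustration.

```python
import math


def C(x, terms=40):
    """Partial sum of (13.25): sum of (-1)^k x^(2k) / (2k)!."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total


def S(x, terms=40):
    """Partial sum of (13.26): sum of (-1)^k x^(2k+1) / (2k+1)!."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total


def smallest_crossing(step=1e-3, tol=1e-13):
    """Smallest positive x with S(x) = C(x): grid scan, then bisection."""
    d = lambda x: S(x) - C(x)
    lo = 0.0
    while d(lo) * d(lo + step) > 0:
        lo += step
    hi = lo + step
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d(lo) * d(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


x0 = smallest_crossing()
print(x0, math.pi / 4)                       # x0 compared with pi/4
print(C(x0) ** 2 + S(x0) ** 2)               # property (b)
print(S(8 * x0 + 0.3) - S(0.3))              # periodicity with period 8*x0
```

Here `math.pi` is used only as a reference value for the comparison; the functions themselves come entirely from the series.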





13.6.1 Frobenius Method of Undetermined Coefficients 

A proper treatment of SOLDEs requires the medium of complex analysis and will
be undertaken in the next chapter. At this point, however, we are seeking a formal
infinite series solution to the SOLDE

$$y'' + p(x)y' + q(x)y = 0,$$

where $p(x)$ and $q(x)$ are real and analytic. This means that $p(x)$ and $q(x)$ can be
represented by convergent power series in some interval $(a, b)$. [The interesting
case where $p(x)$ and $q(x)$ may have singularities will be treated in the context of
complex solutions.]

The general procedure is to write the expansions$^6$

$$p(x) = \sum_{k=0}^\infty a_k x^k, \qquad q(x) = \sum_{k=0}^\infty b_k x^k, \qquad y = \sum_{k=0}^\infty c_k x^k \tag{13.28}$$

for the coefficient functions $p$ and $q$ and the solution $y$, substitute them in the
SOLDE, and equate the coefficient of each power of $x$ to zero. For this purpose,
we need expansions for derivatives of $y$:

$$y' = \sum_{k=1}^\infty kc_k x^{k-1} = \sum_{k=0}^\infty (k+1)c_{k+1}x^k,$$

$$y'' = \sum_{k=1}^\infty (k+1)kc_{k+1}x^{k-1} = \sum_{k=0}^\infty (k+2)(k+1)c_{k+2}x^k.$$

Thus

$$p(x)y' = \sum_{k=0}^\infty\sum_{m=0}^\infty a_m x^m (k+1)c_{k+1}x^k = \sum_{k,m}(k+1)a_m c_{k+1}x^{k+m}.$$

Let $k + m \equiv n$ and sum over $n$. Then the other sum, say $m$, cannot exceed $n$. Thus,

$$p(x)y' = \sum_{n=0}^\infty\sum_{m=0}^n (n - m + 1)a_m c_{n-m+1}x^n.$$

Similarly, $q(x)y = \sum_{n=0}^\infty\sum_{m=0}^n b_m c_{n-m}x^n$. Substituting these sums and the series
for $y''$ in the SOLDE, we obtain

$$\sum_{n=0}^\infty\left\{(n+1)(n+2)c_{n+2} + \sum_{m=0}^n\left[(n - m + 1)a_m c_{n-m+1} + b_m c_{n-m}\right]\right\}x^n = 0.$$

n=0 I m=0 

6 Here we are expanding about the origin. If such an expansion is impossible or inconvenient, one can expand about another
point, say $x_0$. One would then replace all powers of $x$ in all expressions below with powers of $x - x_0$. These expansions assume
that $p$, $q$, and $y$ have no singularity at $x = 0$. In general, this assumption is not valid, and a different approach, in which the whole
series is multiplied by a (not necessarily positive integer) power of $x$, ought to be taken. Details are provided in Chapter 14.







For this to be true for all $x$, the coefficient of each power of $x$ must vanish:

$$(n+1)(n+2)c_{n+2} = -\sum_{m=0}^n\left[(n - m + 1)a_m c_{n-m+1} + b_m c_{n-m}\right] \quad \text{for } n \ge 0,$$

or

$$n(n+1)c_{n+1} = -\sum_{m=0}^{n-1}\left[(n - m)a_m c_{n-m} + b_m c_{n-m-1}\right] \quad \text{for } n \ge 1. \tag{13.29}$$


If we know $c_0$ and $c_1$ (for instance from boundary conditions), we can uniquely determine $c_n$ for $n \ge 2$ from Equation (13.29). This, in turn, gives a unique power-series expansion for $y$, and we have the following theorem.

13.6.2. Theorem. (the existence theorem) For any SOLDE of the form $y'' + p(x) y' + q(x) y = 0$ with analytic coefficient functions given by the first two equations of (13.28), there exists a unique power series, given by the third equation of (13.28), that formally satisfies the SOLDE for each choice of $c_0$ and $c_1$.
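The recursion (13.29) is mechanical enough to sketch in code. The following is only an illustration and not part of the original text: the function name `series_coefficients` and the convention of passing $a_m$ and $b_m$ as callables are assumptions made for this example. Exact rational arithmetic is used so that the output can be compared with a known series.

```python
from fractions import Fraction

def series_coefficients(a, b, c0, c1, n_terms):
    """Return [c_0, ..., c_{n_terms-1}] for y'' + p y' + q y = 0,
    with p = sum a(m) x^m and q = sum b(m) x^m, using the first form of (13.29)."""
    c = [Fraction(c0), Fraction(c1)]
    for n in range(n_terms - 2):
        s = sum((n - m + 1) * a(m) * c[n - m + 1] + b(m) * c[n - m]
                for m in range(n + 1))
        c.append(-s / ((n + 1) * (n + 2)))
    return c

# Illustration: y'' + y = 0 (p = 0, q = 1) with c_0 = 1, c_1 = 0
# should reproduce the Maclaurin coefficients of cos x.
cos_coeffs = series_coefficients(lambda m: 0, lambda m: 1 if m == 0 else 0, 1, 0, 7)
print(cos_coeffs)
```

The two starting values $c_0$ and $c_1$ are the only free inputs, which mirrors the statement of the theorem.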

This theorem merely states the existence of a formal power series and says nothing about its convergence. The following example will demonstrate that convergence is not necessarily guaranteed.

13.6.3. Example. The formal power-series solution for $x^2 y' - y + x = 0$ can be obtained by letting $y = \sum_{n=0}^{\infty} c_n x^n$. Then $y' = \sum_{n=0}^{\infty} (n+1) c_{n+1} x^n$, and substitution in the DE gives $\sum_{n=0}^{\infty} (n+1) c_{n+1} x^{n+2} - \sum_{n=0}^{\infty} c_n x^n + x = 0$, or

\[ \sum_{n=0}^{\infty} (n+1) c_{n+1} x^{n+2} - c_0 - c_1 x - \sum_{n=2}^{\infty} c_n x^n + x = 0. \]

We see that $c_0 = 0$, $c_1 = 1$, and $(n+1) c_{n+1} = c_{n+2}$ for $n \ge 0$. Thus, we have the recursion relation $n c_n = c_{n+1}$ for $n \ge 1$, whose unique solution is $c_n = (n-1)!$, which generates the following solution for the DE:

\[ y = x + x^2 + (2!) x^3 + (3!) x^4 + \cdots + (n-1)!\, x^n + \cdots. \]

This series is not convergent for any nonzero $x$. ■
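The divergence claimed in the example can be checked with the ratio test. The short computation below is added for illustration and uses only the coefficients $c_n = (n-1)!$ found above:

\[ \left| \frac{c_{n+1} x^{n+1}}{c_n x^n} \right| = \frac{n!}{(n-1)!}\,|x| = n\,|x| \longrightarrow \infty \quad \text{as } n \to \infty \text{ for any } x \neq 0, \]

so the terms of the series do not even tend to zero, and the radius of convergence is zero.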


As we shall see later, for normal SOLDEs, the power series of $y$ in Equation (13.28) converges to an analytic function. The SOLDE solved in the preceding example is not normal.

13.6.4. Example. As an application of Theorem 13.6.2, let us consider the Legendre equation in its normal form

\[ y'' - \frac{2x}{1-x^2}\, y' + \frac{\lambda}{1-x^2}\, y = 0. \]

For $|x| < 1$ both $p$ and $q$ are analytic, and

\[ p(x) = -2x \sum_{m=0}^{\infty} (x^2)^m = \sum_{m=0}^{\infty} (-2) x^{2m+1}, \]

\[ q(x) = \lambda \sum_{m=0}^{\infty} (x^2)^m = \sum_{m=0}^{\infty} \lambda x^{2m}. \]

Thus, the coefficients of Equation (13.28) are

\[ a_m = \begin{cases} 0 & \text{if } m \text{ is even}, \\ -2 & \text{if } m \text{ is odd} \end{cases} \qquad \text{and} \qquad b_m = \begin{cases} \lambda & \text{if } m \text{ is even}, \\ 0 & \text{if } m \text{ is odd}. \end{cases} \]


We want to substitute for $a_m$ and $b_m$ in Equation (13.29) to find $c_{n+1}$. It is convenient to consider two cases: when $n$ is odd and when $n$ is even. For $n = 2r + 1$, Equation (13.29), after some algebra, yields

\[ (2r+1)(2r+2) c_{2r+2} = \sum_{m=0}^{r} (4r - 4m - \lambda)\, c_{2(r-m)}. \tag{13.30} \]

With $r \to r + 1$, this becomes

\[ \begin{aligned} (2r+3)(2r+4) c_{2r+4} &= \sum_{m=0}^{r+1} (4r + 4 - 4m - \lambda)\, c_{2(r+1-m)} \\ &= (4r + 4 - \lambda) c_{2(r+1)} + \sum_{m=1}^{r+1} (4r + 4 - 4m - \lambda)\, c_{2(r+1-m)} \\ &= (4r + 4 - \lambda) c_{2r+2} + \sum_{m=0}^{r} (4r - 4m - \lambda)\, c_{2(r-m)} \\ &= (4r + 4 - \lambda) c_{2r+2} + (2r+1)(2r+2) c_{2r+2} \\ &= [-\lambda + (2r+3)(2r+2)]\, c_{2r+2}, \end{aligned} \]

where in going from the second equality to the third we changed the dummy index, and in going from the third equality to the fourth we used Equation (13.30). Now we let $2r + 2 = k$ to obtain $(k+1)(k+2) c_{k+2} = [k(k+1) - \lambda] c_k$, or

\[ c_{k+2} = \frac{k(k+1) - \lambda}{(k+1)(k+2)}\, c_k \quad \text{for even } k. \]

It is not difficult to show that starting with $n = 2r$, the case of even $n$, we obtain this same equation for odd $k$. Thus, we can write

\[ c_{n+2} = \frac{n(n+1) - \lambda}{(n+1)(n+2)}\, c_n. \tag{13.31} \]


For arbitrary $c_0$ and $c_1$, we obtain two independent solutions, one of which has only even powers of $x$ and the other only odd powers. The generalized ratio test (see [Hass 99, Chapter 5]) shows that the series is divergent for $x = \pm 1$ unless $\lambda = l(l+1)$ for some positive integer $l$. In that case the infinite series becomes a polynomial, the Legendre polynomial encountered in Chapter 7.

Equation (13.31) could have been obtained by substituting Equation (13.28) directly into the Legendre equation. The roundabout way to (13.31) taken here shows the generality of Equation (13.29). With specific differential equations it is generally better to substitute (13.28) directly. ■
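The truncation of the series for $\lambda = l(l+1)$ can be seen by iterating the recursion (13.31). The sketch below is an illustration only; the function name `legendre_series` and the choice of starting values are assumptions, and the resulting polynomials are proportional to (not normalized as) the Legendre polynomials.

```python
from fractions import Fraction

def legendre_series(lam, c0, c1, n_terms):
    """Coefficients c_0..c_{n_terms-1} generated by recursion (13.31);
    lam is the (integer) value of lambda."""
    c = [Fraction(c0), Fraction(c1)]
    for n in range(n_terms - 2):
        c.append(Fraction(n * (n + 1) - lam, (n + 1) * (n + 2)) * c[n])
    return c

# l = 2: the even solution terminates at x^2 and is proportional to P_2(x).
even_l2 = legendre_series(2 * 3, 1, 0, 8)
# l = 3: the odd solution terminates at x^3 and is proportional to P_3(x).
odd_l3 = legendre_series(3 * 4, 0, 1, 8)
print(even_l2)
print(odd_l3)
```

For $l = 2$ the even series is $1 - 3x^2$, and for $l = 3$ the odd series is $x - \tfrac{5}{3}x^3$; both agree with $P_2$ and $P_3$ up to a constant factor.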

13.6.5. Example. We studied Hermite polynomials in Chapter 7 in the context of classical orthogonal polynomials. Let us see how they arise in physics.

The one-dimensional time-independent Schrödinger equation for a particle of mass $m$ in a potential $V(x)$ is

\[ -\frac{\hbar^2}{2m} \frac{d^2\psi}{dx^2} + V(x)\psi = E\psi, \]

where $E$ is the total energy of the particle.

For a harmonic oscillator, $V(x) = \tfrac{1}{2} k x^2 \equiv \tfrac{1}{2} m\omega^2 x^2$ and

\[ \psi'' - \frac{m^2\omega^2}{\hbar^2}\, x^2 \psi + \frac{2m}{\hbar^2}\, E\psi = 0. \]

Substituting $\psi(x) = H(x) \exp(-m\omega x^2 / 2\hbar)$ and then making the change of variables $x = (1/\sqrt{m\omega/\hbar})\, y$ yields

\[ H'' - 2yH' + \lambda H = 0 \quad \text{where} \quad \lambda = \frac{2E}{\hbar\omega} - 1. \tag{13.32} \]


This is the Hermite differential equation in normal form. We assume the expansion $H(y) = \sum_{n=0}^{\infty} c_n y^n$, which yields

\[ H'(y) = \sum_{n=1}^{\infty} n c_n y^{n-1} = \sum_{n=0}^{\infty} (n+1) c_{n+1} y^n, \]

\[ H''(y) = \sum_{n=1}^{\infty} n(n+1) c_{n+1} y^{n-1} = \sum_{n=0}^{\infty} (n+1)(n+2) c_{n+2} y^n. \]

Substituting in Equation (13.32) gives

\[ \sum_{n=0}^{\infty} [(n+1)(n+2) c_{n+2} + \lambda c_n]\, y^n - 2 \sum_{n=0}^{\infty} (n+1) c_{n+1} y^{n+1} = 0, \]

or

\[ 2c_2 + \lambda c_0 + \sum_{n=0}^{\infty} [(n+2)(n+3) c_{n+3} + \lambda c_{n+1} - 2(n+1) c_{n+1}]\, y^{n+1} = 0. \]

Setting the coefficients of powers of $y$ equal to zero, we obtain

\[ c_2 = -\frac{\lambda}{2}\, c_0, \qquad c_{n+3} = \frac{2(n+1) - \lambda}{(n+2)(n+3)}\, c_{n+1} \quad \text{for } n \ge 0, \]

or, replacing $n$ with $n - 1$,

\[ c_{n+2} = \frac{2n - \lambda}{(n+1)(n+2)}\, c_n, \qquad n \ge 1. \tag{13.33} \]


The ratio test easily shows that the infinite series whose coefficients obey the recursion relation in Equation (13.33) converges for all values of $y$. However, on physical grounds, i.e., the demand that $\lim_{x\to\infty} \psi(x) = 0$, the series must be truncated. This happens only if $\lambda = 2l$ for some integer $l$ (see Problem 13.20 and [Hass 99, Chapter 13]), and in that case we obtain a polynomial, the Hermite polynomial of order $l$. A consequence of such a truncation is the quantization of harmonic oscillator energy:

\[ 2l = \lambda = \frac{2E}{\hbar\omega} - 1 \quad \Rightarrow \quad E = \left(l + \tfrac{1}{2}\right)\hbar\omega. \]

Two solutions are generated from Equation (13.33), one including only even powers and the other only odd powers. These are clearly linearly independent. Thus, knowledge of $c_0$ and $c_1$ determines the general solution of the HSOLDE of (13.32). ■
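As with the Legendre case, the truncation for $\lambda = 2l$ is easy to observe numerically. The following sketch is an illustration under stated assumptions: the names `hermite_series` and `energy_in_units_of_hbar_omega` are hypothetical, the recursion is applied from $n = 0$ (which reproduces $c_2 = -\lambda c_0/2$), and the polynomials come out proportional to the standard Hermite polynomials rather than identically normalized.

```python
from fractions import Fraction

def hermite_series(lam, c0, c1, n_terms):
    """Coefficients c_0..c_{n_terms-1} from recursion (13.33), applied for n >= 0."""
    c = [Fraction(c0), Fraction(c1)]
    for n in range(n_terms - 2):
        c.append(Fraction(2 * n - lam, (n + 1) * (n + 2)) * c[n])
    return c

def energy_in_units_of_hbar_omega(l):
    """E / (hbar * omega) obtained from lambda = 2l = 2E/(hbar*omega) - 1."""
    return Fraction(2 * l + 1, 2)

even_l2 = hermite_series(4, 1, 0, 6)   # lambda = 2l with l = 2
odd_l3 = hermite_series(6, 0, 1, 6)    # lambda = 2l with l = 3
print(even_l2, odd_l3, energy_in_units_of_hbar_omega(2))
```

The series $1 - 2y^2$ and $y - \tfrac{2}{3}y^3$ are constant multiples of $H_2(y) = 4y^2 - 2$ and $H_3(y) = 8y^3 - 12y$.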


The preceding two examples show how certain special functions used in mathematical physics are obtained in an analytic way, by solving a differential equation. We saw in Chapter 12 how to obtain spherical harmonics and Legendre polynomials by algebraic methods. It is instructive to solve the harmonic oscillator problem using algebraic methods, as the following example demonstrates.

13.6.6. Example. The Hamiltonian of a one-dimensional harmonic oscillator is

\[ H = \frac{p^2}{2m} + \frac{1}{2} m\omega^2 x^2, \]

where $p = -i\hbar\, d/dx$ is the momentum operator. Let us find the eigenvectors and eigenvalues of $H$.

We define the operators

\[ a \equiv \sqrt{\frac{m\omega}{2\hbar}}\, x + i\, \frac{p}{\sqrt{2m\hbar\omega}} \qquad \text{and} \qquad a^\dagger = \sqrt{\frac{m\omega}{2\hbar}}\, x - i\, \frac{p}{\sqrt{2m\hbar\omega}}. \]

Using the commutation relation $[x, p] = i\hbar \mathbf{1}$, we can show that

\[ [a, a^\dagger] = \mathbf{1} \qquad \text{and} \qquad H = \hbar\omega\, a^\dagger a + \tfrac{1}{2}\hbar\omega \mathbf{1}. \tag{13.34} \]

Furthermore, one can readily show that

\[ [H, a] = -\hbar\omega\, a, \qquad [H, a^\dagger] = \hbar\omega\, a^\dagger. \tag{13.35} \]

Let $|\psi_E\rangle$ be the eigenvector corresponding to the eigenvalue $E$: $H|\psi_E\rangle = E|\psi_E\rangle$, and note that Equation (13.35) gives $H a|\psi_E\rangle = (aH - \hbar\omega a)|\psi_E\rangle = (E - \hbar\omega)\, a|\psi_E\rangle$ and $H a^\dagger|\psi_E\rangle = (E + \hbar\omega)\, a^\dagger|\psi_E\rangle$. Thus, $a|\psi_E\rangle$ is an eigenvector of $H$ with eigenvalue $E - \hbar\omega$, and $a^\dagger|\psi_E\rangle$ is an eigenvector with eigenvalue $E + \hbar\omega$. That is why $a^\dagger$ and $a$ are called the raising and lowering (or creation and annihilation) operators, respectively. We can write

\[ a|\psi_E\rangle = c_E\, |\psi_{E-\hbar\omega}\rangle. \]

By applying $a$ repeatedly, we obtain states of lower and lower energies. But there is a limit to this because $H$ is a positive operator: It cannot have a negative eigenvalue. Thus, there must exist a ground state, $|\psi_0\rangle$, such that $a|\psi_0\rangle = 0$. The energy of this ground state (or the eigenvalue corresponding to $|\psi_0\rangle$) can be obtained:$^7$

\[ H|\psi_0\rangle = \left(\hbar\omega\, a^\dagger a + \tfrac{1}{2}\hbar\omega\right)|\psi_0\rangle = \tfrac{1}{2}\hbar\omega\, |\psi_0\rangle. \]

Repeated application of the raising operator yields both higher-level states and eigenvalues. We thus define $|\psi_n\rangle$ by

\[ (a^\dagger)^n |\psi_0\rangle = c_n |\psi_n\rangle, \tag{13.36} \]

where $c_n$ is a normalizing constant. The energy of $|\psi_n\rangle$ is $n$ units higher than the ground state's, or

\[ E_n = \left(n + \tfrac{1}{2}\right)\hbar\omega, \]

which is what we obtained in the preceding example.

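To find $c_n$, we demand orthonormality for the $|\psi_n\rangle$. Taking the inner product of (13.36) with itself, we can show (see Problem 13.21) that $|c_n|^2 = n|c_{n-1}|^2$, or $|c_n|^2 = n!\,|c_0|^2$, which for $|c_0| = 1$ and real $c_n$ yields $c_n = \sqrt{n!}$. It follows, then, that

\[ |\psi_n\rangle = \frac{1}{\sqrt{n!}}\, (a^\dagger)^n |\psi_0\rangle. \tag{13.37} \]

In terms of functions and derivative operators, $a|\psi_0\rangle = 0$ gives

\[ \left( \sqrt{\frac{m\omega}{2\hbar}}\, x + \sqrt{\frac{\hbar}{2m\omega}}\, \frac{d}{dx} \right) \psi_0(x) = 0 \]

with the solution $\psi_0(x) = c \exp(-m\omega x^2 / 2\hbar)$. Normalizing $\psi_0(x)$ gives

\[ 1 = \langle \psi_0 | \psi_0 \rangle = c^2 \int_{-\infty}^{\infty} \exp\left( -\frac{m\omega x^2}{\hbar} \right) dx = c^2 \left( \frac{\pi\hbar}{m\omega} \right)^{1/2}. \]

Thus,

\[ \psi_0(x) = \left( \frac{m\omega}{\pi\hbar} \right)^{1/4} e^{-m\omega x^2/(2\hbar)}. \]

We can now write Equation (13.37) in terms of differential operators:

\[ \psi_n(x) = \frac{1}{\sqrt{n!}} \left( \frac{m\omega}{\pi\hbar} \right)^{1/4} \left( \sqrt{\frac{m\omega}{2\hbar}}\, x - \sqrt{\frac{\hbar}{2m\omega}}\, \frac{d}{dx} \right)^{n} e^{-m\omega x^2/(2\hbar)}. \]

Defining a new variable $y = \sqrt{m\omega/\hbar}\, x$ transforms this equation into

\[ \psi_n = \left( \frac{m\omega}{\pi\hbar} \right)^{1/4} \frac{1}{\sqrt{2^n n!}} \left( y - \frac{d}{dy} \right)^{n} e^{-y^2/2}. \]

7 From here on, the unit operator $\mathbf{1}$ will not be shown explicitly.

From this, the relation between Hermite polynomials and the solutions of the one-dimensional harmonic oscillator as given in the previous example, we can obtain a general formula for $H_n(x)$. In particular, if we note that (see Problem 13.21)

\[ e^{y^2/2} \left( y - \frac{d}{dy} \right) e^{-y^2/2} = -e^{y^2} \frac{d}{dy}\, e^{-y^2} \]

and, in general,

\[ e^{y^2/2} \left( y - \frac{d}{dy} \right)^{n} e^{-y^2/2} = (-1)^n e^{y^2} \frac{d^n}{dy^n}\, e^{-y^2}, \]

we recover the generalized Rodriguez formula of Chapter 7. ■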


To end this section, we simply quote the following important theorem (for a 
proof, see [Birk 78, p. 95]): 

13.6.7. Theorem. For any choice of $c_0$ and $c_1$, the radius of convergence of any power series solution $y = \sum_{k=0}^{\infty} c_k x^k$ for the normal HSOLDE

\[ y'' + p(x) y' + q(x) y = 0 \]

whose coefficients satisfy the recursion relation of (13.29) is at least as large as the smaller of the two radii of convergence of the two series for $p(x)$ and $q(x)$.

In particular, if $p(x)$ and $q(x)$ are analytic in an interval around $x = 0$, then the solution of the normal HSOLDE is also analytic in a neighborhood of $x = 0$.

13.7 SOLDEs with Constant Coefficients 

The solution to a SOLDE with constant coefficients can always be found in closed 
form. In fact, we can treat an nth-order linear differential equation (NOLDE) with 
constant coefficients with no extra effort. This brief section outlines the procedure 
for solving such an equation. For details, the reader is referred to any elementary 
book on differential equations (see also [Hass 99]). The most general $n$th-order linear differential equation (NOLDE) with constant coefficients can be written as

\[ L[y] \equiv y^{(n)} + a_{n-1} y^{(n-1)} + \cdots + a_1 y' + a_0 y = r(x). \tag{13.38} \]

The corresponding homogeneous NOLDE (HNOLDE) is obtained by setting $r(x) = 0$. Let us consider such a homogeneous case first. The solution to the homogeneous NOLDE

\[ L[y] \equiv y^{(n)} + a_{n-1} y^{(n-1)} + \cdots + a_1 y' + a_0 y = 0 \tag{13.39} \]

can be found by making the exponential substitution $y = e^{\lambda x}$, which results in the equation $L[e^{\lambda x}] = (\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0)\, e^{\lambda x} = 0$. This equation will hold only if $\lambda$ is a root of the characteristic polynomial

\[ p(\lambda) \equiv \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0, \]

which, by the fundamental theorem of algebra, can be written as

\[ p(\lambda) = (\lambda - \lambda_1)^{k_1} (\lambda - \lambda_2)^{k_2} \cdots (\lambda - \lambda_m)^{k_m}. \tag{13.40} \]

The $\lambda_j$ are the distinct (complex) roots of $p(\lambda)$ and have multiplicity $k_j$.

13.7.1. Theorem. Let $\{\lambda_j\}_{j=1}^{m}$ be the roots of the characteristic polynomial of the real HNOLDE of Equation (13.39), and let the respective roots have multiplicities $\{k_j\}_{j=1}^{m}$. Then the functions

\[ \{ e^{\lambda_j x},\; x e^{\lambda_j x},\; \ldots,\; x^{k_j - 1} e^{\lambda_j x} \}_{j=1}^{m} \]

are a basis of solutions of Equation (13.39).

When a $\lambda$ is complex, one can write its corresponding solution in terms of trigonometric functions.

13.7.2. Example. An equation that is used in both mechanics and circuit theory is

\[ \frac{d^2 y}{dt^2} + a \frac{dy}{dt} + by = 0 \quad \text{for } a, b > 0. \tag{13.41} \]

Its characteristic polynomial is $p(\lambda) = \lambda^2 + a\lambda + b$, which has the roots

\[ \lambda_1 = \tfrac{1}{2}\left( -a + \sqrt{a^2 - 4b} \right) \qquad \text{and} \qquad \lambda_2 = \tfrac{1}{2}\left( -a - \sqrt{a^2 - 4b} \right). \]

We can distinguish three different possible motions depending on the relative sizes of $a$ and $b$.

(a) $a^2 > 4b$ (overdamped): Here we have two distinct simple roots. The multiplicities are both one ($k_1 = k_2 = 1$); therefore, the power of $x$ for both solutions is zero ($r_1 = r_2 = 0$). Let $\gamma \equiv \tfrac{1}{2}\sqrt{a^2 - 4b}$. Then the most general solution is

\[ y(t) = e^{-at/2} \left( c_1 e^{\gamma t} + c_2 e^{-\gamma t} \right). \]

Since $a > 2\gamma$, this solution starts at $y = c_1 + c_2$ at $t = 0$ and continuously decreases; so, as $t \to \infty$, $y(t) \to 0$.

(b) $a^2 = 4b$ (critically damped): In this case we have one multiple root of order 2 ($k_1 = 2$); therefore, the power of $x$ can be zero or 1 ($r_1 = 0, 1$). Thus, the general solution is

\[ y(t) = c_1 t e^{-at/2} + c_0 e^{-at/2}. \]

This solution starts at $y(0) = c_0$ at $t = 0$, reaches a maximum (or minimum) at $t = 2/a - c_0/c_1$, and subsequently decays (grows) exponentially to zero.

(c) $a^2 < 4b$ (underdamped): Once more, we have two distinct simple roots. The multiplicities are both one ($k_1 = k_2 = 1$); therefore, the power of $x$ for both solutions is zero ($r_1 = r_2 = 0$). Let $\omega \equiv \tfrac{1}{2}\sqrt{4b - a^2}$. Then $\lambda_1 = -a/2 + i\omega$ and $\lambda_2 = \lambda_1^*$. The roots are complex, and the most general solution is thus of the form

\[ y(t) = e^{-at/2} (c_1 \cos\omega t + c_2 \sin\omega t) = A e^{-at/2} \cos(\omega t + \alpha). \]

The solution is a harmonic variation with a decaying amplitude $A \exp(-at/2)$. Note that if $a = 0$, the amplitude does not decay. That is why $a$ is called the damping factor (or the damping constant).

These equations describe either a mechanical system oscillating (with no external force) in a viscous (dissipative) fluid, or an electrical circuit consisting of a resistance $R$, an inductance $L$, and a capacitance $C$. For RLC circuits, $a = R/L$ and $b = 1/(LC)$. Thus, the damping factor depends on the relative magnitudes of $R$ and $L$. On the other hand, the frequency

\[ \omega = \sqrt{\frac{1}{LC} - \frac{R^2}{4L^2}} \]

depends on all three elements. In particular, for $R \ge 2\sqrt{L/C}$ the circuit does not oscillate. ■
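The three regimes of the example can be told apart from the sign of $a^2 - 4b$. The sketch below is illustrative only: the function names and the component values $R$, $L$, $C$ are made up for the example and do not come from the text.

```python
import cmath
import math

def characteristic_roots(a, b):
    """Roots of lambda^2 + a*lambda + b = 0 (complex arithmetic covers all cases)."""
    disc = cmath.sqrt(a * a - 4 * b)
    return (-a + disc) / 2, (-a - disc) / 2

def damping_regime(a, b):
    """Classify Equation (13.41) by the sign of a^2 - 4b."""
    d = a * a - 4 * b
    if d > 0:
        return "overdamped"
    if d == 0:
        return "critically damped"
    return "underdamped"

# Hypothetical RLC values: R in ohms, L in henries, C in farads.
R, L, C = 100.0, 0.5, 1e-6
a, b = R / L, 1.0 / (L * C)
lam1, lam2 = characteristic_roots(a, b)
print(damping_regime(a, b), lam1, lam2)
print("oscillates:", R < 2 * math.sqrt(L / C))
```

For these values the real part of each root is $-a/2 = -R/(2L)$ and the imaginary part is $\pm\omega$, matching case (c).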


A physical system whose behavior in the absence of a driving force is described 
by a NOLDE will obey an inhomogeneous NOLDE in the presence of the driving 
force. This driving force is simply the inhomogeneous term of the NOLDE. The 
best way to solve such an inhomogeneous NOLDE in its most general form is by 
using Fourier transforms and Green’s functions, as we will do in Chapter 20. For 
the particular, but important, case in which the inhomogeneous term is a product 
of polynomials and exponentials, the solution can be found in closed form. 

13.7.3. Theorem. The NOLDE $L[y] = e^{\lambda x} S(x)$, where $S(x)$ is a polynomial, has the particular solution $e^{\lambda x} q(x)$, where $q(x)$ is also a polynomial. The degree of $q(x)$ equals that of $S(x)$ unless $\lambda = \lambda_j$, a root of the characteristic polynomial of $L$, in which case the degree of $q(x)$ exceeds that of $S(x)$ by $k_j$, the multiplicity of $\lambda_j$.

Once we know the form of the particular solution of the NOLDE, we can find 
the coefficients in the polynomial of the solution by substituting in the NOLDE 
and matching the powers on both sides. 

13.7.4. Example. Let us find the most general solutions for the following two differential equations subject to the boundary conditions $y(0) = 0$ and $y'(0) = 1$.

(a) The first DE we want to consider is

\[ y'' + y = x e^x. \tag{13.42} \]

The characteristic polynomial is $\lambda^2 + 1$, whose roots are $\lambda_1 = i$ and $\lambda_2 = -i$. Thus, a basis of solutions is $\{\cos x, \sin x\}$. To find the particular solution we note that $\lambda$ (the coefficient of $x$ in the exponential part of the inhomogeneous term) is 1, which is neither of the roots $\lambda_1$ and $\lambda_2$. Thus, the particular solution is of the form $q(x) e^x$, where $q(x) = Ax + B$ is of degree 1 [same degree as that of $S(x) = x$]. We now substitute $u = (Ax + B) e^x$ in Equation (13.42) to obtain the relation

\[ 2Ax e^x + (2A + 2B) e^x = x e^x. \]

Matching the coefficients, we have

\[ 2A = 1 \quad \text{and} \quad 2A + 2B = 0 \quad \Rightarrow \quad A = \tfrac{1}{2} = -B. \]

Thus, the most general solution is

\[ y = c_1 \cos x + c_2 \sin x + \tfrac{1}{2}(x - 1) e^x. \]

Imposing the given boundary conditions yields $0 = y(0) = c_1 - \tfrac{1}{2}$ and $1 = y'(0) = c_2$. Thus,

\[ y = \tfrac{1}{2}\cos x + \sin x + \tfrac{1}{2}(x - 1) e^x \]

is the unique solution.

(b) The next DE we want to consider is

\[ y'' - y = x e^x. \tag{13.43} \]

Here $p(\lambda) = \lambda^2 - 1$, and the roots are $\lambda_1 = 1$ and $\lambda_2 = -1$. A basis of solutions is $\{e^x, e^{-x}\}$. To find a particular solution, we note that $S(x) = x$ and $\lambda = 1 = \lambda_1$. Theorem 13.7.3 then implies that $q(x)$ must be of degree 2, because $\lambda_1$ is a simple root, i.e., $k_1 = 1$. We therefore try

\[ q(x) = Ax^2 + Bx + C \quad \Rightarrow \quad u = (Ax^2 + Bx + C) e^x. \]

Taking the derivatives and substituting in Equation (13.43) yields two equations,

\[ 4A = 1 \quad \text{and} \quad A + B = 0, \]

whose solution is $A = -B = \tfrac{1}{4}$. Note that $C$ is not determined, because $Ce^x$ is a solution of the homogeneous DE corresponding to Equation (13.43), so when $L$ is applied to $u$, it eliminates the term $Ce^x$. Another way of looking at the situation is to note that the most general solution to (13.43) is of the form

\[ y = c_1 e^x + c_2 e^{-x} + \left( \tfrac{1}{4} x^2 - \tfrac{1}{4} x + C \right) e^x. \]

The term $Ce^x$ could be absorbed in $c_1 e^x$. We therefore set $C = 0$, apply the boundary conditions, and find the unique solution

\[ y = \tfrac{5}{4} \sinh x + \tfrac{1}{4}(x^2 - x) e^x. \quad ■ \]
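The two closed-form solutions of the example can be sanity-checked numerically by evaluating the residual of each DE with a central-difference second derivative. This is a rough check added for illustration, not a proof; the helper names and the step size `h` are assumptions of this sketch.

```python
import math

def y_a(x):
    """Solution quoted for y'' + y = x e^x with y(0) = 0, y'(0) = 1."""
    return 0.5 * math.cos(x) + math.sin(x) + 0.5 * (x - 1.0) * math.exp(x)

def y_b(x):
    """Solution quoted for y'' - y = x e^x with y(0) = 0, y'(0) = 1."""
    return 1.25 * math.sinh(x) + 0.25 * (x * x - x) * math.exp(x)

def second_derivative(f, x, h=1e-3):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def first_derivative(f, x, h=1e-3):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def residual_a(x):
    return second_derivative(y_a, x) + y_a(x) - x * math.exp(x)

def residual_b(x):
    return second_derivative(y_b, x) - y_b(x) - x * math.exp(x)

for x in (0.0, 0.5, 1.0):
    print(x, residual_a(x), residual_b(x))
```

The residuals should be small (limited by the finite-difference error), and both functions satisfy the boundary conditions at $x = 0$.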




13.8 The WKB Method 


In this section, we treat the somewhat specialized method (due to Wentzel, Kramers, and Brillouin) of obtaining an approximate solution to a particular type of second-order DE arising from the Schrödinger equation in one dimension. Suppose we are interested in finding approximate solutions of the DE

\[ \frac{d^2 y}{dx^2} + q(x) y = 0 \tag{13.44} \]

in which $q$ varies "slowly" with respect to $x$ in the sense discussed below. If $q$ varies infinitely slowly, i.e., if it is a constant, the solution to Equation (13.44) is simply an imaginary exponential (or trigonometric). So, let us define $\phi(x)$ by $y = e^{i\phi(x)}$ and rewrite the DE as

\[ (\phi')^2 - i\phi'' - q = 0. \tag{13.45} \]


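Assuming that $\phi''$ is small (compared to $q$), so that $y$ does not oscillate too rapidly, we can find an approximate solution to the DE:

\[ \phi' \approx \pm\sqrt{q} \quad \Rightarrow \quad \phi \approx \pm \int \sqrt{q(x)}\, dx. \tag{13.46} \]

The condition of validity of our assumption is obtained by differentiating (13.46):

\[ |\phi''| \approx \frac{1}{2} \left| \frac{q'}{\sqrt{q}} \right| \ll |q|. \]

It follows from Equation (13.46) and the definition of $\phi$ that $1/\sqrt{q}$ is approximately $1/(2\pi)$ times one "wavelength" of the solution $y$. Therefore, the approximation is valid if the change in $q$ in one wavelength is small compared to $|q|$.

The approximation can be improved by inserting the derivative of (13.46) in the DE and solving for a new $\phi$:

\[ (\phi')^2 \approx q \pm \frac{i}{2} \frac{q'}{\sqrt{q}}, \]

or

\[ \phi' \approx \pm \left( q \pm \frac{i}{2} \frac{q'}{\sqrt{q}} \right)^{1/2} = \pm\sqrt{q} \left( 1 \pm \frac{i}{2} \frac{q'}{q^{3/2}} \right)^{1/2} \approx \pm\sqrt{q} + \frac{i}{4} \frac{q'}{q} \quad \Rightarrow \quad \phi(x) \approx \pm \int \sqrt{q}\, dx + \frac{i}{4} \ln q. \]

The two choices give rise to two different solutions, a linear combination of which gives the most general solution. Thus,

\[ y \approx \frac{1}{\sqrt[4]{q(x)}} \left\{ c_1 \exp\left[ i \int \sqrt{q}\, dx \right] + c_2 \exp\left[ -i \int \sqrt{q}\, dx \right] \right\}. \tag{13.47} \]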





Equation (13.47) gives an approximate solution to (13.44) in any region in which the condition of validity holds. The method fails if $q$ changes too rapidly or if it is zero at some point of the region. The latter is a serious difficulty, since we often wish to join a solution in a region in which $q(x) > 0$ to one in a region in which $q(x) < 0$. There is a general procedure for deriving the so-called connection formulas relating the constants $c_1$ and $c_2$ of the two solutions on either side of the point where $q(x) = 0$. We shall not go into the details of such a derivation, as it is not particularly illuminating.$^8$ We simply quote a particular result that is useful in applications.

Suppose that $q$ passes through zero at $x_0$, is positive to the right of $x_0$, and satisfies the condition of validity in regions both to the right and to the left of $x_0$. Furthermore, assume that the solution of the DE decreases exponentially to the left of $x_0$. Under such conditions, the solution to the left will be of the form

\[ \frac{1}{\sqrt[4]{-q(x)}} \exp\left( -\int_{x}^{x_0} \sqrt{-q(x)}\, dx \right), \tag{13.48} \]

while to the right, we have

\[ 2\, \frac{1}{\sqrt[4]{q(x)}} \cos\left( \int_{x_0}^{x} \sqrt{q(x)}\, dx - \frac{\pi}{4} \right). \tag{13.49} \]

A similar procedure gives connection formulas for the case where $q$ is positive on the left and negative on the right of $x_0$.

13.8.1. Example. Consider the Schrödinger equation in one dimension

\[ \frac{d^2\psi}{dx^2} + \frac{2m}{\hbar^2} [E - V(x)]\, \psi = 0, \]

where $V(x)$ is a potential well meeting the horizontal line of constant $E$ at $x = a$ and $x = b$, so that

\[ q(x) = \frac{2m}{\hbar^2} [E - V(x)] \; \begin{cases} > 0 & \text{if } a < x < b, \\ < 0 & \text{if } x < a \text{ or } x > b. \end{cases} \]

The solution that is bounded to the left of $a$ must be exponentially decaying. Therefore, in the interval $(a, b)$ the approximate solution, as given by Equation (13.49), is

\[ \psi(x) \approx \frac{A}{(E - V)^{1/4}} \cos\left( \int_{a}^{x} \sqrt{\frac{2m}{\hbar^2} [E - V(x)]}\, dx - \frac{\pi}{4} \right), \]

where $A$ is some arbitrary constant. The solution that is bounded to the right of $b$ must also be exponentially decaying. Hence, the solution for $a < x < b$ is

\[ \psi(x) \approx \frac{B}{(E - V)^{1/4}} \cos\left( \int_{x}^{b} \sqrt{\frac{2m}{\hbar^2} [E - V(x)]}\, dx - \frac{\pi}{4} \right). \]

Since these two expressions give the same function in the same region, they must be equal. Thus, $A = \pm B$, and, more importantly,

\[ \int_{a}^{b} \sqrt{2m [E - V(x)]}\, dx = \left( n + \tfrac{1}{2} \right) \pi\hbar, \qquad n = 0, 1, 2, \ldots \]

This is essentially the Bohr-Sommerfeld quantization condition of pre-1925 quantum mechanics. ■

8 The interested reader is referred to the book by Mathews and Walker, pp. 27-37.
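As an illustration of the quantization condition, one can apply it to the harmonic-oscillator potential $V(x) = \tfrac{1}{2} m\omega^2 x^2$ of Example 13.6.5, for which the integral between the turning points equals $\pi E/\omega$. The sketch below checks this numerically in units where $m = \omega = \hbar = 1$ (an assumption of this example); the function name `action_integral` and the midpoint rule are likewise choices made here, not taken from the text.

```python
import math

def action_integral(E, m=1.0, omega=1.0, n_points=20_000):
    """Midpoint-rule estimate of the integral of sqrt(2m[E - V(x)]) between
    the classical turning points, for V(x) = m * omega^2 * x^2 / 2."""
    b = math.sqrt(2.0 * E / (m * omega ** 2))      # turning points at -b and +b
    h = 2.0 * b / n_points
    total = 0.0
    for i in range(n_points):
        x = -b + (i + 0.5) * h
        total += math.sqrt(max(0.0, 2.0 * m * (E - 0.5 * m * omega ** 2 * x * x)))
    return total * h

# With hbar = 1 the condition reads integral = (n + 1/2) * pi,
# so integral / pi - 1/2 should come out close to the integer n.
for n in range(4):
    E = n + 0.5
    print(n, action_integral(E) / math.pi - 0.5)
```

For this particular potential the condition reproduces $E_n = (n + \tfrac{1}{2})\hbar\omega$, the result obtained in Examples 13.6.5 and 13.6.6.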


13.8.1 Classical Limit of the Schrödinger Equation

As long as we are approximating solutions of second-order DEs that arise naturally from the Schrödinger equation, it is instructive to look at another approximation to the Schrödinger equation, its classical limit in which the Planck constant goes to zero.

The idea is to note that since $\psi(\mathbf{r}, t)$ is a complex function, one can write it as

\[ \psi(\mathbf{r}, t) = A(\mathbf{r}, t) \exp\left[ \frac{i}{\hbar} S(\mathbf{r}, t) \right], \tag{13.50} \]

where $A(\mathbf{r}, t)$ and $S(\mathbf{r}, t)$ are real-valued functions. Substituting (13.50) in the Schrödinger equation and separating the real and the imaginary parts yields

\[ \frac{\partial S}{\partial t} + \frac{\nabla S \cdot \nabla S}{2m} + V = \frac{\hbar^2}{2m} \frac{\nabla^2 A}{A}, \]

\[ m \frac{\partial A}{\partial t} + \nabla S \cdot \nabla A + \frac{A}{2} \nabla^2 S = 0. \tag{13.51} \]

These two equations are completely equivalent to the Schrödinger equation. The second equation has a direct physical interpretation. Define

\[ \rho(\mathbf{r}, t) \equiv A^2(\mathbf{r}, t) = |\psi(\mathbf{r}, t)|^2 \qquad \text{and} \qquad \mathbf{J}(\mathbf{r}, t) \equiv A^2(\mathbf{r}, t)\, \frac{\nabla S}{m} = \rho \mathbf{v}, \quad \mathbf{v} \equiv \frac{\nabla S}{m}, \tag{13.52} \]

multiply the second equation in (13.51) by $2A/m$, and note that it then can be written as

\[ \frac{\partial \rho}{\partial t} + \nabla \cdot \mathbf{J} = 0, \tag{13.53} \]

which is the continuity equation for probability. The fact that $\mathbf{J}$ is indeed the probability current density is left for Problem 13.30.








The first equation of (13.51) gives an interesting result when $\hbar \to 0$ because in this limit, the RHS of the equation will be zero, and we get

\[ \frac{\partial S}{\partial t} + \frac{1}{2} m v^2 + V = 0. \]

Taking the gradient of this equation, we obtain

\[ \left( \frac{\partial}{\partial t} + \mathbf{v} \cdot \nabla \right) m\mathbf{v} + \nabla V = 0, \]

which is the equation of motion of a classical fluid with velocity field $\mathbf{v} = \nabla S/m$. We thus have the following:

13.8.2. Proposition. In the classical limit, the solution of the Schrödinger equation describes a fluid (statistical mixture) of noninteracting classical particles of mass $m$ subject to the potential $V(\mathbf{r})$. The density and the current density of this fluid are, respectively, the probability density $\rho = |\psi|^2$ and the probability current density $\mathbf{J}$ of the quantum particle.


13.9 Numerical Solutions of DEs 

The majority of differential equations encountered in physics do not have known analytic solutions. One therefore resorts to numerical solutions. There are a variety of methods having various degrees of simplicity of use and accuracy. This section considers a few representatives applicable to the solution of ODEs. We make frequent use of techniques developed in Section 2.6. Therefore, the reader is urged to consult that section as needed.

Any normal differential equation of $n$th order,

\[ \frac{d^n x}{dt^n} = F(x, \dot{x}, \ldots, x^{(n-1)}; t), \]

can be reduced to a system of $n$ first-order differential equations by defining $x_1 = x$, $x_2 = \dot{x}$, ..., $x_n = x^{(n-1)}$. This gives the system

\[ \dot{x}_1 = x_2, \quad \dot{x}_2 = x_3, \quad \ldots, \quad \dot{x}_{n-1} = x_n, \quad \dot{x}_n = F(x_1, x_2, \ldots, x_n; t). \]

We restrict ourselves to a FODE of the form $\dot{x} = f(x, t)$ in which $f$ is a well-behaved function of two real variables. At the end of the section, we briefly outline a technique for solving second-order DEs.

Two general types of problems are encountered in applications. An initial value problem (IVP) gives $x(t)$ at an initial time $t_0$ and asks for the value of $x$ at other times. The second type, the boundary value problem (BVP), applies only to differential equations of higher order than first. A second-order BVP specifies the value of $x(t)$ and/or $\dot{x}(t)$ at one or more points and asks for $x$ or $\dot{x}$ at other values of $t$. We shall consider only IVPs.


13.9.1 Using the Backward Difference Operator

Let us consider the IVP

\[ \dot{x} = f(x, t), \qquad x(t_0) = x_0. \tag{13.54} \]

The problem is to find $\{x_k \equiv x(t_0 + kh)\}_{k=1}^{N}$, given (13.54). Let us begin by integrating (13.54) between $t_n$ and $t_n + h$:

\[ x(t_n + h) - x(t_n) = \int_{t_n}^{t_n + h} \dot{x}(t)\, dt. \]

Changing the variable of integration to $s = (t - t_n)/h$ and using the shift operator $E$ introduced in Section 2.6 yields

\[ x_{n+1} - x_n = h \int_0^1 \dot{x}(t_n + sh)\, ds = h \int_0^1 [E^s \dot{x}(t_n)]\, ds. \tag{13.55} \]

Since a typical situation involves calculating $x_{n+1}$ from the values of $x(t)$ and $\dot{x}(t)$ at preceding steps, we want an expression in which the RHS of Equation (13.55) contains such preceding terms. This suggests expressing $E$ in terms of the backward difference operator. It will also be useful to change the lower limit of integration to $-p$, where $p$ is a number to be chosen later for convenience. Thus, Equation (13.55) becomes

\[ \begin{aligned} x_{n+1} &= x_{n-p} + h \left[ \int_{-p}^{1} (1 - \nabla)^{-s}\, ds \right] \dot{x}_n \\ &= x_{n-p} + h \left[ \int_{-p}^{1} \sum_{k=0}^{\infty} \frac{\Gamma(-s+1)}{k!\, \Gamma(-s-k+1)} (-\nabla)^k\, ds \right] \dot{x}_n \\ &= x_{n-p} + h \sum_{k=0}^{\infty} a_k^{(p)} \nabla^k \dot{x}_n, \end{aligned} \tag{13.56} \]

where

\[ a_k^{(p)} = \frac{(-1)^k}{k!} \int_{-p}^{1} \frac{\Gamma(-s+1)}{\Gamma(-s-k+1)}\, ds = \frac{1}{k!} \int_{-p}^{1} s(s+1) \cdots (s+k-1)\, ds. \tag{13.57} \]

Keeping the first few terms for $p = 0$, we obtain the useful formula

\[ x_{n+1} \approx x_n + h \left( 1 + \frac{\nabla}{2} + \frac{5\nabla^2}{12} + \frac{3\nabla^3}{8} + \frac{251\nabla^4}{720} + \frac{95\nabla^5}{288} \right) \dot{x}_n. \tag{13.58} \]


Due to the presence of $\nabla$ in Equation (13.58), finding the value of $x(t)$ at $t_{n+1}$ requires a knowledge of $x(t)$ and $\dot{x}(t)$ at points $t_0, t_1, \ldots, t_n$. Because of this, Equation (13.58) is called a formula of open type. In contrast, in formulas of closed type, the RHS contains values at $t_{n+1}$ as well. We can obtain a formula of closed type by changing $E^s \dot{x}_n$ to its equivalent form, $E^{s-1} \dot{x}_{n+1}$. The result is

\[ x_{n+1} = x_{n-p} + h \sum_{k=0}^{\infty} b_k^{(p)} \nabla^k \dot{x}_{n+1}, \tag{13.59} \]

where

\[ b_k^{(p)} = \frac{(-1)^k}{k!} \int_{-p}^{1} \frac{\Gamma(-s+2)}{\Gamma(-s-k+2)}\, ds. \]

Keeping the first few terms for $p = 0$, we obtain

\[ x_{n+1} \approx x_n + h \left( 1 - \frac{\nabla}{2} - \frac{\nabla^2}{12} - \frac{\nabla^3}{24} - \frac{19\nabla^4}{720} - \frac{3\nabla^5}{160} \right) \dot{x}_{n+1}, \tag{13.60} \]

which involves evaluation at $t_{n+1}$ on the RHS.
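The numerical coefficients in the open and closed formulas can be reproduced by integrating the polynomial in (13.57) exactly. In the sketch below, the function name `difference_coefficient` and its `closed` flag are assumptions made for this illustration; for the closed type it uses the fact that $(-1)^k\, \Gamma(-s+2)/\Gamma(-s-k+2) = (s-1)s(s+1)\cdots(s+k-2)$.

```python
from fractions import Fraction
from math import factorial

def difference_coefficient(k, p=0, closed=False):
    """a_k^(p) of (13.57) if closed is False, otherwise the coefficient
    b_k^(p) of the closed-type formula."""
    shift = 1 if closed else 0
    poly = [Fraction(1)]                      # coefficients, lowest power first
    for j in range(k):                        # multiply by (s + j - shift)
        new = [Fraction(0)] * (len(poly) + 1)
        for i, c in enumerate(poly):
            new[i] += (j - shift) * c
            new[i + 1] += c
        poly = new
    lower = Fraction(-p)
    # exact integral of the polynomial from -p to 1
    integral = sum(c * (1 - lower ** (i + 1)) / (i + 1) for i, c in enumerate(poly))
    return integral / factorial(k)

open_coeffs = [difference_coefficient(k) for k in range(6)]
closed_coeffs = [difference_coefficient(k, closed=True) for k in range(6)]
print(open_coeffs)
print(closed_coeffs)
```

The same function can be called with nonzero `p` to inspect which coefficients vanish for other choices of the lower limit.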

For $p = 1$ ($p = 3$), Equation (13.56) [(13.59)] results in an expansion in powers of $\nabla$ in which the coefficient of $\nabla^p$ ($\nabla^{p+2}$) is zero. Thus, retaining terms up to the $(p-1)$st [$(p+1)$st] power of $\nabla$ automatically gives us an accuracy of $h^p$ ($h^{p+2}$). This is the advantage of using nonzero values of $p$ and the reason we considered such cases. The reason for the use of formulas of the closed type is the smallness of the error involved. All the formulas derived in this section involve powers of $\nabla$ operating on $\dot{x}_n$ or $\dot{x}_{n+1}$. This means that to find $x_{n+1}$, we must know the values of $\dot{x}_k$ for $k \le n + 1$. However, $\dot{x} = f(x, t)$ or $\dot{x}_k = f(x_k, t_k)$ implies that knowledge of $\dot{x}_k$ requires knowledge of $x_k$. Therefore, to find $x_{n+1}$ we must know not only the values of $\dot{x}$ but also the values of $x$ at $t_k$ for $k \le n + 1$. In particular, we cannot start with $n = 0$ because we would get negative indices for $\dot{x}$ due to the high powers of $\nabla$. This means that the first few values of $x_k$ must be obtained using a different method. One common method of starting the solution is to use a Taylor series expansion:

\[ x_k \equiv x(t_0 + kh) = x_0 + h\dot{x}_0 k + \frac{h^2 \ddot{x}_0}{2} k^2 + \cdots, \tag{13.61} \]

where

\[ \dot{x}_0 = f(x_0, t_0), \qquad \ddot{x}_0 = \left( \frac{\partial f}{\partial x} \right)_{x_0, t_0} \dot{x}_0 + \left( \frac{\partial f}{\partial t} \right)_{x_0, t_0}. \]

For the general case, it is clear that the derivatives required for the RHS of Equation (13.61) involve very complicated expressions. The following example illustrates the procedure for a specific case.



13.9.1. Example. Let us solve the IVP $\dot{x} + x + e^t x^2 = 0$ with $x(0) = 1$. We can obtain a Taylor series expansion for $x$ by noting that

\[ \dot{x}_0 = -x_0 - x_0^2, \qquad \ddot{x}_0 = \ddot{x}(0) = -\dot{x}_0 - 2x_0\dot{x}_0 - x_0^2, \]

\[ \dddot{x}_0 = -\ddot{x}_0 - 2\dot{x}_0^2 - 2x_0\ddot{x}_0 - 4x_0\dot{x}_0 - x_0^2. \]

Continuing in this way, we can obtain derivatives of all orders. Substituting $x_0 = 1$ and keeping terms up to the fifth order, we obtain

\[ \dot{x}_0 = -2, \quad \ddot{x}_0 = 5, \quad \dddot{x}_0 = -16, \quad \left. \frac{d^4 x}{dt^4} \right|_{t=0} = 65, \quad \left. \frac{d^5 x}{dt^5} \right|_{t=0} = -326. \]

Substituting these values in a Taylor series expansion with $h = 0.1$ yields

\[ x_k = 1 - 0.2k + 0.025k^2 - 0.0027k^3 + (2.7 \times 10^{-4}) k^4 - (2.7 \times 10^{-5}) k^5 + \cdots. \]

Thus, $x_1 = 0.82254$, $x_2 = 0.68186$, and $x_3 = 0.56741$. The corresponding values of $\dot{x}$ can be calculated using the DE. We simply quote the result: $\dot{x}_1 = -1.57026$, $\dot{x}_2 = -1.24973$, $\dot{x}_3 = -1.00200$. ■

Once the starting values are obtained, either a formula of open type or one of closed type is used to find the next $x$ value. Only formulas of open type will be discussed here. However, as mentioned earlier, the accuracy of closed-type formulas is better. The price one pays for having $\dot{x}_{n+1}$ on the RHS is that using closed-type formulas requires estimating $x_{n+1}$. This estimate is then substituted in the RHS, and an improved estimate is found. The process is continued until no further improvement in the estimate is achieved.

The use of open-type formulas involves simple substitution of the known quantities $\dot{x}_0, \dot{x}_1, \ldots, \dot{x}_n$ on the RHS to obtain $x_{n+1}$. The master equation (for $p = 0$) is (13.58). The number of powers of $\nabla$ that are retained gives rise to different methods. For instance, when no power is retained, the method is called Euler's method, for which we use $x_{n+1} \approx x_n + h\dot{x}_n$. A more commonly used method is Adams' method, for which all powers of $\nabla$ up to and including the third are retained. We then have

\[ x_{n+1} \approx x_n + h \left( 1 + \tfrac{1}{2}\nabla + \tfrac{5}{12}\nabla^2 + \tfrac{3}{8}\nabla^3 \right) \dot{x}_n, \]

or, in terms of values of $\dot{x}$,

\[ x_{n+1} \approx x_n + \frac{h}{24} \left( 55\dot{x}_n - 59\dot{x}_{n-1} + 37\dot{x}_{n-2} - 9\dot{x}_{n-3} \right). \tag{13.62} \]

Recall that $\dot{x}_k = f(x_k, t_k)$. Thus, if we know the values $x_n$, $x_{n-1}$, $x_{n-2}$, and $x_{n-3}$, we can obtain $x_{n+1}$.

13.9.2. Example. Knowing $\dot{x}_0$, $\dot{x}_1$, $\dot{x}_2$, and $\dot{x}_3$, we can use Equation (13.62) to calculate $x_4$ for Example 13.9.1:

\[ x_4 \approx x_3 + \frac{0.1}{24} \left( 55\dot{x}_3 - 59\dot{x}_2 + 37\dot{x}_1 - 9\dot{x}_0 \right) = 0.47793. \]

With $x_4$ at our disposal, we can evaluate $\dot{x}_4 = -x_4 - x_4^2 e^{t_4}$, and substitute it in

\[ x_5 \approx x_4 + \frac{0.1}{24} \left( 55\dot{x}_4 - 59\dot{x}_3 + 37\dot{x}_2 - 9\dot{x}_1 \right) \]

to find $x_5$, and so on. ■
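The step from $x_3$ to $x_4$ can be reproduced with a few lines of code. The sketch below applies Equation (13.62) to the starting values quoted in Example 13.9.1; the function names `f` and `adams_step` and the list-based bookkeeping are choices made for this illustration.

```python
import math

def f(x, t):
    """Right-hand side of x' = -x - e^t x^2 (the IVP of Example 13.9.1)."""
    return -x - math.exp(t) * x * x

def adams_step(xs, ts, h):
    """One step of Equation (13.62), using the last four points in xs and ts."""
    d = [f(x, t) for x, t in zip(xs[-4:], ts[-4:])]   # x'_{n-3}, ..., x'_n
    return xs[-1] + h / 24.0 * (55 * d[3] - 59 * d[2] + 37 * d[1] - 9 * d[0])

h = 0.1
ts = [0.0, 0.1, 0.2, 0.3]
xs = [1.0, 0.82254, 0.68186, 0.56741]   # starting values quoted in Example 13.9.1

x4 = adams_step(xs, ts, h)
print(x4)

# Continue the march: append the new point and step again to estimate x_5.
xs.append(x4)
ts.append(0.4)
x5 = adams_step(xs, ts, h)
print(x5)
```

Each further step reuses the three most recent derivative values and needs only one new evaluation of $f$, which is the practical appeal of open-type formulas.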


A crucial fact about such methods is that every value obtained is in error by some amount, and using such values to obtain new values propagates the error. Thus, error can accumulate rapidly and make approximations worse at each step. Discussion of error propagation and error analysis (topics that we have not covered and shall not cover) is common in the literature (see, for example, [Hild 87, pp. 267-268]).

13.9.2 The Runge-Kutta Method 

The FODE of Equation (13.54) leads to a unique Taylor series, 


x(to + ft) = xo + hxo + —xo H - , 

where io, 戈 0 , and all the rest of the derivatives can be evaluated by differentiating 
x = f(x, t). Thus, theoretically, the Taylor series gives the solution (for ?o + 
but to + 2h，to + 3h, and so on can be obtained similarly). However, in practice, 
the Taylor series converges slowly, and the accuracy involved is not high. Thus 
one resorts to other methods of solution such as described earlier. 

Another method, known as the Runge-Kutta method, replaces the Taylor
series

$$x_{n+1} \approx x_n + h\dot{x}_n + \frac{h^2}{2}\ddot{x}_n + \frac{h^3}{3!}\dddot{x}_n + \cdots \tag{13.63}$$

with

$$x_{n+1} = x_n + h\left[\alpha_0 f(x_n, t_n) + \sum_{j=1}^{p}\alpha_j f(x_n + b_j h,\; t_n + \mu_j h)\right], \tag{13.64}$$

where $\alpha_0$ and $\{\alpha_j, b_j, \mu_j\}_{j=1}^{p}$ are constants chosen such that if the RHS of (13.64)
were expanded in powers of the spacing $h$, the coefficients of a certain number of
the leading terms would agree with the corresponding expansion coefficients of
the RHS of (13.63). It is customary to express the $b$'s as linear combinations of
preceding values of $f$:

$$hb_i = \sum_{r=0}^{i-1}\lambda_{ir}k_r, \qquad i = 1, 2, \ldots, p.$$


388 13. SECOND-ORDER LINEAR DIFFERENTIAL EQUATIONS 


The $k_r$ are recursively defined as

$$k_0 = hf(x_n, t_n), \qquad k_r = hf(x_n + b_r h,\; t_n + \mu_r h).$$

Then Equation (13.64) gives $x_{n+1} = x_n + \sum_{r=0}^{p}\alpha_r k_r$. The (nontrivial) task now
is to determine the parameters $\alpha_r$, $\mu_r$, and $\lambda_{ir}$.


Carle David Tolme Runge (1856-1927), after returning from 
a six-month vacation in Italy, enrolled at the University of 
Munich to study literature. However, after six weeks of the 
course he changed to mathematics and physics. 

Runge attended courses with Max Planck, and they became 
close friends. In 1877 both went to Berlin, but Runge turned 
to pure mathematics after attending Weierstrass's lectures. His
doctoral dissertation (1880) dealt with differential geometry. 

After taking a secondary-school teachers certification test, 
he returned to Berlin, where he was influenced by Kronecker. 

Runge then worked on a procedure for the numerical solution 
of algebraic equations in which the roots were expressed as infinite series of rational
functions of the coefficients. In the area of numerical analysis, he is credited with an efficient
method of solving differential equations numerically, work he did with Martin Kutta. 

Runge published little at that stage, but after visiting Mittag-Leffler in Stockholm in 
September 1884 he produced a large number of papers in Mittag-Leffler's journal Acta
Mathematica. In 1886, Runge obtained a chair at Hanover and remained there for 18 years.
Within a year Runge had moved away from pure mathematics to study the wavelengths 
of the spectral lines of elements other than hydrogen. He did a great deal of experimental 
work and published a great quantity of results, including a separation of the spectral lines 
of helium in two spectral series. 

In 1904 Klein persuaded Göttingen to offer Runge a chair of applied mathematics, a
post that Runge held until he retired in 1925. 

Runge was always a fit and active man, and on his 70th birthday he entertained his 
grandchildren by doing handstands. However, a few months later he had a heart attack and 
died. 



In general, the determination of these constants is extremely tedious. Let us
consider the very simple case where $p = 1$, and let $\lambda \equiv \lambda_{10}$ and $\mu \equiv \mu_1$. Then
we obtain

$$x_{n+1} = x_n + \alpha_0 k_0 + \alpha_1 k_1, \tag{13.65}$$

where $k_0 = hf(x_n, t_n)$ and $k_1 = hf(x_n + \lambda k_0,\; t_n + \mu h)$.

Taylor-expanding $k_1$, a function of two variables, gives$^9$

$$k_1 = hf + h^2(\mu f_t + \lambda f f_x) + \frac{h^3}{2}\left(\mu^2 f_{tt} + 2\lambda\mu f f_{xt} + \lambda^2 f^2 f_{xx}\right) + O(h^4),$$

$^9$ The symbol $O(h^m)$ means that all terms of order $h^m$ and higher have been neglected.














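where $f_t \equiv \partial f/\partial t$, etc. Substituting this in the first equation of (13.65), we get

$$x_{n+1} = x_n + h(\alpha_0 + \alpha_1)f + h^2\alpha_1(\mu f_t + \lambda f f_x) + \frac{h^3}{2}\alpha_1\left(\mu^2 f_{tt} + 2\lambda\mu f f_{xt} + \lambda^2 f^2 f_{xx}\right) + O(h^4). \tag{13.66}$$

On the other hand, with

$$\dot{x} = f, \qquad \ddot{x} = \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial t} = \dot{x}f_x + f_t = f f_x + f_t,$$

$$\dddot{x} = f_{tt} + 2f f_{xt} + f^2 f_{xx} + f_x(f f_x + f_t),$$

Equation (13.63) gives

$$x_{n+1} = x_n + hf + \frac{h^2}{2}(f f_x + f_t) + \frac{h^3}{6}\left[f_{tt} + 2f f_{xt} + f^2 f_{xx} + f_x(f f_x + f_t)\right] + O(h^4). \tag{13.67}$$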


If we demand that (13.66) and (13.67) agree up to the $h^2$ term (we cannot demand
agreement for $h^3$ or higher because of overspecification), then we must have
$\alpha_0 + \alpha_1 = 1$, $\alpha_1\mu = \alpha_1\lambda = \tfrac{1}{2}$. There are only three equations for four unknowns.
Therefore, there will be an arbitrary parameter $\beta$ in terms of which the unknowns
can be written:

$$\alpha_0 = 1 - \beta, \qquad \alpha_1 = \beta, \qquad \mu = \lambda = \frac{1}{2\beta}.$$

Substituting these values in Equation (13.65) gives

$$x_{n+1} = x_n + h\left[(1 - \beta)f(x_n, t_n) + \beta f\left(x_n + \frac{hf}{2\beta},\; t_n + \frac{h}{2\beta}\right)\right] + O(h^3).$$

This formula becomes useful if we let $\beta = \tfrac{1}{2}$. Then $t_n + h/(2\beta) = t_n + h = t_{n+1}$,
which makes evaluation of the second term in square brackets convenient. For
$\beta = \tfrac{1}{2}$, we have

$$x_{n+1} = x_n + \frac{h}{2}\left[f(x_n, t_n) + f(x_n + hf,\; t_n + h)\right] + O(h^3). \tag{13.68}$$


What is nice about this equation is that it needs no starting up! We can plug
in the known quantities $t_n$, $t_{n+1}$, and $x_n$ on the RHS and find $x_{n+1}$ starting with
$n = 0$. However, the result is not very accurate, and we cannot make it any more
accurate by demanding agreement for higher powers of $h$, because, as mentioned
earlier, such a demand overspecifies the unknowns.
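A minimal Python sketch of Equation (13.68) is given below. The function names and the test equation $\dot{x} = -x$ are illustrative choices, not taken from the text. Halving $h$ should reduce the accumulated error by roughly a factor of four, as expected for a formula whose single-step error is $O(h^3)$.

```python
import math

def second_order_step(f, x, t, h):
    """One step of Equation (13.68): the p = 1 formula with beta = 1/2."""
    k0 = h * f(x, t)           # slope at the start of the interval
    k1 = h * f(x + k0, t + h)  # slope at the predicted end point
    return x + 0.5 * (k0 + k1)

def integrate(f, x0, t0, h, n):
    """Apply the step n times starting from (t0, x0); no startup values are needed."""
    x, t = x0, t0
    for _ in range(n):
        x = second_order_step(f, x, t, h)
        t += h
    return x

if __name__ == "__main__":
    decay = lambda x, t: -x    # illustrative test equation with solution e^{-t}
    for n in (10, 20):
        approx = integrate(decay, 1.0, 0.0, 1.0 / n, n)
        print(n, approx, abs(approx - math.exp(-1.0)))
```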




Martin Wilhelm Kutta (1867-1944) lost his parents when he was still a child, and together 
with his brother went to his uncle in Breslau to go to gymnasium. He attended the University 
of Breslau from 1885-1890, and the University of Munich from 1891-1894 concentrating 
mainly on mathematics, but he was also interested in languages, music, and art. Although 
he completed the certification for teaching mathematics and physics in 1894, he did not start 
teaching immediately. Instead, he assisted von Dyck at the Technische Hochschule München
until 1897 (and then again from 1899 to 1903). 

From 1898 to 1899 he studied at Cambridge, and a year later, he finished his Ph.D. 
at the University of Munich. In 1902, he completed his habilitation in pure and applied 
mathematics at the Technische Hochschule München, where he became professor of applied
mathematics five years later. In 1909 he accepted an offer from University of Vienna, but 
a year later he went to the Technische Hochschule Aachen as a professor. From 1912 until 
his retirement in 1935 he worked at the Technische Hochschule Stuttgart. 

Kutta's name is well known not only to physicists, applied mathematicians, and
engineers, but also to specialists in aerospace science and fluid mechanics. The first group
use the Runge-Kutta method, developed at the beginning of the twentieth century to obtain
numerical solutions to ordinary differential equations. The second group use the
Kutta-Zhukovskii formula for the theoretical description of the lift of a body immersed in
a nonturbulent moving fluid. Kutta's work on the application of conformal mapping to the
study of airplane wings was later applied to the flight of birds, and further developed by L.
Prandtl in the theory of wings.

Kutta obtained the motivation for his first scientific publication from Boltzmann and 
others (including a historian of mathematics) when working on the theoretical determination 
of the heat exchanged between two concentric cylinders kept at constant temperatures. By 
applying the conformal mapping technique, Kutta managed to obtain numerical values for 
the heat conductivity of air that agreed well with the experimental values of the time. 

Three of Kutta’s publications dealt with the history of mathematics, for which he profited 
greatly because of his knowledge of the Arabic language. 

One of the most important tasks of applied mathematics is to approximate numerically 
the initial value problem of ODEs whose solutions cannot be found in closed form. After 
Euler (1770) had already expressed the basic idea, Runge (1895) and Heun (1900) wrote down 
the appropriate formulas. Kutta’s contribution was to considerably increase the accuracy, 
and allow for a larger selection of the parameters involved. After accepting a professorship
in Stuttgart in 1912, Kutta devoted all his time to teaching. He was very much in demand 
as a teacher, and it is said that his lectures were so good that even engineering students took 
an interest in mathematics. 

(Taken from W. Schulz, "Martin Wilhelm Kutta," Neue Deutsche Biographie 13, Berlin,
(1952-) 348-350.)


Formulas that give more accurate results can be obtained by retaining terms
beyond $p = 1$. Thus, for $p = 2$, if we write $x_{n+1} = x_n + \sum_{r=0}^{2}\alpha_r k_r$, there will be
eight unknowns (three $\alpha$'s, three $\lambda$'s, and two $\mu$'s), and the demand for agreement
between the Taylor expansion and the expansion of $f$ up to $h^3$ will yield only six
equations. Therefore, there will be two arbitrary parameters whose specification
results in various formulas. The details of this kind of algebraic derivation are very
messy, so we will merely consider two specific formulas. One such formula, due
to Kutta, is

$$x_{n+1} = x_n + \tfrac{1}{6}(k_0 + 4k_1 + k_2) + O(h^4), \tag{13.69}$$

where

$$k_0 = hf(x_n, t_n), \qquad k_1 = hf(x_n + \tfrac{1}{2}k_0,\; t_n + \tfrac{1}{2}h), \qquad k_2 = hf(x_n + 2k_1 - k_0,\; t_n + h).$$

A second formula, due to Heun, has the form

$$x_{n+1} = x_n + \tfrac{1}{4}(k_0 + 3k_2) + O(h^4),$$

where

$$k_0 = hf(x_n, t_n), \qquad k_1 = hf(x_n + \tfrac{1}{3}k_0,\; t_n + \tfrac{1}{3}h), \qquad k_2 = hf(x_n + \tfrac{2}{3}k_1,\; t_n + \tfrac{2}{3}h).$$

These two formulas are of about the same order of accuracy.

13.9.3. Example. Let us solve the DE of Example 13.9.1 using the Runge-Kutta method.
With $t_0 = 0$, $x_0 = 1$, $h = 0.1$, and $n = 0$, Equation (13.69) gives $k_0 = -0.2$, $k_1 = -0.17515$,
$k_2 = -0.16476$, so that

$$x_1 = 1 + \tfrac{1}{6}\left(-0.2 + 4(-0.17515) - 0.16476\right) = 0.82244.$$

This $x_1$, $h = 0.1$, and $t_1 = t_0 + h = 0.1$ yield the following new values: $k_0 = -0.15700$,
$k_1 = -0.13870$, $k_2 = -0.13040$, which in turn give

$$x_2 = 0.82244 + \tfrac{1}{6}\left[-0.15700 - 4(0.13870) - 0.13040\right] = 0.68207.$$

We similarly obtain $x_3 = 0.56964$ and $x_4 = 0.47858$. On the other hand, solving the FODE
analytically gives the exact result $x(t) = e^{-t}/(1 + t)$.

Table 13.1 compares the values obtained here, those obtained using Adams' method,
and the exact values to five decimal places. It is clear that the Runge-Kutta method is more
accurate than the methods discussed earlier. ■
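The arithmetic of this example can be checked with a short Python sketch of Equation (13.69). The right-hand side $f(x, t) = -x - x^2 e^t$ is the one evaluated in Example 13.9.2, and the function names are illustrative. The sketch works in full double precision, so later steps may differ in the last printed digit from values obtained with rounded intermediate results.

```python
import math

def f(x, t):
    """Right-hand side of the FODE of Example 13.9.1: dx/dt = -x - x^2 e^t."""
    return -x - x * x * math.exp(t)

def kutta3_step(f, x, t, h):
    """One step of Kutta's third-order formula, Equation (13.69)."""
    k0 = h * f(x, t)
    k1 = h * f(x + 0.5 * k0, t + 0.5 * h)
    k2 = h * f(x + 2.0 * k1 - k0, t + h)
    return x + (k0 + 4.0 * k1 + k2) / 6.0

def solve(f, x0, t0, h, n):
    """Return the list [x_0, x_1, ..., x_n] generated by repeated steps."""
    xs, t = [x0], t0
    for _ in range(n):
        xs.append(kutta3_step(f, xs[-1], t, h))
        t += h
    return xs

if __name__ == "__main__":
    for k, x in enumerate(solve(f, 1.0, 0.0, 0.1, 4)):
        t = 0.1 * k
        print(f"t = {t:.1f}  x = {x:.5f}  exact = {math.exp(-t) / (1 + t):.5f}")
```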

The accuracy of the Runge-Kutta method and the fact that it requires no startup
procedure make it one of the most popular methods for solving differential equations.
The Runge-Kutta method can be made more accurate by using higher values
of $p$. For instance, a formula that is used for $p = 3$ is

$$x_{n+1} = x_n + \tfrac{1}{6}(k_0 + 2k_1 + 2k_2 + k_3) + O(h^5), \tag{13.70}$$

where

$$k_0 = hf(x_n, t_n), \qquad k_1 = hf(x_n + \tfrac{1}{2}k_0,\; t_n + \tfrac{1}{2}h),$$

$$k_2 = hf(x_n + \tfrac{1}{2}k_1,\; t_n + \tfrac{1}{2}h), \qquad k_3 = hf(x_n + k_2,\; t_n + h).$$
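Equation (13.70) translates almost line by line into code. The following Python sketch uses illustrative names and applies the formula to the equation of Example 13.9.1, whose right-hand side is $f(x, t) = -x - x^2 e^t$ and whose exact solution is $e^{-t}/(1+t)$.

```python
import math

def rk4_step(f, x, t, h):
    """One step of the p = 3 Runge-Kutta formula, Equation (13.70)."""
    k0 = h * f(x, t)
    k1 = h * f(x + 0.5 * k0, t + 0.5 * h)
    k2 = h * f(x + 0.5 * k1, t + 0.5 * h)
    k3 = h * f(x + k2, t + h)
    return x + (k0 + 2.0 * k1 + 2.0 * k2 + k3) / 6.0

def rk4_solve(f, x0, t0, h, n):
    """Return [x_0, ..., x_n]; only the initial value is needed to start."""
    xs, t = [x0], t0
    for _ in range(n):
        xs.append(rk4_step(f, xs[-1], t, h))
        t += h
    return xs

if __name__ == "__main__":
    f = lambda x, t: -x - x * x * math.exp(t)
    for k, x in enumerate(rk4_solve(f, 1.0, 0.0, 0.1, 4)):
        t = 0.1 * k
        print(f"t = {t:.1f}  x = {x:.6f}  exact = {math.exp(-t) / (1 + t):.6f}")
```

For the linear test equation $\dot{x} = -x$ a single step multiplies $x_n$ by $1 - h + h^2/2 - h^3/6 + h^4/24$, the Taylor polynomial of $e^{-h}$ through fourth order, which is a convenient check on an implementation.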





t      Analytical   Runge-Kutta   Adams' method
0      1            1             1
0.1    0.82258      0.82244       0.82254
0.2    0.68228      0.68207       0.68186
0.3    0.56986      0.56964       0.56741
0.4    0.47880      0.47858       0.47793

Table 13.1 Solutions to the differential equation of Example 13.9.1 obtained in three
different ways.


13.9.3 Higher-Order Equations 

Any $n$th-order differential equation is equivalent to $n$ first-order differential
equations in $n + 1$ variables. Thus, for instance, the most general SODE, $F(\ddot{x}, \dot{x}, x, t) = 0$,
can be reduced to two FODEs by solving for $\ddot{x}$ to obtain $\ddot{x} = G(\dot{x}, x, t)$, and
defining $\dot{x} \equiv u$ to get the system of equations

$$\dot{u} = G(u, x, t), \qquad \dot{x} = u.$$

These two equations are completely equivalent to the original SODE. Thus, it is
appropriate to discuss numerical solutions of systems of FODEs in several variables.
The discussion here will be limited to systems consisting of two equations.
The generalization to several equations is not difficult.

Consider the IVP of the following system of equations

$$\dot{x} = f(x, u, t), \quad x(t_0) = x_0; \qquad \dot{u} = g(x, u, t), \quad u(t_0) = u_0. \tag{13.71}$$

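Using an obvious generalization of Equation (13.70), we can write

$$x_{n+1} = x_n + \tfrac{1}{6}(k_0 + 2k_1 + 2k_2 + k_3) + O(h^5),$$
$$u_{n+1} = u_n + \tfrac{1}{6}(m_0 + 2m_1 + 2m_2 + m_3) + O(h^5), \tag{13.72}$$

where

$$k_0 = hf(x_n, u_n, t_n), \qquad k_1 = hf(x_n + \tfrac{1}{2}k_0,\; u_n + \tfrac{1}{2}m_0,\; t_n + \tfrac{1}{2}h),$$
$$k_2 = hf(x_n + \tfrac{1}{2}k_1,\; u_n + \tfrac{1}{2}m_1,\; t_n + \tfrac{1}{2}h), \qquad k_3 = hf(x_n + k_2,\; u_n + m_2,\; t_n + h),$$

$$m_0 = hg(x_n, u_n, t_n), \qquad m_1 = hg(x_n + \tfrac{1}{2}k_0,\; u_n + \tfrac{1}{2}m_0,\; t_n + \tfrac{1}{2}h),$$
$$m_2 = hg(x_n + \tfrac{1}{2}k_1,\; u_n + \tfrac{1}{2}m_1,\; t_n + \tfrac{1}{2}h), \qquad m_3 = hg(x_n + k_2,\; u_n + m_2,\; t_n + h).$$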




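These formulas are more general than needed for a SODE, since, as mentioned
above, such a SODE is equivalent to the simpler system in which $f(x, u, t) \equiv u$.
Therefore, Equation (13.72) specializes to

$$k_0 = hu_n = h\dot{x}_n, \qquad k_1 = h(u_n + \tfrac{1}{2}m_0) = h\dot{x}_n + \tfrac{1}{2}hm_0,$$
$$k_2 = h\dot{x}_n + \tfrac{1}{2}hm_1, \qquad k_3 = h\dot{x}_n + hm_2,$$

and

$$x_{n+1} = x_n + h\dot{x}_n + \tfrac{1}{6}h(m_0 + m_1 + m_2) + O(h^5),$$
$$\dot{x}_{n+1} = \dot{x}_n + \tfrac{1}{6}(m_0 + 2m_1 + 2m_2 + m_3) + O(h^5), \tag{13.73}$$

where

$$m_0 = hg(x_n, \dot{x}_n, t_n), \qquad m_1 = hg(x_n + \tfrac{1}{2}h\dot{x}_n,\; \dot{x}_n + \tfrac{1}{2}m_0,\; t_n + \tfrac{1}{2}h),$$
$$m_2 = hg(x_n + \tfrac{1}{2}h\dot{x}_n + \tfrac{1}{4}hm_0,\; \dot{x}_n + \tfrac{1}{2}m_1,\; t_n + \tfrac{1}{2}h),$$
$$m_3 = hg(x_n + h\dot{x}_n + \tfrac{1}{2}hm_1,\; \dot{x}_n + m_2,\; t_n + h).$$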

13.9.4. Example. The IVP $\ddot{x} + x = 0$, $x(0) = 0$, $\dot{x}(0) = 1$ clearly has the analytic
solution $x(t) = \sin t$. Nevertheless, let us use Equation (13.73) to illustrate the
Runge-Kutta method and compare the result with the exact solution.

For this problem $g(x, \dot{x}, t) = -x$. Therefore, we can easily calculate the $m$'s:

$$m_0 = -hx_n, \qquad m_1 = -h(x_n + \tfrac{1}{2}h\dot{x}_n),$$
$$m_2 = -h(x_n + \tfrac{1}{2}h\dot{x}_n - \tfrac{1}{4}h^2x_n),$$
$$m_3 = -h\left[x_n + h\dot{x}_n - \tfrac{1}{2}h^2(x_n + \tfrac{1}{2}h\dot{x}_n)\right].$$

These lead to the following expressions for $x_{n+1}$ and $\dot{x}_{n+1}$:

$$x_{n+1} = x_n + h\dot{x}_n - \tfrac{1}{6}h^2\left(3x_n + h\dot{x}_n - \tfrac{1}{4}h^2x_n\right),$$
$$\dot{x}_{n+1} = \dot{x}_n - \tfrac{1}{6}h\left[6x_n + 3h\dot{x}_n - h^2(x_n + \tfrac{1}{4}h\dot{x}_n)\right].$$

Starting with $x_0 = 0$ and $\dot{x}_0 = 1$, we can generate $x_1$, $x_2$, and so on by using the last
two equations successively. The results for 10 values of $x$ with $h = 0.1$ are given to five
significant figures in Table 13.2. Note that up to $x_5$ there is complete agreement with the
exact result. ■

The Runge-Kutta method lends itself readily to use in computer programs. 
Because Equation (13.73) does not require any startups, it can be used directly to 
generate solutions to any IVP involving a SODE. 
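As an illustration of such a program, the following Python sketch implements one step of Equation (13.73) for $\ddot{x} = g(x, \dot{x}, t)$ and applies it to the IVP of Example 13.9.4. The function and variable names are illustrative, with `v` standing for $\dot{x}$. The arithmetic is carried out in double precision, so the printed values need not coincide in the last digit with a table prepared with fewer figures.

```python
import math

def rk4_sode_step(g, x, v, t, h):
    """One step of Equation (13.73) for x'' = g(x, x', t); v stands for x'."""
    m0 = h * g(x, v, t)
    m1 = h * g(x + 0.5 * h * v, v + 0.5 * m0, t + 0.5 * h)
    m2 = h * g(x + 0.5 * h * v + 0.25 * h * m0, v + 0.5 * m1, t + 0.5 * h)
    m3 = h * g(x + h * v + 0.5 * h * m1, v + m2, t + h)
    x_new = x + h * v + h * (m0 + m1 + m2) / 6.0
    v_new = v + (m0 + 2.0 * m1 + 2.0 * m2 + m3) / 6.0
    return x_new, v_new

def rk4_sode(g, x0, v0, t0, h, n):
    """Return the list of (t, x, v) triples for n steps from the initial data."""
    out = [(t0, x0, v0)]
    x, v, t = x0, v0, t0
    for _ in range(n):
        x, v = rk4_sode_step(g, x, v, t, h)
        t += h
        out.append((t, x, v))
    return out

if __name__ == "__main__":
    oscillator = lambda x, v, t: -x   # x'' + x = 0
    for t, x, v in rk4_sode(oscillator, 0.0, 1.0, 0.0, 0.1, 10)[1:]:
        print(f"t = {t:.1f}  x = {x:.5f}  sin t = {math.sin(t):.5f}")
```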

Another, more direct, method of solving higher-order differential equations is
to substitute $D = -(1/h)\ln(1 - \nabla)$ for the derivative operator in the differential
equation, expand in terms of $\nabla$, and keep an appropriate number of terms. Problem
13.33 illustrates this point for a linear SODE.
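For orientation, the first few terms of the expansions involved are written out below. They follow from the standard series $-\ln(1 - y) = y + y^2/2 + y^3/3 + \cdots$, and the second line is obtained by squaring the first.

```latex
\begin{align*}
  D   &= -\frac{1}{h}\ln(1-\nabla)
       = \frac{1}{h}\left(\nabla + \frac{1}{2}\nabla^{2} + \frac{1}{3}\nabla^{3} + \cdots\right),\\
  D^{2} &= \frac{1}{h^{2}}\left(\nabla^{2} + \nabla^{3} + \frac{11}{12}\nabla^{4} + \cdots\right).
\end{align*}
```

Since $\nabla x_n = x_n - x_{n-1}$ is itself of order $h$, truncating these series after a fixed power of $\nabla$ gives a difference equation whose error is of a definite order in $h$.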





t      Runge-Kutta   sin t
0.1    0.09983       0.09983
0.2    0.19867       0.19867
0.3    0.29552       0.29552
0.4    0.38942       0.38942
0.5    0.47943       0.47943
0.6    0.56466       0.56464
0.7    0.64425       0.64422
0.8    0.71741       0.71736
0.9    0.78342       0.78333
1.0    0.84161       0.84147

Table 13.2 Comparison of the Runge-Kutta and exact solutions to the second order DE
of Example 13.9.4.


13.10 Problems

13.1. Let $u(x)$ be a differentiable function satisfying the differential inequality
$u'(x) \le Ku(x)$ for $x \in [a, b]$, where $K$ is a constant. Show that
$u(x) \le u(a)e^{K(x-a)}$. Hint: Multiply both sides of the inequality by $e^{-Kx}$, and show that
the result can be written as the derivative of a nonincreasing function. Then use
the fact that $a \le x$ to get the final result.

13.2. Prove Proposition 13.4.2.

13.3. Let $f$ and $g$ be two differentiable functions that are linearly dependent. Show
that their Wronskian vanishes.

13.4. Show that if $(f_1, f_1')$ and $(f_2, f_2')$ are linearly dependent at one point, then
$f_1$ and $f_2$ are linearly dependent at all $x \in [a, b]$. Here $f_1$ and $f_2$ are solutions of
the DE of (13.12). Hint: Derive the identity

$$W(f_1, f_2; x_2) = W(f_1, f_2; x_1)\exp\left\{-\int_{x_1}^{x_2}p(t)\,dt\right\}.$$

13.5. Show that the solutions to the SOLDE $y'' + q(x)y = 0$ have a constant
Wronskian.

13.6. Find (in terms of an integral) $G_n(x)$, the linearly independent "partner" of
the Hermite polynomial $H_n(x)$. Specialize this to $n = 0, 1$. Is it possible to find
$G_0(x)$ and $G_1(x)$ in terms of elementary functions?

13.7. Let $f_1$, $f_2$, and $f_3$ be any three solutions of $y'' + py' + qy = 0$. Show that the
(generalized $3 \times 3$) Wronskian of these solutions is zero. Thus, any three solutions
of the HSOLDE are linearly dependent.






13.8. For the HSOLDE $y'' + py' + qy = 0$, show that

$$p = -\frac{f_1 f_2'' - f_2 f_1''}{W(f_1, f_2)} \qquad\text{and}\qquad q = \frac{f_1' f_2'' - f_2' f_1''}{W(f_1, f_2)}.$$

Thus, knowing two solutions of an HSOLDE allows us to reconstruct the DE.

13.9. Let $f_1$, $f_2$, and $f_3$ be three solutions of the third-order linear differential
equation $y''' + p_2(x)y'' + p_1(x)y' + p_0(x)y = 0$. Derive a FODE satisfied by the
(generalized $3 \times 3$) Wronskian of these solutions.

13.10. Prove Corollary 13.4.13. Hint: Consider the solution $u = 1$ of the DE
$u'' = 0$ and apply Theorem 13.4.11.

13.11. Show that the adjoint of $M$ given in Equation (13.21) is the original $L$.

13.12. Show that if $u(x)$ and $v(x)$ are solutions of the self-adjoint DE $(pu')' + qu = 0$,
then Abel's identity, $p(uv' - vu') = \text{constant}$, holds.

13.13. Reduce each DE to self-adjoint form.

(a) $x^2y'' + xy' + y = 0$. (b) $y'' + y'\tan x = 0$.

13.14. Reduce the self-adjoint DE $(py')' + qy = 0$ to $u'' + S(x)u = 0$ by an
appropriate change of the dependent variable. What is $S(x)$? Apply this reduction
to the Legendre DE for $P_n(x)$, and show that

$$S(x) = \frac{1 + n(n+1) - n(n+1)x^2}{(1 - x^2)^2}.$$

Now use this result to show that every solution of the Legendre equation has at
least $(2n + 1)/\pi$ zeros on $(-1, +1)$.

13.15. Substitute $v = y'/y$ in the homogeneous SOLDE

$$y'' + p(x)y' + q(x)y = 0$$

and:

(a) Show that it turns into $v' + v^2 + p(x)v + q(x) = 0$, which is a first-order
nonlinear equation called the Riccati equation. Would the same substitution work
if the DE were inhomogeneous?

(b) Show that by an appropriate transformation, the Riccati equation can be directly
cast in the form $u' + u^2 + S(x) = 0$.

13.16. For the function $S(x)$ defined in Example 13.6.1, let $S^{-1}(x)$ be the inverse,
i.e., $S^{-1}(S(x)) = x$. Show that




and given that $S^{-1}(0) = 0$, conclude that



13.17. Define $\sinh x$ and $\cosh x$ as the solutions of $y'' = y$ satisfying the boundary
conditions $y(0) = 0$, $y'(0) = 1$ and $y(0) = 1$, $y'(0) = 0$, respectively. Using
Example 13.6.1 as a guide, show that

(a) $\cosh^2 x - \sinh^2 x = 1$. (b) $\cosh(-x) = \cosh x$.

(c) $\sinh(-x) = -\sinh x$. (d) $\sinh(a + x) = \sinh a\cosh x + \cosh a\sinh x$.

13.18. (a) Derive Equation (13.30) of Example 13.6.4.

(b) Derive Equation (13.31) of Example 13.6.4 by direct substitution.

(c) Let $\lambda = l(l + 1)$ in Example 13.6.4 and calculate the Legendre polynomials
$P_l(x)$ for $l = 0, 1, 2, 3$, subject to the condition $P_l(1) = 1$.


13.19. Use Equation (13.33) of Example 13.6.5 to generate the first three Hermite
polynomials. Use the normalization

$$\int_{-\infty}^{\infty}[H_n(x)]^2e^{-x^2}\,dx = \sqrt{\pi}\,2^n n!$$

to determine the arbitrary constant.

13.20. The function defined by

$$f(x) = \sum_{n=0}^{\infty}c_nx^n, \qquad\text{where}\qquad c_{n+2} = \frac{2n - \lambda}{(n+1)(n+2)}c_n,$$

can be written as $f(x) = c_0g(x) + c_1h(x)$, where $g$ is even and $h$ is odd in $x$. Show
that $f(x)$ goes to infinity at least as fast as $e^{x^2}$ does, i.e., $\lim_{x\to\infty}f(x)e^{-x^2} \ne 0$.
Hint: Consider $g(x)$ and $h(x)$ separately and show that

$$g(x) = \sum_{n=0}^{\infty}b_nx^{2n}, \qquad\text{where}\qquad b_{n+1} = \frac{4n - \lambda}{(2n+1)(2n+2)}b_n.$$

Then concentrate on the ratio $g(x)/e^{x^2}$, where $g$ and $e^{x^2}$ are approximated by
polynomials of very high degrees. Take the limit of this ratio as $x \to \infty$, and use
recursion relations for $g$ and $e^{x^2}$. The odd case follows similarly.

13.21. Refer to Example 13.6.6 for this problem.

(a) Derive the commutation relation $[a, a^\dagger] = 1$.

(b) Show that the Hamiltonian can be written as given in Equation (13.34).

(c) Derive the commutation relation $[a, (a^\dagger)^n] = n(a^\dagger)^{n-1}$.

(d) Take the inner product of Equation (13.36) with itself and use (c) to show that
$|c_n|^2 = n|c_{n-1}|^2$. From this, conclude that $|c_n|^2 = n!\,|c_0|^2$.

(e) For any function $f(y)$, show that

$$\left(y - \frac{d}{dy}\right)\left(e^{y^2/2}f\right) = -e^{y^2/2}\frac{df}{dy}.$$

Apply $(y - d/dy)$ repeatedly to both sides of the above equation to obtain

$$\left(y - \frac{d}{dy}\right)^n\left(e^{y^2/2}f\right) = (-1)^ne^{y^2/2}\frac{d^nf}{dy^n}.$$

(f) Choose an appropriate $f(y)$ in part (e) and show that

$$\left(y - \frac{d}{dy}\right)^ne^{-y^2/2} = (-1)^ne^{y^2/2}\frac{d^n}{dy^n}\left(e^{-y^2}\right).$$


13.22. Solve Airy's DE, $y'' + xy = 0$, by the power-series method. Show that the
radius of convergence for both independent solutions is infinite. Use the comparison
theorem to show that for $x > 0$ these solutions have infinitely many zeros, but
for $x < 0$ they can have at most one zero.

13.23. Show that the functions $x^re^{\lambda x}$, where $r = 0, 1, 2, \ldots$, are linearly independent.
Hint: Starting with $(D - \lambda)^n$, apply powers of $D - \lambda$ to a linear combination
of $x^re^{\lambda x}$ for all possible $r$'s.


13.24. Find a basis of real solutions for each DE.


⑷ ;y" + 5/ + 6 = 0. 
( , d A y 

{c) 7? = y ' 


W 〆"+ 6/ + ny + 8j = o. 

“、 A 一 


13.25. Solve the following initial value problems. 


d^y d^y 

dx 


: y(P) = :/ ⑼ =y"(0) = 0, ;y 〃 (0) = 1. 


y(0) = /(o) = y ff (0) = o, y(0) = L 


〆())=y(0) = /(0) = 0, :/〃⑼ = 2. 


13.26. Solve = xe x subject to the initial conditions ;y ⑼ = 0, /⑼ = 




13.27. Find the general solution of each equation. 


⑷ = xe x . 

(c )/ + : sinx sin 2x. 

(e) y ff — y ^ e x sin2x. 

(g) y" - 4y + 4 = 〆 + xe 2 ^. 


(b) y n - 4y f 

(d) y f, -y = (\^e- x ) 2 . 

(/) / 6 )-/ 4 W, 

(h) / + y = e 2x . 


13.28. Consider the Euler equation,

$$x^ny^{(n)} + a_{n-1}x^{n-1}y^{(n-1)} + \cdots + a_1xy' + a_0y = r(x).$$

Substitute $x = e^t$ and show that such a substitution reduces this to a DE with
constant coefficients. In particular, solve $x^2y'' - 4xy' + 6y = x$.

13.29. (a) Show that the substitution (13.50) reduces the Schrödinger equation to
(13.51).

(b) From the second equation of (13.51), derive the continuity equation for probability.

13.30. Show that the usual definition of probability current density,


Re 


i im J 


reduces to that in Equation (13.52) if we use (13.50). 


13.31. Write a computer program that solves the following differential equations
by

(a) Adams' method [Equation (13.62)] and
(b) the Runge-Kutta method [Equation (13.70)].

jc = t — x 2 , x( 0 )= 

x = e~~ xt , x ⑼ =1 
x = x 2 t 2 -|- 1 , x( 0 ) 

13.32. Solve the following IVPs numerically, with h = 0.1. Find the first ten 
values of x. 


1 x = f + sinx, ^(0) = 7r/2 

x = smxt, ^( 0 ) = 1 

=1 


(a) x-\- 0 . 2 i 2 + lOx = 20t, 

(b) xi-Ax^ t 2 , 

{c) X -\- X X = 0, 

(d) tx -h x + xt = 0, 

(e) x -h x + x 2 = t, 

(/) x+xt = 0 ， 

(g) x + sinx = ? 0 , 


x(0) = 0, 

增 = 

= 0. 

x(0) — 1, 

增 = 

= 0. 

x(0) = 2, 

i(0) = 

= 0. 

x(0) = 1, 


= 0. 

x(0) = 1, 

i ： (0) = 0. 

jc(0) = 0, 

増 = 

= 1. 

雄 )= .， 

対 0) 

= 0. 









13.33. Substitute $d/dt = D = -(1/h)\ln(1 - \nabla)$ in the SOLDE $\ddot{x} + p(t)\dot{x} + q(t)x = r(t)$
and expand the log terms to obtain

$$(\nabla^2 + \nabla^3)x_n + hp_n(\nabla + \tfrac{1}{2}\nabla^2)x_n + h^2q_nx_n = h^2r_n.$$

Since $\nabla$ is of order $h$, one has to keep one power fewer in the second term. Find
an expression for $x_n$ in terms of $x_{n-1}$, $x_{n-2}$, and $x_{n-3}$, valid to $h^2$.

Additional Reading 

1. Birkhoff, G. and Rota, G.-C. Ordinary Differential Equations, 3rd ed., Wiley,
1978. The small size of this book is very deceptive. It is loaded with
information. Written by two excellent mathematicians and authors, the book
covers all the topics of this chapter and much more in a very clear and lucid
style.

2. DeVries, P. A First Course in Computational Physics, Wiley, 1994. The
numerical solutions of differential equations are discussed in detail. The
approach is slightly different from the one used in this chapter.

3. Hildebrand, F. Introduction to Numerical Analysis, 2nd ed., Dover, 1987. Our
treatment of numerical solutions of differential equations closely follows that
of this reference.

4. Mathews, J. and Walker, R. Mathematical Methods of Physics, 2nd ed.,
Benjamin, 1970. A good source for WKB approximation.


14

Complex Analysis of SOLDEs 


We have familiarized ourselves with some useful techniques for finding solutions 
to differential equations. One powerful method that leads to formal solutions is 
power series. We also stated Theorem 13.6.7 which guarantees the convergence 
of the solution of the power series within a circle whose size is at least as large 
as the smallest of the circles of convergence of the coefficient functions. Thus, 
the convergence of the solution is related to the convergence of the coefficient 
functions. What about the nature of the convergence, or the analyticity of the 
solution? Is it related to the analyticity of the coefficient functions? If so, how? 
Are the singular points of the coefficients also singular points of the solution? Is the 
nature of the singularities the same? This chapter answers some of these questions. 

Analyticity is best handled in the complex plane. An important reason for this
is the property of analytic continuation discussed in Chapter 11. The differential
equation $du/dx = u^2$ has a solution $u = -1/x$ for all $x$ except $x = 0$. Thus, we
have to "puncture" the real line by removing $x = 0$ from it. Then we have two
solutions, because the domain of definition of $u = -1/x$ is not connected on the
real line (technically, the definition of a function includes its domain as well as the
rule for going from the domain to the range). In addition, if we confine ourselves
to the real line, there is no way that we can connect the $x > 0$ region to the $x < 0$
region. However, in the complex plane the same equation, $dw/dz = w^2$, has
the complex solution $w = -1/z$, which is analytic everywhere except at $z = 0$.
Puncturing the complex plane does not destroy the connectivity of the region of
definition of $w$. Thus, the solution in the $x > 0$ region can be analytically continued
to the solution in the $x < 0$ region by going around the origin.

The aim of this chapter is to investigate the analytic properties of the solutions
of some well known SOLDEs in mathematical physics. We begin with a result
from differential equation theory (for a proof, see [Birk 78, p. 223]).







14.1 ANALYTIC PROPERTIES OF COMPLEX DES 401 


14.0.1. Proposition. (continuation principle) The function obtained by analytic
continuation of any solution of an analytic differential equation along any path
in the complex plane is a solution of the analytic continuation of the differential
equation along the same path.

An analytic differential equation is one with analytic coefficient functions. 
This proposition makes it possible to find a solution in one region of the complex 
plane and then continue it analytically. The following example shows how the 
singularities of the coefficient functions affect the behavior of the solution. 

14.0.2. Example. Let us consider the FODE $w' - (\gamma/z)w = 0$ for $\gamma \in \mathbb{R}$. The coefficient
function $p(z) = -\gamma/z$ has a simple pole at $z = 0$. The solution to the FODE is easily found
to be $w = z^\gamma$. Thus, depending on whether $\gamma$ is a nonnegative integer, a negative integer
$-m$, or a noninteger, the solution has a regular point, a pole of order $m$, or a branch point
at $z = 0$, respectively. ■

This example shows that the singularities of the solution depend on the parameters
of the differential equation.


14.1 Analytic Properties of Complex DEs 

To prepare for discussing the analytic properties of the solutions of SOLDEs, 
let us consider some general properties of differential equations from a complex 
analytical point of view. 


14.1.1 Complex FOLDEs 

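In the homogeneous FOLDE

$$\frac{dw}{dz} + p(z)w = 0, \tag{14.1}$$

$p(z)$ is assumed to have only isolated singular points. It follows that $p(z)$ can be
expanded about a point $z_0$, which may be a singularity of $p(z)$, as a Laurent
series in some annular region $r_1 < |z - z_0| < r_2$:

$$p(z) = \sum_{n=-\infty}^{\infty}a_n(z - z_0)^n \qquad\text{where}\qquad r_1 < |z - z_0| < r_2.$$

The solution to Equation (14.1), as given in Theorem 13.2.1 with $q = 0$, is

$$\begin{aligned}
w(z) &= \exp\left[-\int p(z)\,dz\right] \\
&= C\exp\left[-a_{-1}\int\frac{dz}{z - z_0} - \sum_{n=0}^{\infty}a_n\int(z - z_0)^n\,dz - \sum_{n=2}^{\infty}a_{-n}\int(z - z_0)^{-n}\,dz\right] \\
&= C\exp\left[-a_{-1}\ln(z - z_0) - \sum_{n=0}^{\infty}\frac{a_n}{n+1}(z - z_0)^{n+1} + \sum_{n=1}^{\infty}\frac{a_{-n-1}}{n}(z - z_0)^{-n}\right].
\end{aligned}$$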


402 14. COMPLEX ANALYSIS OF SOLOES 


We can write this solution as

$$w(z) = C(z - z_0)^\alpha g(z), \tag{14.2}$$

where $\alpha \equiv -a_{-1}$ and $g(z)$ is an analytic single-valued function in the annular
region $r_1 < |z - z_0| < r_2$, because $g(z)$ is the exponential of an analytic function.

For the special case in which $p$ has a simple pole, i.e., when $a_{-n} = 0$ for all
$n \ge 2$, the second sum in the exponent will be absent, and $g$ will be analytic even
at $z_0$. In fact, $g(z_0) = 1$, and choosing $C = 1$, we can write

$$w(z) = (z - z_0)^\alpha\left[1 + \sum_{k=1}^{\infty}b_k(z - z_0)^k\right]. \tag{14.3}$$


The singularity of the 
coefficient functions 
of an FOLDE 
determines the 
singularity of the 
solution. 


Depending on the nature of the singularity of $p(z)$ at $z_0$, the solutions given by
Equation (14.2) have different classifications. For instance, if $p(z)$ has a removable
singularity (if $a_{-n} = 0\ \forall\, n \ge 1$), the solution is $Cg(z)$, which is analytic. In this
case, we say that the FOLDE [Equation (14.1)] has a removable singularity at $z_0$.
If $p(z)$ has a simple pole at $z_0$ (if $a_{-1} \ne 0$ and $a_{-n} = 0\ \forall\, n \ge 2$), then in general,
the solution has a branch point at $z_0$. In this case we say that the FOLDE has a
regular singular point. Finally, if $p(z)$ has a pole of order $m > 1$, then the solution
will have an essential singularity (see Problem 14.1). In this case the FOLDE is
said to have an irregular singular point.
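The three cases can be made concrete with simple coefficient functions, taking $z_0 = 0$. The particular choices of $p(z)$ below are illustrative and are not taken from the text; each solution follows directly from $w = C\exp[-\int p\,dz]$.

```latex
\begin{align*}
  p(z) &= 1            &&\Rightarrow\quad w(z) = C e^{-z}
        &&\text{(analytic at } z = 0\text{)},\\
  p(z) &= -\frac{\gamma}{z} &&\Rightarrow\quad w(z) = C z^{\gamma}
        &&\text{(branch point at } z = 0 \text{ for noninteger } \gamma\text{)},\\
  p(z) &= \frac{1}{z^{2}}   &&\Rightarrow\quad w(z) = C e^{1/z}
        &&\text{(essential singularity at } z = 0\text{)}.
\end{align*}
```

In the second line $a_{-1} = -\gamma$, so the exponent agrees with $\alpha = -a_{-1} = \gamma$ of Equation (14.2).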

To arrive at the solution given by Equation (14.2), we had to solve the FOLDE. 
Since higher-order differential equations are not as easily solved, it is desirable to 
obtain such a solution through other considerations. The following example sets 
the stage for this endeavor. 


14.1.1. Example. A FOLDE has a unique solution, to within a multiplicative constant,
given by Theorem 13.2.1. Thus, given a solution $w(z)$, any other solution must be of the
form $Cw(z)$. Let $z_0$ be a singularity of $p(z)$, and let $z - z_0 = re^{i\theta}$. Start at a point $z$ and
circle $z_0$ so that $\theta \to \theta + 2\pi$. Even though $p(z)$ may have a simple pole at $z_0$, the solution
may have a branch point there. This is clear from the general solution, where $\alpha$ may be
a noninteger. Thus, $\tilde{w}(z) \equiv w(z_0 + re^{i(\theta + 2\pi)})$ may be different from $w(z)$. To discover
this branch point, without solving the DE, invoke Proposition 14.0.1 and conclude that
$\tilde{w}(z)$ is also a solution to the FOLDE. Thus, $\tilde{w}(z)$ can be different from $w(z)$ by at most
a multiplicative constant: $\tilde{w}(z) = Cw(z)$. Define the complex number $\alpha$ by $C = e^{2\pi i\alpha}$.
Then the function $g(z) \equiv (z - z_0)^{-\alpha}w(z)$ is single-valued around $z_0$. In fact,

$$\begin{aligned}
g(z_0 + re^{i(\theta + 2\pi)}) &= \left[re^{i(\theta + 2\pi)}\right]^{-\alpha}w(z_0 + re^{i(\theta + 2\pi)}) \\
&= (z - z_0)^{-\alpha}e^{-2\pi i\alpha}e^{2\pi i\alpha}w(z) = (z - z_0)^{-\alpha}w(z) = g(z).
\end{aligned}$$

This argument shows that a solution $w(z)$ of the FOLDE of Equation (14.1) can be
written as $w(z) = (z - z_0)^\alpha g(z)$, where $g(z)$ is single-valued. ■
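The constant $C$ of this example can also be found numerically by integrating the FOLDE once around $z_0$. The Python sketch below does this with the Runge-Kutta formula (13.70) along the circle $z = z_0 + re^{i\theta}$. The function name, the step count, and the test coefficient $p(z) = -\gamma/z + 1$ are illustrative assumptions. For this $p$ the solution is $z^\gamma e^{-z}$, so the factor picked up should be $e^{2\pi i\gamma}$, independently of the analytic part of $p$.

```python
import cmath
import math

def circuit_factor(p, z0=0j, r=1.0, steps=2000):
    """Continue a solution of w' + p(z) w = 0 once around z0 on |z - z0| = r.

    Returns w(after one turn) / w(before), i.e. the constant C of the example.
    """
    def rhs(theta, w):
        z = z0 + r * cmath.exp(1j * theta)
        dz_dtheta = 1j * (z - z0)          # derivative of z along the circle
        return -p(z) * w * dz_dtheta       # dw/dtheta = (dw/dz)(dz/dtheta)

    w = 1.0 + 0j                           # normalized starting value
    h = 2.0 * math.pi / steps
    theta = 0.0
    for _ in range(steps):
        k0 = h * rhs(theta, w)
        k1 = h * rhs(theta + 0.5 * h, w + 0.5 * k0)
        k2 = h * rhs(theta + 0.5 * h, w + 0.5 * k1)
        k3 = h * rhs(theta + h, w + k2)
        w += (k0 + 2.0 * k1 + 2.0 * k2 + k3) / 6.0
        theta += h
    return w

if __name__ == "__main__":
    gamma = 0.3
    C = circuit_factor(lambda z: -gamma / z + 1.0)
    print(C, cmath.exp(2j * math.pi * gamma))
```

When $\gamma$ is an integer the factor is 1 and the solution is single-valued, in line with Example 14.0.2.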






14.1.2 The Circuit Matrix 


The method used in Example 14.1.1 can be generalized to obtain a similar result
for the NOLDE

$$L[w] = \frac{d^nw}{dz^n} + p_{n-1}(z)\frac{d^{n-1}w}{dz^{n-1}} + \cdots + p_1(z)\frac{dw}{dz} + p_0(z)w = 0 \tag{14.4}$$

where all the $p_i(z)$ are analytic in $r_1 < |z - z_0| < r_2$.

Let $\{w_j(z)\}_{j=1}^{n}$ be a basis of solutions of Equation (14.4), and let $z - z_0 = re^{i\theta}$.
Start at $z$ and analytically continue the functions $w_j(z)$ one complete turn to $\theta + 2\pi$.
Let $\tilde{w}_j(z) \equiv \tilde{w}_j(z_0 + re^{i\theta}) \equiv w_j(z_0 + re^{i(\theta + 2\pi)})$. Then, by a generalization of
Proposition 14.0.1, $\{\tilde{w}_j(z)\}_{j=1}^{n}$ are not only solutions, but they are linearly independent
(because they are $w_j$'s evaluated at a different point). Therefore, they also
form a basis of solutions. On the other hand, $\tilde{w}_j(z)$ can be expressed as a linear combination
of the $w_j(z)$. Thus, $\tilde{w}_j(z) = w_j(z_0 + re^{i(\theta + 2\pi)}) = \sum_{k=1}^{n}a_{jk}w_k(z)$. The
matrix $A = (a_{jk})$, called the circuit matrix of the NOLDE, is invertible, because
it transforms one basis into another. Therefore, it has only nonzero eigenvalues.

We let $\lambda$ be one such eigenvalue, and choose the column vector $C$, with entries
$\{c_i\}_{i=1}^{n}$, to be the corresponding eigenvector of the transpose of $A$ (note that $A$
and $A^t$ have the same set of eigenvalues). At least one such eigenvector always
exists, because the characteristic polynomial of $A^t$ has at least one root. Now we
let $w(z) = \sum_{j=1}^{n}c_jw_j(z)$. Clearly, this $w(z)$ is a solution of (14.4), and

$$\begin{aligned}
\tilde{w}(z) \equiv w(z_0 + re^{i(\theta + 2\pi)}) &= \sum_{j=1}^{n}c_jw_j(z_0 + re^{i(\theta + 2\pi)}) \\
&= \sum_{j=1}^{n}c_j\sum_{k=1}^{n}a_{jk}w_k(z) = \sum_{j,k}(A^t)_{kj}c_jw_k(z) = \sum_{k=1}^{n}\lambda c_kw_k(z) = \lambda w(z).
\end{aligned}$$

If we define $\alpha$ by $\lambda = e^{2\pi i\alpha}$, then $w(z_0 + re^{i(\theta + 2\pi)}) = e^{2\pi i\alpha}w(z)$. Now we write
$f(z) \equiv (z - z_0)^{-\alpha}w(z)$. Following the argument used in Example 14.1.1, we get
$f(z_0 + re^{i(\theta + 2\pi)}) = f(z)$; that is, $f(z)$ is single-valued around $z_0$. We thus have
the following theorem.

14.1.2. Theorem. Any homogeneous NOLDE with analytic coefficient functions
in $r_1 < |z - z_0| < r_2$ admits a solution of the form

\[
w(z) = (z - z_0)^{\alpha} f(z)
\]

where $f(z)$ is single-valued around $z_0$ in $r_1 < |z - z_0| < r_2$.
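
As a small numerical illustration of how the exponent $\alpha$ follows from the circuit matrix, the sketch below computes the eigenvalues $\lambda$ of a $2 \times 2$ circuit matrix and the corresponding $\alpha = \ln\lambda / (2\pi i)$. The function name `circuit_exponents` and the restriction to $2 \times 2$ matrices are assumptions made for this illustration, not part of the text. Because the principal branch of the logarithm is used, $\alpha$ is determined only up to an added integer.

```python
import cmath

def circuit_exponents(A):
    """Return [(lambda, alpha), ...] for a 2x2 circuit matrix A = [[a11, a12], [a21, a22]].

    Each alpha satisfies lambda = exp(2*pi*i*alpha); the principal branch of
    the logarithm fixes alpha only up to an integer.
    """
    (a11, a12), (a21, a22) = A
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("a circuit matrix must be invertible")
    # Eigenvalues from the characteristic polynomial x^2 - trace*x + det = 0.
    disc = cmath.sqrt(trace * trace - 4 * det)
    eigenvalues = [(trace + disc) / 2, (trace - disc) / 2]
    return [(lam, cmath.log(lam) / (2j * cmath.pi)) for lam in eigenvalues]

# Basis {z^(1/4), z^(1/3)}: one turn around z = 0 multiplies each solution by a phase.
A = [[1j, 0], [0, cmath.exp(2j * cmath.pi / 3)]]
for lam, alpha in circuit_exponents(A):
    print(lam, alpha)
```

For a diagonal circuit matrix such as the one above, the recovered exponents are simply the powers of $z$ in the two basis functions.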

An isolated singular point $z_0$ near which an analytic function $w(z)$ can be
written as $w(z) = (z - z_0)^{\alpha} f(z)$, where $f(z)$ is single-valued and analytic in
the punctured neighborhood of $z_0$, is called a simple branch point of $w(z)$. The
arguments leading to Theorem 14.1.2 imply that a solution with a simple branch
point exists if and only if the vector $C$ whose components appear in $w(z) = \sum_{j=1}^{n} c_j w_j(z)$ is an
eigenvector of $A^t$, the transpose of the circuit matrix. Thus, there are as many
solutions with simple branch points as there are linearly independent eigenvectors
of $A^t$.


14.2 Complex SOLDEs 

Let us now consider the SOLDE $w'' + p(z)w' + q(z)w = 0$. Given two linearly
independent solutions $w_1(z)$ and $w_2(z)$, we form the $2 \times 2$ matrix $A$ and try to
diagonalize it. There are three possible outcomes:

1. The matrix $A$ is diagonalizable, and we can find two eigenvectors, $F(z)$
and $G(z)$, corresponding, respectively, to two distinct eigenvalues, $\lambda_1$ and
$\lambda_2$. This means that $F(z_0 + re^{i(\theta+2\pi)}) = \lambda_1 F(z)$ and $G(z_0 + re^{i(\theta+2\pi)}) = \lambda_2 G(z)$.
Defining $\lambda_1 = e^{2\pi i\alpha}$ and $\lambda_2 = e^{2\pi i\beta}$, we get $F(z) = (z - z_0)^{\alpha} f(z)$
and $G(z) = (z - z_0)^{\beta} g(z)$, as Theorem 14.1.2 suggests. The set $\{F(z), G(z)\}$
is called a canonical basis of the SOLDE.

2. The matrix $A$ is diagonalizable, and the two eigenvalues are the same. In this
case both $F(z)$ and $G(z)$ have the same constant $\alpha$:

\[
F(z) = (z - z_0)^{\alpha} f(z) \quad\text{and}\quad G(z) = (z - z_0)^{\alpha} g(z).
\]

3. We cannot find two eigenvectors. This corresponds to the case where $A$
is not diagonalizable. However, we can always find one eigenvector, so
$A$ has only one eigenvalue, $\lambda$. We let $w_1(z)$ be the solution of the form
$(z - z_0)^{\alpha} f(z)$, where $f(z)$ is single-valued and $\lambda = e^{2\pi i\alpha}$. The existence
of such a solution is guaranteed by Theorem 14.1.2. Let $w_2(z)$ be any other
linearly independent solution (Theorem 13.3.5 ensures the existence of such
a second solution). Then

\[
w_2(z_0 + re^{i(\theta+2\pi)}) = a w_1(z) + b w_2(z),
\]

and the circuit matrix will be $A = \begin{pmatrix} \lambda & 0 \\ a & b \end{pmatrix}$, which has eigenvalues $\lambda$ and $b$.
Since $A$ is assumed to have only one eigenvalue (otherwise we would have
the first outcome again), we must have $b = \lambda$. This reduces $A$ to $A = \begin{pmatrix} \lambda & 0 \\ a & \lambda \end{pmatrix}$,
where $a \neq 0$. The condition $a \neq 0$ is necessary to distinguish this case from
the second outcome. Now we analytically continue $h(z) \equiv w_2(z)/w_1(z)$
one whole turn around $z_0$, obtaining

\[
h(z_0 + re^{i(\theta+2\pi)}) = \frac{w_2(z_0 + re^{i(\theta+2\pi)})}{w_1(z_0 + re^{i(\theta+2\pi)})}
= \frac{a w_1(z) + \lambda w_2(z)}{\lambda w_1(z)} = \frac{a}{\lambda} + \frac{w_2(z)}{w_1(z)} = \frac{a}{\lambda} + h(z).
\]

It then follows that the function¹

\[
g_1(z) \equiv h(z) - \frac{a}{2\pi i\lambda}\ln(z - z_0)
\]

is single-valued in $r_1 < |z - z_0| < r_2$. If we redefine $g_1(z)$ and $w_2(z)$ as
$(2\pi i\lambda/a)g_1(z)$ and $(2\pi i\lambda/a)w_2(z)$, respectively, we have the following:

14.2.1. Theorem. If $p(z)$ and $q(z)$ are analytic in the annular region $r_1 < |z - z_0| < r_2$,
then the SOLDE $w'' + p(z)w' + q(z)w = 0$ admits a basis of solutions
$\{w_1, w_2\}$ in the neighborhood of the singular point $z_0$, where either

\[
w_1(z) = (z - z_0)^{\alpha} f(z), \qquad w_2(z) = (z - z_0)^{\beta} g(z)
\]

or, in exceptional cases (when the circuit matrix is not diagonalizable),

\[
w_1(z) = (z - z_0)^{\alpha} f(z), \qquad w_2(z) = w_1(z)\left[g_1(z) + \ln(z - z_0)\right].
\]

The functions $f(z)$, $g(z)$, and $g_1(z)$ are analytic and single-valued in the annular
region.
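
The logarithmic term in the exceptional case is what carries the multivaluedness: continuing $\ln(z - z_0)$ once around $z_0$ adds $2\pi i$. The following sketch checks this numerically by stepping around a circle and accumulating small principal-value increments. The function name `continue_log_once` and the discretization into `steps` points are choices made for this illustration only.

```python
import cmath

def continue_log_once(z0, r, theta0=0.3, steps=1000):
    """Analytically continue ln(z - z0) once around the circle |z - z0| = r.

    Returns the change in the continued value after one full turn.
    """
    z = z0 + r * cmath.exp(1j * theta0)
    value = cmath.log(z - z0)
    start = value
    for m in range(1, steps + 1):
        z_next = z0 + r * cmath.exp(1j * (theta0 + 2 * cmath.pi * m / steps))
        # Each step is small, so the principal log of the ratio is the true increment.
        value += cmath.log((z_next - z0) / (z - z0))
        z = z_next
    return value - start

print(continue_log_once(1 + 1j, 0.5))  # close to 2*pi*i
```

With $h(z)$ increasing by $a/\lambda$ per turn and $\ln(z - z_0)$ by $2\pi i$, the combination $h(z) - \frac{a}{2\pi i\lambda}\ln(z - z_0)$ returns to its starting value, which is the single-valuedness used in the theorem.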

This theorem allows us to factor out the branch point $z_0$ from the rest of the
solutions. However, even though $f(z)$, $g(z)$, and $g_1(z)$ are analytic in the annular
region $r_1 < |z - z_0| < r_2$, they may very well have poles of arbitrary orders at $z_0$.
Can we also factor out the poles? In general, we cannot; however, under special
circumstances, described in the following definition, we can.

14.2.2. Definition. A SOLDE of the form $w'' + p(z)w' + q(z)w = 0$ that is analytic
in $0 < |z - z_0| < r$ has a regular singular point at $z_0$ if $p(z)$ has at worst a simple
pole and $q(z)$ has at worst a pole of order 2 there.

In a neighborhood of a regular singular point $z_0$, the coefficient functions $p(z)$
and $q(z)$ have the power-series expansions

\[
p(z) = \frac{a_{-1}}{z - z_0} + \sum_{k=0}^{\infty} a_k (z - z_0)^k,
\qquad
q(z) = \frac{b_{-2}}{(z - z_0)^2} + \frac{b_{-1}}{z - z_0} + \sum_{k=0}^{\infty} b_k (z - z_0)^k.
\]

Multiplying both sides of the first equation by $z - z_0$ and the second by $(z - z_0)^2$
and introducing $P(z) \equiv (z - z_0)p(z)$, $Q(z) \equiv (z - z_0)^2 q(z)$, we obtain

\[
P(z) = \sum_{k=0}^{\infty} a_{k-1} (z - z_0)^k,
\qquad
Q(z) = \sum_{k=0}^{\infty} b_{k-2} (z - z_0)^k.
\]


¹ Recall that $\ln(z - z_0)$ increases by $2\pi i$ for each turn around $z_0$.




It is also convenient to multiply the SOLDE by $(z - z_0)^2$ and write it as

\[
(z - z_0)^2 w'' + (z - z_0)P(z)w' + Q(z)w = 0. \tag{14.5}
\]

Inspired by the discussion leading to Theorem 14.2.1, we write

\[
w(z) = (z - z_0)^{\nu} \sum_{k=0}^{\infty} C_k (z - z_0)^k, \qquad C_0 = 1, \tag{14.6}
\]

where we have chosen the arbitrary multiplicative constant in such a way that
$C_0 = 1$. Substitute this in Equation (14.5), and change the dummy variable so
that all sums start at 0, to obtain

\[
\sum_{n=0}^{\infty} \left\{ (n + \nu)(n + \nu - 1)C_n + \sum_{k=0}^{n} \left[(k + \nu)a_{n-k-1} + b_{n-k-2}\right]C_k \right\} (z - z_0)^{n+\nu} = 0,
\]

which results in the recursion relation

\[
(n + \nu)(n + \nu - 1)C_n = -\sum_{k=0}^{n} \left[(k + \nu)a_{n-k-1} + b_{n-k-2}\right]C_k. \tag{14.7}
\]

For $n = 0$, this leads to what is known as the indicial equation for the exponent
$\nu$:

\[
I(\nu) \equiv \nu(\nu - 1) + a_{-1}\nu + b_{-2} = 0. \tag{14.8}
\]

The roots of this equation are called the characteristic exponents of $z_0$, and $I(\nu)$ is
called its indicial polynomial. In terms of this polynomial, (14.7) can be expressed
as

\[
I(n + \nu)C_n = -\sum_{k=0}^{n-1} \left[(k + \nu)a_{n-k-1} + b_{n-k-2}\right]C_k \qquad\text{for } n = 1, 2, \ldots. \tag{14.9}
\]

Equation (14.8) determines what values of $\nu$ are possible, and Equation (14.9)
gives $C_1, C_2, C_3, \ldots$, which in turn determine $w(z)$. Special care must be taken
if the indicial polynomial vanishes at $n + \nu$ for some positive integer $n$, that is, if
$n + \nu$, in addition to $\nu$, is a root of the indicial polynomial: $I(n + \nu) = 0 = I(\nu)$.

If $\nu_1$ and $\nu_2$ are characteristic exponents of the indicial equation and $\mathrm{Re}(\nu_1) > \mathrm{Re}(\nu_2)$,
then a solution for $\nu_1$ always exists. A solution for $\nu_2$ also exists if $\nu_1 - \nu_2 \neq n$
for any (positive) integer $n$. In particular, if $z_0$ is an ordinary point [a point at
which both $p(z)$ and $q(z)$ are analytic], then only one solution is determined by
(14.9). (Why?) The foregoing discussion is summarized in the following:





14.2.3. Theorem. If the differential equation $w'' + p(z)w' + q(z)w = 0$ has a
regular singular point at $z = z_0$, then at least one power series of the form of
(14.6) formally solves the equation. If $\nu_1$ and $\nu_2$ are the characteristic exponents
of $z_0$, then there are two linearly independent formal solutions unless $\nu_1 - \nu_2$ is
an integer.
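
The recursion (14.9) is mechanical enough to sketch in code. The block below is a minimal illustration, assuming the Laurent coefficients of $p$ and $q$ are supplied as dictionaries keyed by index ($a_k$ for $k \geq -1$, $b_k$ for $k \geq -2$, missing keys meaning zero); the names `indicial` and `frobenius_coefficients` are hypothetical. Exact rational arithmetic is used so the coefficients can be compared with known series.

```python
from fractions import Fraction

def indicial(nu, a_m1, b_m2):
    """Indicial polynomial I(nu) = nu(nu - 1) + a_{-1} nu + b_{-2} of Equation (14.8)."""
    return nu * (nu - 1) + a_m1 * nu + b_m2

def frobenius_coefficients(a, b, nu, n_max):
    """Return [C_0, ..., C_{n_max}] from the recursion (14.9) with C_0 = 1.

    a, b: dicts mapping k to the coefficients a_k and b_k of p(z) and q(z).
    """
    a_m1, b_m2 = a.get(-1, 0), b.get(-2, 0)
    C = [Fraction(1)]
    for n in range(1, n_max + 1):
        rhs = -sum(
            ((k + nu) * a.get(n - k - 1, 0) + b.get(n - k - 2, 0)) * C[k]
            for k in range(n)
        )
        I = indicial(n + nu, a_m1, b_m2)
        if I == 0:
            # n + nu is itself a characteristic exponent: the case needing special care.
            raise ZeroDivisionError("I(n + nu) = 0; recursion (14.9) breaks down")
        C.append(Fraction(rhs) / I)
    return C

# Bessel equation of order 0: p = 1/z, q = 1, exponent nu = 0.
print(frobenius_coefficients({-1: 1}, {0: 1}, 0, 4))
```

For the Bessel equation of order zero this reproduces the leading terms $1 - z^2/4 + z^4/64$ of the familiar series for $J_0(z)$.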


14.2.4. Example. Let us consider some familiar differential equations.

(a) The Bessel equation is

\[
w'' + \frac{1}{z}w' + \left(1 - \frac{\alpha^2}{z^2}\right)w = 0.
\]

In this case, the origin is a regular singular point, $a_{-1} = 1$, and $b_{-2} = -\alpha^2$. Thus, the
indicial equation is $\nu(\nu - 1) + \nu - \alpha^2 = \nu^2 - \alpha^2 = 0$, and its solutions are $\nu_1 = \alpha$ and $\nu_2 = -\alpha$.
Therefore, there are two linearly independent solutions to the Bessel equation unless
$\nu_1 - \nu_2 = 2\alpha$ is an integer, i.e., unless $\alpha$ is either an integer or a half-integer.

(b) For the Coulomb potential $f(r) = \beta/r$, the most general radial equation [Equation
(12.14)] reduces to

\[
w'' + \frac{2}{z}w' + \left(\frac{\beta}{z} - \frac{\alpha}{z^2}\right)w = 0.
\]

The point $z = 0$ is a regular singular point at which $a_{-1} = 2$ and $b_{-2} = -\alpha$. The indicial
polynomial is $I(\nu) = \nu^2 + \nu - \alpha$ with characteristic exponents $\nu_1 = -\frac{1}{2} + \frac{1}{2}\sqrt{1 + 4\alpha}$ and
$\nu_2 = -\frac{1}{2} - \frac{1}{2}\sqrt{1 + 4\alpha}$. There are two independent solutions unless $\nu_1 - \nu_2 = \sqrt{1 + 4\alpha}$
is an integer. In practice, $\alpha = l(l + 1)$, where $l$ is some integer; so $\nu_1 - \nu_2 = 2l + 1$, and
only one solution is obtained.

(c) The hypergeometric differential equation is

\[
w'' + \frac{\gamma - (\alpha + \beta + 1)z}{z(1 - z)}w' - \frac{\alpha\beta}{z(1 - z)}w = 0.
\]

A substantial number of functions in mathematical physics are solutions of this remarkable
equation, with appropriate values for $\alpha$, $\beta$, and $\gamma$. The regular singular points² are $z = 0$ and
$z = 1$. At $z = 0$, $a_{-1} = \gamma$ and $b_{-2} = 0$. The indicial polynomial is $I(\nu) = \nu(\nu + \gamma - 1)$,
whose roots are $\nu_1 = 0$ and $\nu_2 = 1 - \gamma$. Unless $\gamma$ is an integer, we have two formal
solutions. ■
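
The three cases of the example can be checked with a few lines of code. The sketch below, with the hypothetical helper names `characteristic_exponents` and `differ_by_integer`, solves the indicial equation (14.8) for given $a_{-1}$ and $b_{-2}$ and tests whether the two roots differ by an integer, which is the condition under which Theorem 14.2.3 no longer guarantees two power-series solutions.

```python
import cmath

def characteristic_exponents(a_m1, b_m2):
    """Roots of I(nu) = nu(nu - 1) + a_{-1} nu + b_{-2}, larger real part first."""
    s = cmath.sqrt((a_m1 - 1) ** 2 - 4 * b_m2)
    roots = [(-(a_m1 - 1) + s) / 2, (-(a_m1 - 1) - s) / 2]
    return sorted(roots, key=lambda r: r.real, reverse=True)

def differ_by_integer(nu1, nu2, tol=1e-12):
    """True if nu1 - nu2 is (numerically) a real integer."""
    d = nu1 - nu2
    return abs(d.imag) < tol and abs(d.real - round(d.real)) < tol

# (a) Bessel equation of order 1/3: exponents +1/3 and -1/3.
print(characteristic_exponents(1, -1 / 9))
# (b) Coulomb radial equation with alpha = l(l + 1), l = 2: exponents 2 and -3.
print(characteristic_exponents(2, -6))
# (c) Hypergeometric equation at z = 0 with gamma = 0.75: exponents 0.25 and 0.
print(characteristic_exponents(0.75, 0))
```

In case (b) the difference of the exponents is $2l + 1 = 5$, consistent with the statement that only one power-series solution is obtained.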


It is shown in differential equation theory [Birk 78, pp. 40-242] that as long
as $\nu_1 - \nu_2$ is not an integer, the series solution of Theorem 14.2.3 is convergent
for a neighborhood of $z_0$. What happens when $\nu_1 - \nu_2$ is an integer? First, as a
convenience, we translate the coordinate axes so that the point $z_0$ coincides with
the origin. This will save us some writing, because instead of powers of $z - z_0$,
we will have powers of $z$. Next we let $\nu_1 = \nu_2 + n$ with $n$ a positive integer. Then,
since it is impossible to encounter any new zero of the indicial polynomial beyond

² The coefficient of $w$ need not have a pole of order 2. Its pole can be of order one as well.






$\nu_1$, the recursion relation, Equation (14.9), will be valid for all values of $n$, and we
obtain a solution:

\[
w_1(z) = z^{\nu_1} f(z) = z^{\nu_1}\left(1 + \sum_{k=1}^{\infty} C_k z^k\right),
\]

which is convergent in the region $0 < |z| < r$ for some $r > 0$. To investigate the
nature and the possibility of the second solution, write the recursion relations of
Equation (14.9) for the smaller characteristic root $\nu_2$:

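\[
\begin{aligned}
I(\nu_2 + 1)C_1 &= -(\nu_2 a_0 + b_{-1})C_0 \equiv \rho_1 I(\nu_2 + 1)C_0 &&\Rightarrow\; C_1 = \rho_1, \\
I(\nu_2 + 2)C_2 &= -(\nu_2 a_1 + b_0)C_0 - \left[(\nu_2 + 1)a_0 + b_{-1}\right]C_1 \equiv \rho_2 I(\nu_2 + 2)C_0 &&\Rightarrow\; C_2 = \rho_2, \\
&\;\;\vdots \\
I(\nu_2 + n - 1)C_{n-1} &\equiv \rho_{n-1} I(\nu_2 + n - 1)C_0 &&\Rightarrow\; C_{n-1} = \rho_{n-1}, \\
I(\nu_2 + n)C_n &= I(\nu_1)C_n = \rho_n C_0 &&\Rightarrow\; 0 = \rho_n,
\end{aligned}
\tag{14.10}
\]

where in each step, we have used the result of the previous step in which $C_k$ is
given as a multiple of $C_0 = 1$. Here, the $\rho$'s are constants depending (possibly in
a very complicated way) on the $a_k$'s and $b_k$'s.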

Theorem 14.2.3 guarantees two power series solutions only when $\nu_1 - \nu_2$ is
not an integer. When $\nu_1 - \nu_2$ is an integer, Equation (14.10) shows that a necessary
condition for a second power series solution to exist is that $\rho_n = 0$. Therefore,
when $\rho_n \neq 0$, we have to resort to other means of obtaining the second solution.
Let us define the second solution as

\[
w_2(z) \equiv w_1(z)h(z) = z^{\nu_1} f(z)h(z) \tag{14.11}
\]

and substitute in the SOLDE to obtain a FOLDE in $h'$, namely, $h'' + (p + 2w_1'/w_1)h' = 0$,
or, by substituting $w_1'/w_1 = \nu_1/z + f'/f$, the equivalent FOLDE

\[
h'' + \left(\frac{2\nu_1}{z} + \frac{2f'}{f} + p\right)h' = 0. \tag{14.12}
\]

14.2.5. Lemma. The coefficient of $h'$ in Equation (14.12) has a residue of $n + 1$.

Proof. Recall that the residue of a function is the coefficient of $z^{-1}$ in the Laurent
expansion of the function (about $z = 0$). Let us denote this residue for the coefficient
of $h'$ by $A_{-1}$. Since $f(0) = 1$, the ratio $f'/f$ is analytic at $z = 0$. Thus,
the simple pole at $z = 0$ comes from the other two terms. Substituting the Laurent
expansion of $p(z)$ gives

\[
\frac{2\nu_1}{z} + p = \frac{2\nu_1}{z} + \frac{a_{-1}}{z} + a_0 + a_1 z + \cdots.
\]

This shows that $A_{-1} = 2\nu_1 + a_{-1}$. On the other hand, comparing the two versions
of the indicial polynomial $\nu^2 + (a_{-1} - 1)\nu + b_{-2}$ and $(\nu - \nu_1)(\nu - \nu_2) = \nu^2 - (\nu_1 + \nu_2)\nu + \nu_1\nu_2$
gives $\nu_1 + \nu_2 = -(a_{-1} - 1)$, or $2\nu_1 - n = -(a_{-1} - 1)$. Therefore,
$A_{-1} = 2\nu_1 + a_{-1} = n + 1$. □


14.2.6. Theorem. Suppose that the characteristic exponents of a SOLDE with a
regular singular point at $z = 0$ are $\nu_1$ and $\nu_2$. Consider three cases:

1. $\nu_1 - \nu_2$ is not an integer.

2. $\nu_2 = \nu_1 - n$ where $n$ is a nonnegative integer, and $\rho_n$, as defined in Equation
(14.10), vanishes.

3. $\nu_2 = \nu_1 - n$ where $n$ is a nonnegative integer, and $\rho_n$, as defined in Equation
(14.10), does not vanish.

Then, in the first two cases, there exists a basis of solutions $\{w_1, w_2\}$ of the form

\[
w_i(z) = z^{\nu_i}\left(1 + \sum_{k=1}^{\infty} C_k^{(i)} z^k\right), \qquad i = 1, 2,
\]

and in the third case, the basis of solutions takes the form

\[
w_1(z) = z^{\nu_1}\left(1 + \sum_{k=1}^{\infty} a_k z^k\right),
\qquad
w_2(z) = z^{\nu_2}\left(1 + \sum_{k=1}^{\infty} b_k z^k\right) + C w_1(z)\ln z,
\]

where the power series are convergent in a neighborhood of $z = 0$.


Proof. The first two cases have been shown before. For the third case, we use
Lemma 14.2.5 and write

\[
\frac{2\nu_1}{z} + \frac{2f'}{f} + p = \frac{n + 1}{z} + \sum_{k=0}^{\infty} c_k z^k,
\]

and the solution for the FOLDE in $h'$ will be [see Equation (14.3) and the discussion
preceding it]

\[
h'(z) = z^{-n-1}\left(1 + \sum_{k=1}^{\infty} b_k z^k\right).
\]

For $n = 0$, i.e., when the indicial polynomial has a double root, this yields
$h'(z) = 1/z + \sum_{k=1}^{\infty} b_k z^{k-1}$, or $h(z) = \ln z + g_1(z)$, where $g_1$ is analytic in a neighborhood
of $z = 0$. For $n \neq 0$, we have $h'(z) = b_n/z + \sum_{k \neq n} b_k z^{k-n-1}$ and, by integration,

\[
h(z) = b_n \ln z + \sum_{k \neq n} \frac{b_k}{k - n} z^{k-n}
= b_n \ln z + z^{-n} \sum_{k \neq n} \frac{b_k}{k - n} z^{k}
= b_n \ln z + z^{-n} g_2(z),
\]

where the sums run over $k \geq 0$ with $k \neq n$ and $b_0 \equiv 1$, and $g_2$ is analytic in a
neighborhood of $z = 0$. Substituting $h$ in Equation (14.11)
and recalling that $\nu_2 = \nu_1 - n$, we obtain the desired results of the theorem. □
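
A short worked case may help fix the logic of the third case. The equation below is chosen here purely for illustration (it does not appear in the text); it has a double characteristic exponent, so $n = 0$.

```latex
% Illustrative example: w'' + (1/z) w' = 0, i.e. a_{-1} = 1, b_{-2} = 0.
\begin{align*}
  I(\nu) &= \nu(\nu - 1) + \nu = \nu^2
    && \Rightarrow\ \nu_1 = \nu_2 = 0,\ n = 0, \\
  w_1(z) &= z^{0} f(z) = 1
    && (f \equiv 1), \\
  h'' + \Bigl(\frac{2\nu_1}{z} + \frac{2f'}{f} + p\Bigr) h'
    &= h'' + \frac{1}{z}\, h' = 0
    && \text{(residue } n + 1 = 1\text{, as in Lemma 14.2.5)}, \\
  h'(z) &= \frac{1}{z}
    && \Rightarrow\ h(z) = \ln z, \\
  w_2(z) &= w_1(z)\, h(z) = \ln z .
\end{align*}
```

The second solution is purely logarithmic, and adding any multiple of $w_1$ to it leaves a solution of the same logarithmic type.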


14.3 Fuchsian Differential Equations 

In many cases of physical interest, the behavior of the solution of a SOLDE at
infinity is important. For instance, bound state solutions of the Schrödinger equation
describing the probability amplitudes of particles in quantum mechanics must
tend to zero as the distance from the center of the binding force increases.

We have seen that the behavior of a solution is determined by the behavior
of the coefficient functions. To determine the behavior at infinity, we substitute
$z = 1/t$ in the SOLDE

\[
\frac{d^2 w}{dz^2} + p(z)\frac{dw}{dz} + q(z)w = 0 \tag{14.13}
\]

and obtain

\[
\frac{d^2 v}{dt^2} + \left[\frac{2}{t} - \frac{1}{t^2}r(t)\right]\frac{dv}{dt} + \frac{1}{t^4}s(t)v = 0, \tag{14.14}
\]

where $v(t) = w(1/t)$, $r(t) = p(1/t)$, and $s(t) = q(1/t)$.

Clearly, as $z \to \infty$, $t \to 0$. Thus, we are interested in the behavior of (14.14)
at $t = 0$. We assume that both $r(t)$ and $s(t)$ are analytic at $t = 0$. Equation (14.14)
shows, however, that the solution $v(t)$ may still have singularities at $t = 0$ because
of the extra terms appearing in the coefficient functions.
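
The form of Equation (14.14) follows from the chain rule. The steps are spelled out below as a sketch, using only the definitions $z = 1/t$, $v(t) = w(1/t)$, $r(t) = p(1/t)$, and $s(t) = q(1/t)$ given in the text.

```latex
\begin{align*}
  \frac{dw}{dz} &= \frac{dt}{dz}\,\frac{dv}{dt} = -t^{2}\,\frac{dv}{dt}, \\
  \frac{d^{2}w}{dz^{2}} &= -t^{2}\,\frac{d}{dt}\!\left(-t^{2}\,\frac{dv}{dt}\right)
     = t^{4}\,\frac{d^{2}v}{dt^{2}} + 2t^{3}\,\frac{dv}{dt}, \\
  0 &= t^{4}\,\frac{d^{2}v}{dt^{2}} + \bigl[\,2t^{3} - t^{2} r(t)\bigr]\frac{dv}{dt} + s(t)\,v
  \quad\Longrightarrow\quad
  \frac{d^{2}v}{dt^{2}} + \left[\frac{2}{t} - \frac{r(t)}{t^{2}}\right]\frac{dv}{dt} + \frac{s(t)}{t^{4}}\,v = 0 .
\end{align*}
```

The factors $1/t^2$ and $1/t^4$ are the extra terms that can make $t = 0$ singular even when $r(t)$ and $s(t)$ are analytic there.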

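We assume that infinity is a regular singular point of (14.13), by which we
mean that $t = 0$ is a regular singular point of (14.14). Therefore, in the Taylor
expansions of $r(t)$ and $s(t)$, the first (constant) term of $r(t)$ and the first two terms
of $s(t)$ must be zero. Thus, we write

\[
r(t) = a_1 t + a_2 t^2 + \cdots = \sum_{k=1}^{\infty} a_k t^k,
\qquad
s(t) = b_2 t^2 + b_3 t^3 + \cdots = \sum_{k=2}^{\infty} b_k t^k.
\]

By their definitions, these two equations imply that for $p(z)$ and $q(z)$, and for large
values of $|z|$, we must have expressions of the form

\[
p(z) = \frac{a_1}{z} + \frac{a_2}{z^2} + \cdots = \sum_{k=1}^{\infty} \frac{a_k}{z^k},
\qquad
q(z) = \frac{b_2}{z^2} + \frac{b_3}{z^3} + \cdots = \sum_{k=2}^{\infty} \frac{b_k}{z^k}. \tag{14.15}
\]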





When infinity is a regular singular point of Equation (14.13), or, equivalently,
when the origin is a regular singular point of (14.14), it follows from
Theorem 14.2.6 that there exists at least one solution of the form
$v_1(t) = t^{\alpha}\left(1 + \sum_{k=1}^{\infty} C_k t^k\right)$ or, in terms of $z$,

\[
w(z) = z^{-\alpha}\left(1 + \sum_{k=1}^{\infty} \frac{C_k}{z^k}\right). \tag{14.16}
\]

Here $\alpha$ is a characteristic exponent at $t = 0$ of (14.14), whose indicial equation
is easily found to be $\alpha(\alpha - 1) + (2 - a_1)\alpha + b_2 = 0$.


14.3.1. Definition. A homogeneous differential equation with single-valued analytic
coefficient functions is called a Fuchsian differential equation (FDE) if it
has only regular singular points in the extended complex plane, i.e., the complex
plane including the point at infinity.

It turns out that a particular kind of FDE describes a large class of nonelementary
functions encountered in mathematical physics. Therefore, it is instructive to
classify various kinds of FDEs. A fact that is used in such a classification is that
complex functions whose only singularities in the extended complex plane are
poles are rational functions, i.e., ratios of polynomials (see Example 10.2.2). We
thus expect FDEs to have only rational functions as coefficients.

Consider the case where the equation has at most two regular singular points at
$z_1$ and $z_2$. We introduce a new variable $\xi(z) = \dfrac{z - z_1}{z - z_2}$. The regular singular points
at $z_1$ and $z_2$ are mapped onto the points $\xi_1 = \xi(z_1) = 0$ and $\xi_2 = \xi(z_2) = \infty$,
respectively, in the extended $\xi$-plane. Equation (14.13) becomes

\[
\frac{d^2 u}{d\xi^2} + \Phi(\xi)\frac{du}{d\xi} + \Theta(\xi)u = 0, \tag{14.17}
\]

where $u$, $\Phi$, and $\Theta$ are functions of $\xi$ obtained when $z$ is expressed in terms of
$\xi$ in $w(z)$, $p(z)$, and $q(z)$, respectively. From Equation (14.15) and the fact that
$\xi = 0$ is at most a simple pole of $\Phi(\xi)$, we obtain $\Phi(\xi) = a_1/\xi$. Similarly,
$\Theta(\xi) = b_2/\xi^2$. Thus, a SOFDE with two regular singular points is equivalent
to the DE $w'' + (a_1/z)w' + (b_2/z^2)w = 0$. Multiplying both sides by $z^2$, we
obtain $z^2 w'' + a_1 z w' + b_2 w = 0$, which is the second-order Euler differential
equation. A general $n$th-order Euler differential equation is equivalent to a NOLDE
with constant coefficients (see Problem 13.28). Thus, a second order Fuchsian DE
(SOFDE) with two regular singular points is equivalent to a SOLDE with constant
coefficients and produces nothing new.
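
The reduction of the second-order Euler equation to constant coefficients can be made explicit. The substitution $z = e^{s}$ with $y(s) \equiv w(e^{s})$ used below is the standard one; it is offered here as a sketch of what the cited problem presumably establishes in general.

```latex
\begin{align*}
  z\,\frac{dw}{dz} &= \frac{dy}{ds}, \qquad
  z^{2}\,\frac{d^{2}w}{dz^{2}} = \frac{d^{2}y}{ds^{2}} - \frac{dy}{ds}, \\
  z^{2} w'' + a_1 z\, w' + b_2 w = 0
  \quad&\Longrightarrow\quad
  \frac{d^{2}y}{ds^{2}} + (a_1 - 1)\,\frac{dy}{ds} + b_2\, y = 0, \\
  y = e^{\nu s} = z^{\nu}
  \quad&\Longrightarrow\quad
  \nu^{2} + (a_1 - 1)\nu + b_2 = \nu(\nu - 1) + a_1 \nu + b_2 = 0 .
\end{align*}
```

The characteristic equation of the constant-coefficient problem is the indicial equation of the Euler equation at $z = 0$, so the solutions are powers of $z$ (with a factor of $\ln z$ when the roots coincide).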

The simplest SOFDE whose solutions may include nonelementary functions
is therefore one having three regular singular points, at say $z_1$, $z_2$, and $z_3$. By the
transformation

\[
\xi(z) = \frac{(z - z_1)(z_3 - z_2)}{(z - z_2)(z_3 - z_1)}
\]

we can map $z_1$, $z_2$, and $z_3$ onto $\xi_1 = 0$, $\xi_2 = \infty$, and $\xi_3 = 1$. Thus, we assume
that the three regular singular points are at $z = 0$, $z = 1$, and $z = \infty$. It can be
shown [see Problem (14.8)] that the most general $p(z)$ and $q(z)$ are

\[
p(z) = \frac{A_1}{z} + \frac{B_1}{z - 1}
\qquad\text{and}\qquad
q(z) = \frac{A_2}{z^2} + \frac{B_2}{(z - 1)^2} - \frac{A_3}{z(z - 1)}.
\]

We thus have the following theorem.

14.3.2. Theorem. The most general second order Fuchsian DE with three regular
singular points can be transformed into the form

\[
w'' + \left(\frac{A_1}{z} + \frac{B_1}{z - 1}\right)w' + \left[\frac{A_2}{z^2} + \frac{B_2}{(z - 1)^2} - \frac{A_3}{z(z - 1)}\right]w = 0, \tag{14.18}
\]

where $A_1$, $A_2$, $A_3$, $B_1$, and $B_2$ are constants. This equation is called the Riemann
differential equation.
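
The fractional linear transformation used to move the three singular points to $0$, $\infty$, and $1$ is easy to check numerically. In the sketch below the function name `xi` and the convention of returning `cmath.inf` at $z = z_2$ are choices made for this illustration.

```python
import cmath

def xi(z, z1, z2, z3):
    """Map z1 -> 0, z2 -> infinity, z3 -> 1 via
    xi(z) = (z - z1)(z3 - z2) / ((z - z2)(z3 - z1))."""
    if z == z2:
        return cmath.inf  # the point sent to infinity in the extended plane
    return (z - z1) * (z3 - z2) / ((z - z2) * (z3 - z1))

z1, z2, z3 = 2, 5, -1
print(xi(z1, z1, z2, z3), xi(z2, z1, z2, z3), xi(z3, z1, z2, z3))
```

Any three distinct points can be used; the map is undefined only if two of them coincide.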

We can write the Riemann DE in terms of pairs of characteristic exponents,
$(\lambda_1, \lambda_2)$, $(\mu_1, \mu_2)$, and $(\nu_1, \nu_2)$, belonging to the singular points $0$, $1$, and $\infty$,
respectively. The indicial equations are easily found to be

\[
\begin{aligned}
\lambda^2 + (A_1 - 1)\lambda + A_2 &= 0, \\
\mu^2 + (B_1 - 1)\mu + B_2 &= 0, \\
\nu^2 + (1 - A_1 - B_1)\nu + A_2 + B_2 - A_3 &= 0.
\end{aligned}
\]

By writing the indicial equations as $(\lambda - \lambda_1)(\lambda - \lambda_2) = 0$, and so forth, and
comparing coefficients, we can find the following relations:

\[
\begin{aligned}
A_1 &= 1 - \lambda_1 - \lambda_2, & A_2 &= \lambda_1\lambda_2, \\
B_1 &= 1 - \mu_1 - \mu_2, & B_2 &= \mu_1\mu_2, \\
A_1 + B_1 &= \nu_1 + \nu_2 + 1, & A_2 + B_2 - A_3 &= \nu_1\nu_2.
\end{aligned}
\]

These equations lead easily to the Riemann identity

\[
\lambda_1 + \lambda_2 + \mu_1 + \mu_2 + \nu_1 + \nu_2 = 1. \tag{14.19}
\]

Substituting these results in (14.18) gives the following result.

14.3.3. Theorem. A second order Fuchsian DE with three regular singular points
in the extended complex plane is equivalent to the Riemann DE,

\[
w'' + \left(\frac{1 - \lambda_1 - \lambda_2}{z} + \frac{1 - \mu_1 - \mu_2}{z - 1}\right)w'
+ \left[\frac{\lambda_1\lambda_2}{z^2} + \frac{\mu_1\mu_2}{(z - 1)^2} + \frac{\nu_1\nu_2 - \lambda_1\lambda_2 - \mu_1\mu_2}{z(z - 1)}\right]w = 0, \tag{14.20}
\]

which is uniquely determined by the pairs of characteristic exponents at each
singular point. The characteristic exponents satisfy the Riemann identity, Equation
(14.19).
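
The relations between the constants of Equation (14.18) and the exponent pairs can be sketched in a few lines. The function name `riemann_coefficients` and the use of exact fractions are assumptions made for this illustration; the function refuses exponent sets that violate the Riemann identity (14.19).

```python
from fractions import Fraction

def riemann_coefficients(lam, mu, nu):
    """Return (A1, B1, A2, B2, A3) of Equation (14.18) from the exponent
    pairs lam, mu, nu at z = 0, 1, and infinity, respectively."""
    if sum(lam) + sum(mu) + sum(nu) != 1:
        raise ValueError("exponents violate the Riemann identity (14.19)")
    A1 = 1 - lam[0] - lam[1]
    B1 = 1 - mu[0] - mu[1]
    A2 = lam[0] * lam[1]
    B2 = mu[0] * mu[1]
    A3 = A2 + B2 - nu[0] * nu[1]  # from A2 + B2 - A3 = nu1 * nu2
    return A1, B1, A2, B2, A3

# Exponent pairs (0, 1 - c), (0, c - a - b), (a, b) with illustrative values.
a, b, c = Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)
print(riemann_coefficients((0, 1 - c), (0, c - a - b), (a, b)))
```

With these particular pairs the coefficients come out as $A_1 = c$, $B_1 = 1 - c + a + b$, $A_2 = B_2 = 0$, and $A_3 = -ab$.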




The uniqueness of the Riemann DE allows us to derive identities for solutions
and reduce the independent parameters of Equation (14.20) from five to three. We
first note that if $w(z)$ is a solution of the Riemann DE corresponding to $(\lambda_1, \lambda_2)$,
$(\mu_1, \mu_2)$, and $(\nu_1, \nu_2)$, then the function

\[
v(z) = z^{\lambda}(z - 1)^{\mu} w(z)
\]

has branch points at $z = 0, 1, \infty$ [because $w(z)$ does]; therefore, it is a solution of
the Riemann DE. Its pairs of characteristic exponents are (see Problem 14.10)

\[
(\lambda_1 + \lambda, \lambda_2 + \lambda), \qquad (\mu_1 + \mu, \mu_2 + \mu), \qquad (\nu_1 - \lambda - \mu, \nu_2 - \lambda - \mu).
\]

In particular, if we let $\lambda = -\lambda_1$ and $\mu = -\mu_1$, then the pairs reduce to

\[
(0, \lambda_2 - \lambda_1), \qquad (0, \mu_2 - \mu_1), \qquad (\nu_1 + \lambda_1 + \mu_1, \nu_2 + \lambda_1 + \mu_1).
\]

Defining $\alpha \equiv \nu_1 + \lambda_1 + \mu_1$, $\beta \equiv \nu_2 + \lambda_1 + \mu_1$, and $\gamma \equiv 1 - \lambda_2 + \lambda_1$, and using
(14.19), we can write the pairs as

\[
(0, 1 - \gamma), \qquad (0, \gamma - \alpha - \beta), \qquad (\alpha, \beta),
\]

which yield the third version of the Riemann DE

\[
w'' + \left(\frac{\gamma}{z} + \frac{1 - \gamma + \alpha + \beta}{z - 1}\right)w' + \frac{\alpha\beta}{z(z - 1)}w = 0.
\]

This important equation is commonly written in the equivalent form

\[
z(1 - z)w'' + [\gamma - (1 + \alpha + \beta)z]w' - \alpha\beta w = 0 \tag{14.21}
\]

and is called the hypergeometric differential equation (HGDE). We will study
this equation next.


14.4 The Hypergeometric Function 


The two characteristic exponents of Equation (14.21) at $z = 0$ are $0$ and $1 - \gamma$.
It follows from Theorem 14.2.6 that there exists an analytic solution (corresponding
to the characteristic exponent 0) at $z = 0$. Let us denote this solution, the
hypergeometric function, by $F(\alpha, \beta; \gamma; z)$ and write

\[
F(\alpha, \beta; \gamma; z) = \sum_{k=0}^{\infty} a_k z^k \qquad\text{where } a_0 = 1.
\]

Substituting in the DE, we obtain the recurrence relation

\[
a_{k+1} = \frac{(\alpha + k)(\beta + k)}{(k + 1)(\gamma + k)}\,a_k \qquad\text{for } k \geq 0.
\]




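These coefficients can be determined successively if $\gamma$ is neither zero nor a negative
integer:

\[
\begin{aligned}
F(\alpha, \beta; \gamma; z) &= 1 + \sum_{k=1}^{\infty} \frac{\alpha(\alpha + 1)\cdots(\alpha + k - 1)\,\beta(\beta + 1)\cdots(\beta + k - 1)}{k!\,\gamma(\gamma + 1)\cdots(\gamma + k - 1)}\,z^k \\
&= \frac{\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\beta)} \sum_{k=0}^{\infty} \frac{\Gamma(\alpha + k)\Gamma(\beta + k)}{\Gamma(k + 1)\Gamma(\gamma + k)}\,z^k.
\end{aligned}
\tag{14.22}
\]

The series in (14.22) is called the hypergeometric series, because it is the generalization
of $F(1, \beta; \beta; z)$, which is simply the geometric series.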

We note immediately from (14.22) that 


14.4.1. Box. The hypergeometric series becomes a polynomial if either $\alpha$
or $\beta$ is a negative integer.


This is because for $k < |\alpha|$ (or $k < |\beta|$) both $\Gamma(\alpha + k)$ [or $\Gamma(\beta + k)$] and $\Gamma(\alpha)$
[or $\Gamma(\beta)$] have poles that cancel each other. However, $\Gamma(\alpha + k)$ [or $\Gamma(\beta + k)$]
becomes finite for $k > |\alpha|$ (or $k > |\beta|$), and the pole in $\Gamma(\alpha)$ [or $\Gamma(\beta)$] makes the
denominator infinite. Therefore, all terms of the series (14.22) beyond $k = |\alpha|$ (or
$k = |\beta|$) will be zero.
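
The recurrence for the coefficients gives a direct way to sum the hypergeometric series inside its circle of convergence. The sketch below, with the hypothetical name `hypergeometric_series`, is a plain truncated sum intended only to illustrate the two remarks above (the geometric series and the polynomial case); it is not a robust numerical routine, and it assumes $|z| < 1$ and that $\gamma$ is neither zero nor a negative integer.

```python
def hypergeometric_series(alpha, beta, gamma, z, max_terms=500):
    """Truncated sum of the hypergeometric series (14.22) for |z| < 1."""
    total = 0.0
    term = 1.0  # a_0 z^0 with a_0 = 1
    for k in range(max_terms):
        total += term
        # a_{k+1} z^{k+1} = a_k z^k * (alpha + k)(beta + k) / ((k + 1)(gamma + k)) * z
        term *= (alpha + k) * (beta + k) / ((k + 1) * (gamma + k)) * z
        if term == 0:
            break  # series has terminated: alpha or beta is a nonpositive integer
    return total

# F(1, beta; beta; z) is the geometric series 1/(1 - z).
print(hypergeometric_series(1, 2.5, 2.5, 0.3), 1 / (1 - 0.3))
# alpha = -2 gives a polynomial: F(-2, 1; 1; z) = (1 - z)^2.
print(hypergeometric_series(-2, 1, 1, 0.5), (1 - 0.5) ** 2)
```

When $\alpha = -2$ the factor $(\alpha + k)$ vanishes at $k = 2$, so every coefficient beyond $a_2$ is zero and the loop stops early.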

Many of the properties of the hypergeometric function can be obtained directly
from the HGDE, Equation (14.21). For instance, differentiating the HGDE and
letting $v = w'$, we obtain

\[
z(1 - z)v'' + [\gamma + 1 - (\alpha + \beta + 3)z]v' - (\alpha + 1)(\beta + 1)v = 0,
\]

which shows that $F'(\alpha, \beta; \gamma; z) = CF(\alpha + 1, \beta + 1; \gamma + 1; z)$. The constant $C$
can be determined by differentiating Equation (14.22), setting $z = 0$ in the result,³
and noting that $F(\alpha + 1, \beta + 1; \gamma + 1; 0) = 1$. Then we obtain

\[
F'(\alpha, \beta; \gamma; z) = \frac{\alpha\beta}{\gamma}F(\alpha + 1, \beta + 1; \gamma + 1; z). \tag{14.23}
\]

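Now assume that $\gamma \neq 1$, and make the substitution $w = z^{1-\gamma}u$ in the HGDE
to obtain⁴ $z(1 - z)u'' + [\gamma_1 - (\alpha_1 + \beta_1 + 1)z]u' - \alpha_1\beta_1 u = 0$, where $\alpha_1 = \alpha - \gamma + 1$,
$\beta_1 = \beta - \gamma + 1$, and $\gamma_1 = 2 - \gamma$. Thus,

\[
u = F(\alpha - \gamma + 1, \beta - \gamma + 1; 2 - \gamma; z),
\]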


³ Note that the hypergeometric function evaluates to 1 at $z = 0$ regardless of its parameters.

⁴ In the following discussion, $\alpha_1$, $\beta_1$, and $\gamma_1$ will represent the parameters of the new DE satisfied by the new function defined
in terms of the old.





and $u$ is therefore analytic at $z = 0$. This leads to an interesting result. Provided
that $\gamma$ is not an integer, the two functions

\[
w_1(z) \equiv F(\alpha, \beta; \gamma; z), \qquad w_2(z) \equiv z^{1-\gamma}F(\alpha - \gamma + 1, \beta - \gamma + 1; 2 - \gamma; z) \tag{14.24}
\]

form a canonical basis of solutions to the HGDE at $z = 0$. This follows from
Theorem 14.2.6 and the fact that $(0, 1 - \gamma)$ are a pair of (different) characteristic
exponents at $z = 0$.
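
One way to gain confidence in the second member of the canonical basis is to substitute it into the HGDE numerically. The sketch below does this with central finite differences on the real axis. All function names are hypothetical, the series is a plain truncated sum valid for $|z| < 1$, and the residual is only expected to vanish up to discretization and rounding error.

```python
def hyp_series(alpha, beta, gamma, z, max_terms=500):
    """Truncated hypergeometric series for |z| < 1."""
    total, term = 0.0, 1.0
    for k in range(max_terms):
        total += term
        term *= (alpha + k) * (beta + k) / ((k + 1) * (gamma + k)) * z
    return total

def w2(alpha, beta, gamma, z):
    """Second canonical solution at z = 0, Equation (14.24), for real 0 < z < 1."""
    return z ** (1 - gamma) * hyp_series(alpha - gamma + 1, beta - gamma + 1, 2 - gamma, z)

def hgde_residual(w, alpha, beta, gamma, z, h=1e-4):
    """Left-hand side of the HGDE (14.21) evaluated with central differences."""
    d1 = (w(z + h) - w(z - h)) / (2 * h)
    d2 = (w(z + h) - 2 * w(z) + w(z - h)) / h ** 2
    return z * (1 - z) * d2 + (gamma - (1 + alpha + beta) * z) * d1 - alpha * beta * w(z)

a, b, c = 0.5, 1.5, 0.75
print(hgde_residual(lambda z: w2(a, b, c, z), a, b, c, 0.3))
print(hgde_residual(lambda z: hyp_series(a, b, c, z), a, b, c, 0.3))
```

Both residuals should be small compared with the individual terms of the equation, which are of order one at this point.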


Johann Carl Friedrich Gauss (1777-1855) was the greatest of
all mathematicians and perhaps the most richly gifted genius of
whom there is any record. He was born in the city of Brunswick
in northern Germany. His exceptional skill with numbers was
clear at a very early age, and in later life he joked that he knew
how to count before he could talk. It is said that Goethe wrote
and directed little plays for a puppet theater when he was 6 and
that Mozart composed his first childish minuets when he was
5, but Gauss corrected an error in his father's payroll accounts
at the age of 3. At the age of seven, when he started elementary
school, his teacher was amazed when Gauss summed the
integers from 1 to 100 instantly by spotting that the sum was
50 pairs of numbers each pair summing to 101.

His long professional life is so filled with accomplishments that it is impossible to give 
a full account of them in the short space available here. All we can do is simply give a 
chronology of his almost uncountable discoveries. 

1792-1794: Gauss reads the works of Newton, Euler, and Lagrange; discovers the prime 
number theorem (at the age of 14 or 15); invents the method of least squares; conceives the 
Gaussian law of distribution in the theory of probability. 

1795: (only 18 years old!) Proves that a regular polygon with $n$ sides is constructible (by ruler
and compass) if and only if $n$ is the product of a power of 2 and distinct prime numbers of the
form $p_k = 2^{2^k} + 1$, and completely solves the 2000-year old problem of ruler-and-compass
construction of regular polygons. He also discovers the law of quadratic reciprocity.

1799: Proves the fundamental theorem of algebra in his doctoral dissertation using the 
then-mysterious complex numbers with complete confidence. 

1801: Gauss publishes his Disquisitiones Arithmeticae in which he creates the modern rigorous
approach to mathematics; predicts the exact location of the asteroid Ceres.

1807: Becomes professor of astronomy and the director of the new observatory at Göttingen.

1809: Publishes his second book, Theoria motus corporum coelestium, a major two-volume
treatise on the motion of celestial bodies and the bible of planetary astronomers for the next
100 years.

1812: Publishes Disquisitiones generales circa seriem infinitam, a rigorous treatment of
infinite series, and introduces the hypergeometric function for the first time, for which he
uses the notation $F(\alpha, \beta; \gamma; z)$; an essay on approximate integration.

1820-1830: Publishes over 70 papers, including Disquisitiones generales circa superficies
curvas, in which he creates the intrinsic differential geometry of general curved surfaces,
the forerunner of Riemannian geometry and the general theory of relativity. From the 1830s
on, Gauss was increasingly occupied with physics, and he enriched every branch of the
subject he touched. In the theory of surface tension, he developed the fundamental idea of
conservation of energy and solved the earliest problem in the calculus of variations. In optics,
he introduced the concept of the focal length of a system of lenses. He virtually created
the science of geomagnetism, and in collaboration with his friend and colleague Wilhelm
Weber he invented the electromagnetic telegraph. In 1839 Gauss published his fundamental
paper on the general theory of inverse square forces, which established potential theory as
a coherent branch of mathematics and in which he established the divergence theorem.

Gauss had many opportunities to leave Göttingen, but he refused all offers and remained
there for the rest of his life, living quietly and simply, traveling rarely, and working with
immense energy on a wide variety of problems in mathematics and its applications. Apart
from science and his family (he married twice and had six children, two of whom emigrated
to America), his main interests were history and world literature, international politics, and
public finance. He owned a large library of about 6000 volumes in many languages, including
Greek, Latin, English, French, Russian, Danish, and of course German. His acuteness in
handling his own financial affairs is shown by the fact that although he started with virtually
nothing, he left an estate over a hundred times as great as his average annual income
during the last half of his life. The foregoing list is the published portion of Gauss's total
achievement; the unpublished and private part is almost equally impressive. His scientific
diary, a little booklet of 19 pages, discovered in 1898, extends from 1796 to 1814 and
consists of 146 very concise statements of the results of his investigations, which often
occupied him for weeks or months. These ideas were so abundant and so frequent that he
physically did not have time to publish them. Some of the ideas recorded in this diary:

Cauchy Integral Formula: Gauss discovers it in 1811, 16 years before Cauchy.

Non-Euclidean Geometry: After failing to prove Euclid's fifth postulate at the age of 15,
Gauss came to the conclusion that the Euclidean form of geometry cannot be the only one
possible.

Elliptic Functions: Gauss had found many of the results of Abel and Jacobi (the two main
contributors to the subject) before these men were born. The facts became known partly
through Jacobi himself. His attention was caught by a cryptic passage in the Disquisitiones,
whose meaning can only be understood if one knows something about elliptic functions. He
visited Gauss on several occasions to verify his suspicions and tell him about his own most
recent discoveries, and each time Gauss pulled 30-year-old manuscripts out of his desk and
showed Jacobi what Jacobi had just shown him. After a week's visit with Gauss in 1840,
Jacobi wrote to his brother, “Mathematics would be in a very different position if practical
astronomy had not diverted this colossal genius from his glorious career.”

A possible explanation for not publishing such important ideas is suggested by his 
comments in a letter to Bolyai: “It is not knowledge but the act of learning, not possession 
but the act of getting there, which grants the greatest enjoyment. When I have clarified and 
exhausted a subject, then I turn away from it in order to go into darkness again.” His was 
the temperament of an explorer who is reluctant to take the time to write an account of his 
last expedition when he could be starting another. As it was, Gauss wrote a great deal, but 
to have published every fundamental discovery he made in a form satisfactory to himself 
would have required several long lifetimes. 










A third relation can be obtained by making the substitution $w = (1 - z)^{\gamma-\alpha-\beta}u$.
This leads to a hypergeometric equation for $u$ with $\alpha_1 = \gamma - \alpha$, $\beta_1 = \gamma - \beta$, and
$\gamma_1 = \gamma$. Furthermore, $w$ is analytic at $z = 0$, and $w(0) = 1$. We conclude that
$w = F(\alpha, \beta; \gamma; z)$. We therefore have the identity

\[
F(\alpha, \beta; \gamma; z) = (1 - z)^{\gamma-\alpha-\beta}F(\gamma - \alpha, \gamma - \beta; \gamma; z). \tag{14.25}
\]
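
Identity (14.25) can be spot-checked numerically by summing both hypergeometric series at a point inside the unit disk. The sketch below uses a plain truncated series under the hypothetical name `hyp_series`; the parameter values are arbitrary test values, not taken from the text.

```python
def hyp_series(alpha, beta, gamma, z, max_terms=500):
    """Truncated hypergeometric series for |z| < 1."""
    total, term = 0.0, 1.0
    for k in range(max_terms):
        total += term
        term *= (alpha + k) * (beta + k) / ((k + 1) * (gamma + k)) * z
    return total

def identity_14_25_gap(alpha, beta, gamma, z):
    """Difference between the two sides of Equation (14.25)."""
    lhs = hyp_series(alpha, beta, gamma, z)
    rhs = (1 - z) ** (gamma - alpha - beta) * hyp_series(gamma - alpha, gamma - beta, gamma, z)
    return lhs - rhs

print(identity_14_25_gap(0.5, 1.25, 2.0, 0.4))  # expected to be at rounding level
```

A gap at the level of floating-point rounding is consistent with the two sides being the same analytic function near $z = 0$.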

To obtain the canonical basis at $z = 1$, we make the substitution $t = 1 - z$, and
note that the result is again the HGDE, with $\alpha_1 = \alpha$, $\beta_1 = \beta$, and $\gamma_1 = \alpha + \beta - \gamma + 1$.
It follows from Equation (14.24) that

\[
\begin{aligned}
w_3(z) &\equiv F(\alpha, \beta; \alpha + \beta - \gamma + 1; 1 - z), \\
w_4(z) &\equiv (1 - z)^{\gamma-\alpha-\beta}F(\gamma - \beta, \gamma - \alpha; \gamma - \alpha - \beta + 1; 1 - z)
\end{aligned}
\tag{14.26}
\]

form a canonical basis of solutions to the HGDE at $z = 1$.

A symmetry of the hypergeometric function that is easily obtained from the
HGDE is

\[
F(\alpha, \beta; \gamma; z) = F(\beta, \alpha; \gamma; z). \tag{14.27}
\]

The six functions

\[
F(\alpha \pm 1, \beta; \gamma; z), \qquad F(\alpha, \beta \pm 1; \gamma; z), \qquad F(\alpha, \beta; \gamma \pm 1; z)
\]

are called hypergeometric functions contiguous to $F(\alpha, \beta; \gamma; z)$. The discussion
above showed how to obtain the basis of solutions at $z = 1$ from the regular
solution to the HGDE at $z = 0$, $F(\alpha, \beta; \gamma; z)$. We can show that the basis of solutions
at $z = \infty$ can also be obtained from the hypergeometric function.

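Equation (14.16) suggests a function of the form

\[
v(z) = z^{r}F\left(\alpha_1, \beta_1; \gamma_1; \frac{1}{z}\right) \equiv z^{r}w\left(\frac{1}{z}\right)
\quad\Rightarrow\quad w(z) = z^{r}v\left(\frac{1}{z}\right), \tag{14.28}
\]

where $r$, $\alpha_1$, $\beta_1$, and $\gamma_1$ are to be determined. Since $w(z)$ is a solution of the HGDE,
$v$ will satisfy the following DE (see Problem 14.15):

\[
z(1 - z)v'' + [1 - \alpha - \beta - 2r - (2 - \gamma - 2r)z]v'
- \left[r^2 - r + r\gamma - \frac{1}{z}(r + \alpha)(r + \beta)\right]v = 0. \tag{14.29}
\]

This reduces to the HGDE if $r = -\alpha$ or $r = -\beta$. For $r = -\alpha$, the parameters
become $\alpha_1 = \alpha$, $\beta_1 = 1 + \alpha - \gamma$, and $\gamma_1 = \alpha - \beta + 1$. For $r = -\beta$, the parameters
are $\alpha_1 = \beta$, $\beta_1 = 1 + \beta - \gamma$, and $\gamma_1 = \beta - \alpha + 1$. Thus,

\[
\begin{aligned}
v_1(z) &= z^{-\alpha}F\left(\alpha, 1 + \alpha - \gamma; \alpha - \beta + 1; \frac{1}{z}\right), \\
v_2(z) &= z^{-\beta}F\left(\beta, 1 + \beta - \gamma; \beta - \alpha + 1; \frac{1}{z}\right)
\end{aligned}
\tag{14.30}
\]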






form a canonical basis of solutions for the HGDE that are valid about $z = \infty$.

As the preceding discussion suggests, it is possible to obtain many relations among the hypergeometric functions with different parameters and independent variables. In fact, the nineteenth-century mathematician Kummer showed that there are 24 different (but linearly dependent, of course) solutions to the HGDE. These are collectively known as Kummer's solutions, and six of them were derived above. Another important relation (shown in Problem 14.16) is that

$$z^{\alpha-\gamma}(1-z)^{\gamma-\alpha-\beta}F\!\left(\gamma-\alpha,1-\alpha;1+\beta-\alpha;\frac{1}{z}\right) \tag{14.31}$$

also solves the HGDE.

Many of the functions that occur in mathematical physics are related to the hypergeometric function. Even some of the common elementary functions can be expressed in terms of the hypergeometric function with appropriate parameters. For example, when $\beta = \gamma$, we obtain

$$F(\alpha,\beta;\beta;z) = \sum_{k=0}^{\infty}\frac{\Gamma(\alpha+k)}{\Gamma(\alpha)\Gamma(k+1)}z^k = (1-z)^{-\alpha}.$$

Similarly, $F(\tfrac12,\tfrac12;\tfrac32;z^2) = \sin^{-1}z/z$, and $F(1,1;2;-z) = \ln(1+z)/z$. However, the real power of the hypergeometric function is that it encompasses almost all of the nonelementary functions encountered in physics. Let us look briefly at a few of these. Jacobi functions are solutions of the DE

$$(1-x^2)\frac{d^2u}{dx^2} + [\beta-\alpha-(\alpha+\beta+2)x]\frac{du}{dx} + \lambda(\lambda+\alpha+\beta+1)u = 0. \tag{14.32}$$

Defining $x = 1-2z$ changes this DE into the HGDE with parameters $\alpha_1 = -\lambda$, $\beta_1 = \lambda+\alpha+\beta+1$, and $\gamma_1 = 1+\alpha$. The solutions of Equation (14.32), called the Jacobi functions of the first kind, are, with appropriate normalization,

$$P_\lambda^{(\alpha,\beta)}(z) = \frac{\Gamma(\lambda+\alpha+1)}{\Gamma(\lambda+1)\Gamma(\alpha+1)}F\!\left(-\lambda,\lambda+\alpha+\beta+1;1+\alpha;\frac{1-z}{2}\right).$$

When $\lambda = n$, a nonnegative integer, the Jacobi function turns into a polynomial of degree $n$ with the following expansion:

$$P_n^{(\alpha,\beta)}(z) = \frac{\Gamma(n+\alpha+1)}{\Gamma(n+1)\Gamma(n+\alpha+\beta+1)}\sum_{k=0}^{n}\binom{n}{k}\frac{\Gamma(n+\alpha+\beta+k+1)}{\Gamma(\alpha+k+1)}\left(\frac{z-1}{2}\right)^k.$$

These are the Jacobi polynomials discussed in Chapter 7. In fact, the DE satisfied by $P_n^{(\alpha,\beta)}(x)$ of Chapter 7 is identical to Equation (14.32). Note that the transformation $x = 1-2z$ translates the points $z = 0$ and $z = 1$ to the points $x = 1$ and $x = -1$, respectively. Thus the regular singular points of the Jacobi functions of the first kind are at $\pm1$ and $\infty$.
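To see concretely how the hypergeometric series terminates for $\lambda = n$, the following sketch evaluates the Jacobi polynomial directly from $F(-n, n+\alpha+\beta+1; 1+\alpha; (1-x)/2)$ with the normalization quoted above. The function name `jacobi_poly` and the test values are assumptions made for illustration. The sketch also assumes $\alpha > -1$ so that the gamma functions are finite.

```python
import math


def jacobi_poly(n, alpha, beta, x):
    """P_n^{(alpha,beta)}(x) from the terminating hypergeometric series."""
    norm = math.gamma(n + alpha + 1) / (math.gamma(n + 1) * math.gamma(alpha + 1))
    z = (1 - x) / 2
    total, term = 0.0, 1.0
    for k in range(n + 1):
        total += term
        # The factor (-n + k) vanishes at k = n, which terminates the series.
        term *= (-n + k) * (n + alpha + beta + 1 + k) / ((1 + alpha + k) * (k + 1)) * z
    return norm * total


# alpha = beta = 0 should reproduce the Legendre polynomial P_2(x) = (3x^2 - 1)/2.
x = 0.3
print(jacobi_poly(2, 0.0, 0.0, x), (3 * x * x - 1) / 2)
```

For $n = 1$ the same routine gives the linear polynomial $(\alpha+1) + (\alpha+\beta+2)(x-1)/2$, which provides a second quick check.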










A second, linearly independent, solution of Equation (14.32) is obtained by using (14.31). These are called the Jacobi functions of the second kind:

$$Q_\lambda^{(\alpha,\beta)}(z) = \frac{2^{\lambda+\alpha+\beta}\,\Gamma(\lambda+\alpha+1)\Gamma(\lambda+\beta+1)}{\Gamma(2\lambda+\alpha+\beta+2)(z-1)^{\lambda+\alpha+1}(z+1)^{\beta}}\,F\!\left(\lambda+\alpha+1,\lambda+1;2\lambda+\alpha+\beta+2;\frac{2}{1-z}\right). \tag{14.33}$$

Gegenbauer functions, or ultraspherical functions, are special cases of Jacobi functions for which $\alpha = \beta = \mu - \tfrac12$. They are defined by

$$C_\lambda^{\mu}(z) = \frac{\Gamma(\lambda+2\mu)}{\Gamma(\lambda+1)\Gamma(2\mu)}\,F\!\left(-\lambda,\lambda+2\mu;\mu+\tfrac12;\frac{1-z}{2}\right). \tag{14.34}$$

Note the change in the normalization constant. Linearly independent Gegenbauer functions "of the second kind" can be obtained from the Jacobi functions of the second kind by the substitution $\alpha = \beta = \mu - \tfrac12$. Another special case of the Jacobi functions is obtained when $\alpha = \beta = 0$. Those obtained from the Jacobi functions of the first kind are called Legendre functions of the first kind:

$$P_\lambda(z) \equiv P_\lambda^{(0,0)}(z) = C_\lambda^{1/2}(z) = F\!\left(-\lambda,\lambda+1;1;\frac{1-z}{2}\right). \tag{14.35}$$

Legendre functions of the second kind are obtained from the Jacobi functions of the second kind in a similar way:

$$Q_\lambda(z) = \frac{2^{\lambda}\,\Gamma^2(\lambda+1)}{\Gamma(2\lambda+2)(z-1)^{\lambda+1}}\,F\!\left(\lambda+1,\lambda+1;2\lambda+2;\frac{2}{1-z}\right).$$

Other functions derived from the Jacobi functions are obtained similarly (see Chapter 7).


14.5 Confluent Hypergeometric Functions


The transformation $x = 1-2z$ translates the regular singular points of the HGDE by a finite amount. Consequently, the new functions still have two regular singular points, $z = \pm1$, in the complex plane. In some physical cases of importance, only the origin, corresponding to $r = 0$ in spherical coordinates (typically the location of the source of a central force), is the singular point. If we want to obtain a differential equation consistent with such a case, we have to "push" the singular point $z = 1$ to infinity. This can be achieved by making the substitution $t = rz$ in the HGDE and taking the limit $r \to \infty$. The substitution yields

$$\frac{d^2w}{dt^2} + \left(\frac{\gamma}{t} + \frac{1-\gamma+\alpha+\beta}{t-r}\right)\frac{dw}{dt} + \frac{\alpha\beta}{t(t-r)}\,w = 0. \tag{14.36}$$

If we blindly take the limit $r \to \infty$ with $\alpha$, $\beta$, and $\gamma$ remaining finite, Equation (14.36) reduces to $\ddot{w} + (\gamma/t)\dot{w} = 0$, an elementary FODE in $\dot{w}$. To obtain a nonelementary DE, we need to manipulate the parameters, to let some of them tend to infinity. We want $\gamma$ to remain finite, because otherwise the coefficient of $dw/dt$ will blow up. We therefore let $\beta$ or $\alpha$ tend to infinity. The result will be the same either way because $\alpha$ and $\beta$ appear symmetrically in the equation. It is customary to let $\beta = r \to \infty$. In that case, Equation (14.36) becomes

$$\frac{d^2w}{dt^2} + \left(\frac{\gamma}{t} - 1\right)\frac{dw}{dt} - \frac{\alpha}{t}\,w = 0.$$

Multiplying by $t$ and changing the independent variable back to $z$ yields

$$zw''(z) + (\gamma - z)w'(z) - \alpha w(z) = 0. \tag{14.37}$$

This is called the confluent hypergeometric DE (CHGDE).

Since $z = 0$ is still a regular singular point of the CHGDE, we can obtain expansions about that point. The characteristic exponents are $0$ and $1 - \gamma$, as before. Thus, there is an analytic solution (corresponding to the characteristic exponent 0) to the CHGDE at the origin, which is called the confluent hypergeometric function and denoted by $\Phi(\alpha;\gamma;z)$. Since $z = 0$ is the only possible (finite) singularity of the CHGDE, $\Phi(\alpha;\gamma;z)$ is an entire function.

We can obtain the series expansion of $\Phi(\alpha;\gamma;z)$ directly from Equation (14.22) and the fact that $\Phi(\alpha;\gamma;z) = \lim_{\beta\to\infty}F(\alpha,\beta;\gamma;z/\beta)$. The result is

$$\Phi(\alpha;\gamma;z) = \frac{\Gamma(\gamma)}{\Gamma(\alpha)}\sum_{k=0}^{\infty}\frac{\Gamma(\alpha+k)}{\Gamma(k+1)\Gamma(\gamma+k)}\,z^k. \tag{14.38}$$

This is called the confluent hypergeometric series. An argument similar to the one given in the case of the hypergeometric function shows that

14.5.1. Box. The confluent hypergeometric function $\Phi(\alpha;\gamma;z)$ reduces to a polynomial when $\alpha$ is a negative integer.
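The series (14.38), its termination for negative integer $\alpha$, and the limit relation with the hypergeometric function can all be explored numerically. The sketch below is one possible illustration. The names `kummer_phi` and `hyp2f1`, the number of terms, and the sample arguments are assumptions, and the series are summed naively, which is adequate only for moderate $|z|$.

```python
import math


def kummer_phi(alpha, gamma, z, terms=200):
    """Partial sum of the confluent hypergeometric series (14.38)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (alpha + k) / ((gamma + k) * (k + 1)) * z
    return total


def hyp2f1(a, b, c, z, terms=200):
    """Partial sum of the hypergeometric series, valid for |z| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
    return total


# With alpha = gamma the series collapses to the exponential series.
print(kummer_phi(1.7, 1.7, 0.8), math.exp(0.8))

# alpha = -2 terminates the series: Phi(-2; 1; z) = 1 - 2z + z^2/2.
print(kummer_phi(-2, 1, 3.0))

# Confluence: F(alpha, beta; gamma; z/beta) approaches Phi as beta grows.
beta = 1.0e6
print(hyp2f1(0.5, beta, 1.5, 0.8 / beta), kummer_phi(0.5, 1.5, 0.8))
```

The last pair of numbers agrees only up to a correction of order $1/\beta$, which is the expected behavior of the limit.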


A second solution of the CHGDE can be obtained, as for the HGDE. If $1 - \gamma$ is not an integer, then by taking the limit $\beta \to \infty$ of Equation (14.24), we obtain the second solution $z^{1-\gamma}\Phi(\alpha-\gamma+1; 2-\gamma; z)$. Thus, any solution of the CHGDE can be written as a linear combination of $\Phi(\alpha;\gamma;z)$ and $z^{1-\gamma}\Phi(\alpha-\gamma+1; 2-\gamma; z)$.

14.5.2. Example. The time-independent Schrödinger equation for a central potential, in units in which $\hbar = m = 1$, is $-\tfrac12\nabla^2\Psi + V(r)\Psi = E\Psi$. For the case of hydrogen-like atoms, $V(r) = -Ze^2/r$, where $Z$ is the atomic number, and the equation reduces to

$$\nabla^2\Psi + \left(2E + \frac{2Ze^2}{r}\right)\Psi = 0.$$

The radial part of this equation is given by Equation (12.14) with $f(r) = 2E + 2Ze^2/r$. Defining $u = rR(r)$, we may write

$$\frac{d^2u}{dr^2} + \left(\lambda + \frac{a}{r} - \frac{b}{r^2}\right)u = 0, \tag{14.39}$$

where $\lambda = 2E$, $a = 2Ze^2$, and $b = l(l+1)$. This equation can be further simplified by defining $r \equiv kz$ ($k$ is an arbitrary constant to be determined later):

$$\frac{d^2u}{dz^2} + \left(\lambda k^2 + \frac{ak}{z} - \frac{b}{z^2}\right)u = 0.$$

Choosing $\lambda k^2 = -\tfrac14$ and introducing $\alpha \equiv ak = a/(2\sqrt{-\lambda})$ yields

$$\frac{d^2u}{dz^2} + \left(-\frac14 + \frac{\alpha}{z} - \frac{b}{z^2}\right)u = 0.$$

Equations of this form can be transformed into the CHGDE by making the substitution $u(z) = z^{\mu}e^{-\nu z}f(z)$. It then follows that

$$\frac{d^2f}{dz^2} + \left(\frac{2\mu}{z} - 2\nu\right)\frac{df}{dz} + \left[-\frac14 + \frac{\mu(\mu-1)}{z^2} - \frac{2\mu\nu}{z} + \frac{\alpha}{z} - \frac{b}{z^2} + \nu^2\right]f = 0.$$

Choosing $\nu^2 = \tfrac14$ and $\mu(\mu-1) = b$ reduces this equation to

$$f'' + \left(\frac{2\mu}{z} - 2\nu\right)f' - \frac{2\mu\nu - \alpha}{z}\,f = 0,$$

which is in the form of (14.37).

On physical grounds, we expect $u(z) \to 0$ as $z \to \infty$.⁵ Therefore, $\nu = \tfrac12$. Similarly, with $\mu(\mu-1) = b = l(l+1)$, we obtain the two possibilities $\mu = -l$ and $\mu = l+1$. Again on physical grounds, we demand that $u(0)$ be finite (the wave function must not blow up at $r = 0$). This implies⁶ that $\mu = l+1$. We thus obtain

$$f'' + \left[\frac{2(l+1)}{z} - 1\right]f' - \frac{l+1-\alpha}{z}\,f = 0.$$

Multiplying by $z$ gives $zf'' + [2(l+1) - z]f' - (l+1-\alpha)f = 0$. Comparing this with Equation (14.37) shows that $f$ is proportional to $\Phi(l+1-\alpha, 2l+2; z)$. Thus, the solution of (14.39) can be written as

$$u(z) = Cz^{l+1}e^{-z/2}\,\Phi(l+1-\alpha, 2l+2; z).$$

An argument similar to that used in Problem 13.20 will reveal that the product $e^{-z/2}\Phi(l+1-\alpha, 2l+2; z)$ will be infinite unless the power series representing $\Phi$ terminates (becomes a polynomial). It follows from Box 14.5.1 that this will take place if

$$l + 1 - \alpha = -N \tag{14.40}$$


⁵ This is because the volume integral of $|\Psi|^2$ over all space must be finite. The radial part of this integral is simply the integral of $r^2R^2(r) = u^2(r)$. This latter integral will not be finite unless $u(\infty) = 0$.

⁶ Recall that $\mu$ is the exponent of $z = r/k$.




for some integer $N \ge 0$. In that case we obtain the Laguerre polynomials

$$L_N^{j}(z) \equiv \frac{\Gamma(N+j+1)}{\Gamma(N+1)\Gamma(j+1)}\,\Phi(-N, j+1; z),$$

where $j = 2l+1$.

Condition (14.40) is the quantization rule for the energy levels of a hydrogen-like atom. Writing everything in terms of the original parameters and defining $n = N + l + 1$ yields, after restoring all the $m$'s and the $\hbar$'s, the energy levels of a hydrogen-like atom:

$$E = -\frac{Z^2me^4}{2\hbar^2n^2} = -Z^2\left(\frac{mc^2}{2}\right)\alpha^2\,\frac{1}{n^2},$$

where $\alpha = e^2/(\hbar c) = 1/137$ is the fine structure constant (not the parameter appearing in Equation (14.40)).
The radial wave functions can now be written as

$$R_{n,l}(r) = \frac{u_{n,l}(r)}{r} = Cr^{l}e^{-Zr/(na_0)}\,\Phi\!\left(-n+l+1, 2l+2; \frac{2Zr}{na_0}\right),$$

where $a_0 = \hbar^2/(me^2) = 0.529 \times 10^{-8}$ cm is the Bohr radius.
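As a rough numerical illustration of the quantization rule, the sketch below evaluates the energy levels in electron volts and the terminating series $\Phi(-n+l+1, 2l+2; z)$ that enters the radial wave function. The constants are rounded reference values supplied here as assumptions (they do not appear in the text), and the function names are hypothetical.

```python
# Rounded reference values, assumed for illustration.
ELECTRON_REST_ENERGY_EV = 510998.95   # m c^2 in eV
FINE_STRUCTURE = 1 / 137.035999       # e^2 / (hbar c)


def energy_level(n, Z=1):
    """E_n = -Z^2 (m c^2 / 2) alpha^2 / n^2, in eV."""
    return -Z**2 * 0.5 * ELECTRON_REST_ENERGY_EV * FINE_STRUCTURE**2 / n**2


def radial_polynomial(n, l, z):
    """Terminating series Phi(-n + l + 1, 2l + 2; z) with N = n - l - 1 >= 0."""
    N = n - l - 1
    if N < 0:
        raise ValueError("quantization requires n >= l + 1")
    total, term = 0.0, 1.0
    for k in range(N + 1):
        total += term
        term *= (-N + k) / ((2 * l + 2 + k) * (k + 1)) * z
    return total


for n in (1, 2, 3):
    print(n, energy_level(n))

# For n = 2, l = 0 the polynomial is 1 - z/2.
print(radial_polynomial(2, 0, 1.0))
```

The ground-state value comes out near $-13.6$ eV, the familiar ionization energy of hydrogen, and the levels scale as $1/n^2$.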




Friedrich Wilhelm Bessel (1784—1846) showed no signs of 
unusual academic ability in school, although he did show a 
liking for mathematics and physics. He left school intending 
to become a merchant’s apprentice, a desire that soon materialized with a seven-year unpaid apprenticeship with a large mercantile firm in Bremen. The young Bessel proved so adept at accounting and calculation that he was granted a small salary, with raises, after only the first year. An interest in foreign trade led Bessel to study geography and languages at night, astonishingly learning to read and write English in only three months.

He also studied navigation in order to qualify as a cargo officer 
aboard ship, but his innate curiosity soon compelled him to investigate astronomy at a more 
fundamental level. Still serving his apprenticeship, Bessel learned to observe the positions of stars with sufficient accuracy to determine the longitude of Bremen, checking his results against professional astronomical journals. He then tackled the more formidable problem of determining the orbit of Halley’s comet from published observations. After seeing the close agreement between Bessel’s calculations and those of Halley, the German astronomer Olbers encouraged Bessel to improve his already impressive work with more observations. The improved calculations, an achievement tantamount to a modern doctoral dissertation, were published with Olbers’s recommendation. Bessel later received appointments with increasing authority at observatories near Bremen and in Königsberg, the latter position being accompanied by a professorship. (The title of doctor, required for the professorship, was granted by the University of Göttingen on the recommendation of Gauss.)

Bessel proved himself an excellent observational astronomer. His careful measurements 
coupled with his mathematical aptitude allowed him to produce accurate positions for a 
number of previously mapped stars, taking account of instrumental effects, atmospheric 
refraction, and the position and motion of the observation site. In 1820 he determined the 
position of the vernal equinox accurate to 0.01 second, in agreement with modern values.








His observation of the variation of the proper motion of the stars Sirius and Procyon led 
him to posit the existence of nearby, large, low-luminosity stars called dark companions. 
Between 1821 and 1833 he catalogued the positions of about 75,000 stars, publishing his 
measurements in detail. One of his most important contributions to astronomy was the 
determination of the distance to a star using parallax. This method uses triangulation, or the
determination of the apparent positions of a distant object viewed from two points a known 
distance apart, in this case two diametrically opposed points of the Earth’s orbit. The angle 
subtended by the baseline of Earth’s orbit, viewed from the star’s perspective, is known 
as the star’s parallax. Before Bessel’s measurement, stars were assumed to be so distant 
that their parallaxes were too small to measure, and it was further assumed that bright stars 
(thought to be nearer) would have the largest parallax. Bessel correctly reasoned that stars 
with large proper motions were more likely to be nearby ones and selected such a star, 61 
Cygni, for his historic measurement. His measured parallax for that star differs by less than 
8% from the currently accepted value. 

Given such an impressive record in astronomy, it seems only fitting that the famous 
functions that bear Bessel’s name grew out of his investigations of perturbations in planetary 
systems. He showed that such perturbations could be divided into two effects and treated 
separately: the obvious direct attraction due to the perturbing planet and an indirect effect 
caused by the sun’s response to the perturber’s force. The so-called Bessel functions then
appear as coefficients in the series treatment of the indirect perturbation. Although special 
cases of Bessel functions were discovered by Bernoulli, Euler, and Lagrange, the systematic 
treatment by Bessel clearly established his preeminence, a fitting tribute to the creator of 
the most famous functions in mathematical physics. 


14.5.1 Bessel Functions

The Bessel differential equation is usually written as

$$w'' + \frac{1}{z}\,w' + \left(1 - \frac{\nu^2}{z^2}\right)w = 0. \tag{14.41}$$

As in the example above, the substitution $w = z^{\mu}e^{-\eta z}f(z)$ transforms (14.41) into

$$\frac{d^2f}{dz^2} + \left(\frac{2\mu+1}{z} - 2\eta\right)\frac{df}{dz} + \left[\frac{\mu^2-\nu^2}{z^2} - \frac{\eta(2\mu+1)}{z} + \eta^2 + 1\right]f = 0,$$

which, if we set $\mu = \nu$ and $\eta = i$, reduces to

$$f'' + \left(\frac{2\nu+1}{z} - 2i\right)f' - \frac{(2\nu+1)i}{z}\,f = 0.$$

Making the further substitution $2iz = t$, and multiplying out by $t$, we obtain

$$t\,\frac{d^2f}{dt^2} + (2\nu+1-t)\frac{df}{dt} - \left(\nu+\tfrac12\right)f = 0,$$

which is in the form of (14.37) with $\alpha = \nu + \tfrac12$ and $\gamma = 2\nu + 1$.

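Thus, solutions of the Bessel equation, Equation (14.41), can be written as constant multiples of $z^{\nu}e^{-iz}\Phi(\nu+\tfrac12, 2\nu+1; 2iz)$. With proper normalization, we define the Bessel function of the first kind of order $\nu$ as

$$J_\nu(z) = \frac{1}{\Gamma(\nu+1)}\left(\frac{z}{2}\right)^{\nu}e^{-iz}\,\Phi(\nu+\tfrac12, 2\nu+1; 2iz). \tag{14.42}$$

Using Equation (14.38) and the expansion for $e^{-iz}$, we can show that

$$J_\nu(z) = \left(\frac{z}{2}\right)^{\nu}\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,\Gamma(\nu+k+1)}\left(\frac{z}{2}\right)^{2k}. \tag{14.43}$$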

The second linearly independent solution can be obtained as usual and is proportional to

$$z^{1-(2\nu+1)}\left(\frac{z}{2}\right)^{\nu}e^{-iz}\,\Phi\big(\nu+\tfrac12-(2\nu+1)+1,\;2-(2\nu+1);\;2iz\big) = C\left(\frac{z}{2}\right)^{-\nu}e^{-iz}\,\Phi(-\nu+\tfrac12, -2\nu+1; 2iz) = CJ_{-\nu}(z),$$

provided that $1 - \gamma = 1 - (2\nu+1) = -2\nu$ is not an integer. When $\nu$ is an integer $n$, $J_{-n}(z) = (-1)^nJ_n(z)$ (see Problem 14.25). Thus, when $\nu$ is a noninteger, the most general solution is of the form $AJ_\nu(z) + BJ_{-\nu}(z)$.

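How do we find a second linearly independent solution when $\nu$ is an integer $n$? We first define

$$Y_\nu(z) = \frac{J_\nu(z)\cos\nu\pi - J_{-\nu}(z)}{\sin\nu\pi}, \tag{14.44}$$

called the Bessel function of the second kind, or the Neumann function. For noninteger $\nu$ this is simply a linear combination of the two linearly independent solutions. For integer $\nu$ the function is indeterminate. Therefore, we use l'Hôpital's rule and define

$$Y_n(z) \equiv \lim_{\nu\to n}Y_\nu(z) = \frac{1}{\pi}\lim_{\nu\to n}\left[\frac{\partial J_\nu}{\partial\nu} - (-1)^n\frac{\partial J_{-\nu}}{\partial\nu}\right].$$

Equation (14.43) yields

$$\frac{\partial J_\nu}{\partial\nu} = J_\nu(z)\ln\left(\frac{z}{2}\right) - \left(\frac{z}{2}\right)^{\nu}\sum_{k=0}^{\infty}(-1)^k\frac{\psi(\nu+k+1)}{k!\,\Gamma(\nu+k+1)}\left(\frac{z}{2}\right)^{2k},$$

where $\psi(z) = (d/dz)\ln\Gamma(z)$. Similarly,

$$\frac{\partial J_{-\nu}}{\partial\nu} = -J_{-\nu}(z)\ln\left(\frac{z}{2}\right) + \left(\frac{z}{2}\right)^{-\nu}\sum_{k=0}^{\infty}(-1)^k\frac{\psi(-\nu+k+1)}{k!\,\Gamma(-\nu+k+1)}\left(\frac{z}{2}\right)^{2k}.$$

Substituting these expressions in the definition of $Y_n(z)$ and using $J_{-n}(z) = (-1)^nJ_n(z)$, we obtain

$$Y_n(z) = \frac{2}{\pi}J_n(z)\ln\left(\frac{z}{2}\right) - \frac{1}{\pi}\left(\frac{z}{2}\right)^{n}\sum_{k=0}^{\infty}(-1)^k\frac{\psi(n+k+1)}{k!\,\Gamma(n+k+1)}\left(\frac{z}{2}\right)^{2k} - \frac{1}{\pi}(-1)^n\left(\frac{z}{2}\right)^{-n}\sum_{k=0}^{\infty}(-1)^k\frac{\psi(k-n+1)}{k!\,\Gamma(k-n+1)}\left(\frac{z}{2}\right)^{2k}. \tag{14.45}$$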


The natural log term is indicative of the solution suggested by Theorem 14.2.6. Since $Y_\nu(z)$ is linearly independent of $J_\nu(z)$ for any $\nu$, integer or noninteger, it is convenient to consider $\{J_\nu(z), Y_\nu(z)\}$ as a basis of solutions for the Bessel equation. Another basis of solutions is defined as

$$H_\nu^{(1)}(z) = J_\nu(z) + iY_\nu(z),\qquad H_\nu^{(2)}(z) = J_\nu(z) - iY_\nu(z), \tag{14.46}$$

which are called Bessel functions of the third kind, or Hankel functions.
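Replacing $z$ by $iz$ in the Bessel equation yields

$$\frac{d^2w}{dz^2} + \frac{1}{z}\frac{dw}{dz} - \left(1 + \frac{\nu^2}{z^2}\right)w = 0,$$

whose basis of solutions consists of multiples of $J_\nu(iz)$ and $J_{-\nu}(iz)$. Thus, the modified Bessel functions of the first kind are defined as

$$I_\nu(z) \equiv e^{-i\pi\nu/2}J_\nu(iz) = \left(\frac{z}{2}\right)^{\nu}\sum_{k=0}^{\infty}\frac{1}{k!\,\Gamma(\nu+k+1)}\left(\frac{z}{2}\right)^{2k}.$$

Similarly, the modified Bessel functions of the second kind are defined as

$$K_\nu(z) = \frac{\pi}{2\sin\nu\pi}\left[I_{-\nu}(z) - I_\nu(z)\right].$$

When $\nu$ is an integer $n$, $I_n = I_{-n}$, and $K_n$ is indeterminate. Thus, we define $K_n(z)$ as $\lim_{\nu\to n}K_\nu(z)$. This gives

$$K_n(z) = \frac{(-1)^n}{2}\lim_{\nu\to n}\left[\frac{\partial I_{-\nu}}{\partial\nu} - \frac{\partial I_\nu}{\partial\nu}\right],$$

which has the power-series representation

$$K_n(z) = (-1)^{n+1}I_n(z)\ln\left(\frac{z}{2}\right) + \frac12(-1)^n\left(\frac{z}{2}\right)^{n}\sum_{k=0}^{\infty}\frac{\psi(n+k+1)}{k!\,\Gamma(n+k+1)}\left(\frac{z}{2}\right)^{2k} + \frac12(-1)^n\left(\frac{z}{2}\right)^{-n}\sum_{k=0}^{\infty}\frac{\psi(k-n+1)}{k!\,\Gamma(k-n+1)}\left(\frac{z}{2}\right)^{2k}.$$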




We can obtain a recurrence relation for solutions of the Bessel equation as follows. If $Z_\nu(z)$ is a solution of order $\nu$, then (see Problem 14.28)

$$Z_{\nu+1} = C_1z^{\nu}\frac{d}{dz}\left[z^{-\nu}Z_\nu(z)\right] \quad\text{and}\quad Z_{\nu-1} = C_2z^{-\nu}\frac{d}{dz}\left[z^{\nu}Z_\nu(z)\right].$$

If the constants are chosen in such a way that $Z_\nu$, $Z_{-\nu}$, $Z_{\nu+1}$, and $Z_{\nu-1}$ satisfy their appropriate series expansions, then $C_1 = -1$ and $C_2 = 1$. Carrying out the differentiation in the equations for $Z_{\nu+1}$ and $Z_{\nu-1}$, we obtain

$$Z_{\nu+1} = \frac{\nu}{z}Z_\nu - \frac{dZ_\nu}{dz},\qquad Z_{\nu-1} = \frac{\nu}{z}Z_\nu + \frac{dZ_\nu}{dz}. \tag{14.47}$$

Adding these two equations yields the recursion relation

$$Z_{\nu-1}(z) + Z_{\nu+1}(z) = \frac{2\nu}{z}Z_\nu(z), \tag{14.48}$$

where $Z_\nu(z)$ can be any of the three kinds of Bessel functions.
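Relations (14.47) and (14.48) can be checked numerically for $Z_\nu = J_\nu$. The sketch below is an illustration under stated assumptions: it reuses the power series for $J_\nu$, approximates the derivative by a central difference with a step size chosen here, and uses arbitrary sample values of $\nu$ and $z$.

```python
import math


def bessel_j(nu, z, terms=60):
    """Power series for the Bessel function of the first kind."""
    total = 0.0
    for k in range(terms):
        total += (-1) ** k / (math.factorial(k) * math.gamma(nu + k + 1)) * (z / 2) ** (2 * k)
    return (z / 2) ** nu * total


def derivative(f, z, h=1e-5):
    """Central-difference approximation to f'(z)."""
    return (f(z + h) - f(z - h)) / (2 * h)


nu, z = 1.5, 2.0
dJ = derivative(lambda x: bessel_j(nu, x), z)

# Residuals of (14.48) and of the two relations in (14.47); all should be near zero.
sum_residual = bessel_j(nu - 1, z) + bessel_j(nu + 1, z) - 2 * nu / z * bessel_j(nu, z)
raise_residual = bessel_j(nu + 1, z) - (nu / z * bessel_j(nu, z) - dJ)
lower_residual = bessel_j(nu - 1, z) - (nu / z * bessel_j(nu, z) + dJ)
print(sum_residual, raise_residual, lower_residual)
```

The first residual involves no differentiation and vanishes to rounding error. The other two are limited by the accuracy of the finite difference.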


14.6 Problems 


14.1. Show that the solution of $w' + w/z^2 = 0$ has an essential singularity at $z = 0$.

14.2. Derive the recursion relation of Equation (14.7) and express it in terms of the indicial polynomial, as in Equation (14.9).

14.3. Find the characteristic exponent associated with the solution of $w'' + p(z)w' + q(z)w = 0$ at an ordinary point [a point at which $p(z)$ and $q(z)$ have no poles]. How many solutions can you find?

14.4. The Laplace equation in electrostatics when separated in spherical coordinates yields a DE in the radial coordinate given by

$$\frac{d}{dx}\left(x^2\frac{dy}{dx}\right) - n(n+1)y = 0 \quad\text{for } n \ge 0.$$

Starting with an infinite series of the form (14.6), show that the two independent solutions of this ODE are of the form $x^{n}$ and $x^{-n-1}$.

14.5. Find the indicial polynomial, characteristic exponents, and recursion relation at both of the regular singular points of the Legendre equation,

$$w'' - \frac{2z}{1-z^2}\,w' + \frac{\alpha}{1-z^2}\,w = 0.$$

What is $a_k$, the coefficient of the Laurent expansion, for the point $z = +1$?






14.6. Show that the substitution $z = 1/t$ transforms Equation (14.13) into Equation (14.14).

14.7. Obtain the indicial polynomial of Equation (14.14) for expansion about $t = 0$.

14.8. Show that the Riemann DE represents the most general second-order Fuchsian DE.

14.9. Derive the indicial equation for the Riemann DE.

14.10. Show that the transformation $v(z) = z^{\lambda}(z-1)^{\mu}w(z)$ changes the pairs of characteristic exponents $(\lambda_1,\lambda_2)$, $(\mu_1,\mu_2)$, and $(\nu_1,\nu_2)$ for the Riemann DE to $(\lambda_1+\lambda,\lambda_2+\lambda)$, $(\mu_1+\mu,\mu_2+\mu)$, and $(\nu_1-\lambda-\mu,\nu_2-\lambda-\mu)$.

14.11. Go through the steps leading to Equations (14.24), (14.25), and (14.26).

14.12. Show that the elliptic function of the first kind, defined as

$$K(z) = \int_0^{\pi/2}\frac{d\theta}{\sqrt{1 - z^2\sin^2\theta}},$$

can be expressed as $(\pi/2)F(\tfrac12,\tfrac12;1;z^2)$.

14.13. By differentiating the hypergeometric series, show that

$$\frac{d^n}{dz^n}F(\alpha,\beta;\gamma;z) = \frac{\Gamma(\alpha+n)\Gamma(\beta+n)\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma+n)}\,F(\alpha+n,\beta+n;\gamma+n;z).$$

14.14. Use direct substitution in the hypergeometric series to show that

$$F(-\alpha,\beta;\beta;-z) = (1+z)^{\alpha},\qquad F(\tfrac12,\tfrac12;\tfrac32;z^2) = \frac{1}{z}\sin^{-1}z,\qquad F(1,1;2;-z) = \frac{1}{z}\ln(1+z).$$

14.15. Show that the substitution $v(z) = z^{r}w(1/z)$ [see Equation (14.28)] transforms the HGDE into Equation (14.29).

14.16. Consider the function $v(z) = z^{r}(1-z)^{s}F(\alpha_1,\beta_1;\gamma_1;1/z)$ and assume that it is a solution of HGDE. Find a relation among $r$, $s$, $\alpha_1$, $\beta_1$, and $\gamma_1$ such that $v(z)$ is written in terms of three parameters rather than five. In particular, show that one possibility is

$$v(z) = z^{\alpha-\gamma}(1-z)^{\gamma-\alpha-\beta}F(\gamma-\alpha,1-\alpha;1+\beta-\alpha;1/z).$$

Find all such possibilities.





14.17. Show that the Jacobi functions are related to the hypergeometric functions. 

14.18. Derive the expression for the Jacobi function of the second kind as given 
in Equation (14.33). 

14.19. Show that z = oo is not a regular singular point of the CHGDE. 

14.20. Derive the confluent hypergeometric series from hypergeometric series. 

14.21. Show that the Weber-Hermite equation, $u'' + (\nu + \tfrac12 - \tfrac14z^2)u = 0$, can be transformed into the CHGDE. Hint: Make the substitution $u(z) = \exp(-\tfrac14z^2)v(z)$.

14.22. The linear combination

$$\Psi(\alpha,\gamma;z) \equiv \frac{\Gamma(1-\gamma)}{\Gamma(\alpha-\gamma+1)}\,\Phi(\alpha,\gamma;z) + \frac{\Gamma(\gamma-1)}{\Gamma(\alpha)}\,z^{1-\gamma}\Phi(\alpha-\gamma+1,2-\gamma;z)$$

is also a solution of the CHGDE. Show that the Hermite polynomials can be written as

Hn (j^) =2nn ~^b Z 2 ) ' 


14.23. Verify that the error function $\operatorname{erf}(z) = \int_0^z e^{-t^2}\,dt$ satisfies the relation $\operatorname{erf}(z) = z\,\Phi(\tfrac12,\tfrac32;-z^2)$.

14.24. Derive the series expansion of the Bessel function of the first kind from that of the confluent hypergeometric series and the expansion of the exponential. Check your answer by obtaining the same result by substituting the power series directly in the Bessel DE.

14.25. Show that $J_{-n}(z) = (-1)^nJ_n(z)$. Hint: Let $\nu = -n$ in the expansion of $J_\nu(z)$ and use $\Gamma(m) = \infty$ for a nonpositive integer $m$.

14.26. In a potential-free region, the radial part of the Schrödinger equation reduces to

$$\frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} + \left[\lambda - \frac{\alpha}{r^2}\right]R = 0.$$

Write the solutions of this DE in terms of Bessel functions. Hint: Substitute $R = u/\sqrt{r}$. These solutions are called spherical Bessel functions.

14.27. Theorem 14.2.6 states that under certain conditions, linearly independent solutions of SOLDE at regular singular points exist even though the difference between the characteristic exponents is an integer. An example is the case of Bessel functions of half-odd-integer orders. Evaluate the Wronskian of the two linearly independent solutions, $J_\nu$ and $J_{-\nu}$, of the Bessel equation and show that it vanishes only if $\nu$ is an integer. This shows, in particular, that $J_{n+1/2}$ and $J_{-n-1/2}$ are linearly independent. Hint: Consider the value of the Wronskian at $z = 0$, and use the formula $\Gamma(\nu)\Gamma(1-\nu) = \pi/\sin\nu\pi$.









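14.28. Show that $z^{\pm\nu}(d/dz)[z^{\mp\nu}Z_\nu(z)]$ is a solution of the Bessel equation of order $\nu \pm 1$ if $Z_\nu$ is a solution of order $\nu$.

14.29. Use the recursion relation of Equation (14.47) to prove that

$$\left(\frac{1}{z}\frac{d}{dz}\right)^{m}\left[z^{\nu}Z_\nu(z)\right] = z^{\nu-m}Z_{\nu-m}(z),\qquad \left(\frac{1}{z}\frac{d}{dz}\right)^{m}\left[z^{-\nu}Z_\nu(z)\right] = (-1)^{m}z^{-\nu-m}Z_{\nu+m}(z).$$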

14.30. Using the series expansion of the Bessel function, write J\/ 2 (z) and 
y_l/2(z) in terms of elementary functions. Hint: First show that F(k + |)= 

14.31. From the results of the previous two problems，derive the relations 

一鲁％薪)， 

一卜 (4 體). 

14.32. Obtain the following integral identities:

(a) $\int z^{\nu+1}J_\nu(z)\,dz = z^{\nu+1}J_{\nu+1}(z)$.

(b) $\int z^{-\nu+1}J_\nu(z)\,dz = -z^{-\nu+1}J_{\nu-1}(z)$.

(c) $\int z^{\mu+1}J_\nu(z)\,dz = z^{\mu+1}J_{\nu+1}(z) + (\mu-\nu)z^{\mu}J_\nu(z) - (\mu^2-\nu^2)\int z^{\mu-1}J_\nu(z)\,dz$,

and evaluate

(d) $\int z^{3}J_0(z)\,dz$.

Hint: For (c) write $z^{\mu+1} = z^{\mu-\nu}z^{\nu+1}$ and use integration by parts.

14.33. Use Theorem 14.2.6 and the fact that $J_n(z)$ is entire to show that for integer $n$, a second solution to the Bessel equation exists and can be written as $Y_n(z) = J_n(z)[f_n(z) + C_n\ln z]$, where $f_n(z)$ is analytic about $z = 0$.

14.34. (a) Show that the Wronskian $W(J_\nu, Z; z)$ of $J_\nu$ and any other solution $Z$ of the Bessel equation, satisfies the equation

$$\frac{d}{dz}\left[zW(J_\nu, Z; z)\right] = 0.$$

(b) For some constant $A$, show that

$$\frac{d}{dz}\left[\frac{Z}{J_\nu}\right] = \frac{W(z)}{J_\nu^2(z)} = \frac{A}{zJ_\nu^2(z)}.$$

(c) Show that the general second solution of the Bessel equation can be written as

$$Z_\nu(z) = J_\nu(z)\left[B + A\int\frac{dz}{zJ_\nu^2(z)}\right].$$

14.35. Spherical Bessel functions are defined by

$$f_l(z) \equiv \sqrt{\frac{\pi}{2}}\,\frac{Z_{l+1/2}(z)}{\sqrt{z}}.$$

Let $f_l(z)$ denote a spherical Bessel function "of some kind." By direct differentiation and substitution in the Bessel equation, show that

(a) $\dfrac{d}{dz}\left[z^{l+1}f_l(z)\right] = z^{l+1}f_{l-1}(z)$. (b) $\dfrac{d}{dz}\left[z^{-l}f_l(z)\right] = -z^{-l}f_{l+1}(z)$.

(c) Combine the results of parts (a) and (b) to derive the recursion relations

$$f_{l-1}(z) + f_{l+1}(z) = \frac{2l+1}{z}f_l(z),\qquad lf_{l-1}(z) - (l+1)f_{l+1}(z) = (2l+1)\frac{df_l}{dz}.$$

14.36. Show that $W(J_\nu, Y_\nu; z) = 2/(\pi z)$, $W(H_\nu^{(1)}, H_\nu^{(2)}; z) = 4/(i\pi z)$. Hint: Use Problem 14.34.


14.37. Verify the following relations:

(a) $Y_{n+1/2}(z) = (-1)^{n+1}J_{-n-1/2}(z)$, $\quad Y_{-n-1/2}(z) = (-1)^{n}J_{n+1/2}(z)$.

(b) $Y_{-\nu}(z) = \sin\nu\pi\,J_\nu(z) + \cos\nu\pi\,Y_\nu(z) = \dfrac{J_\nu(z) - \cos\nu\pi\,J_{-\nu}(z)}{\sin\nu\pi}$.

(c) $Y_{-n}(z) = (-1)^{n}Y_n(z)$ in the limit $\nu \to n$ in part (b).

14.38. Use the recurrence relation for the Bessel function to show that $J_1(z) = -J_0'(z)$.


14.39. Let $u = J_\nu(\lambda z)$ and $v = J_\nu(\mu z)$. Multiply the Bessel DE for $u$ by $v/z$ and that of $v$ by $u/z$. Subtract the two equations to obtain

$$(\lambda^2 - \mu^2)zuv = \frac{d}{dz}\left[z\left(u\frac{dv}{dz} - v\frac{du}{dz}\right)\right].$$

(a) Write the above equation in terms of $J_\nu(\lambda z)$ and $J_\nu(\mu z)$ and integrate both sides with respect to $z$.

(b) Now divide both sides by $\lambda^2 - \mu^2$ and take the limit as $\mu \to \lambda$. You will need to use l'Hôpital's rule.

(c) Substitute for $J_\nu''(\lambda z)$ from the Bessel DE and simplify to get

$$\int z[J_\nu(\lambda z)]^2\,dz = \frac{z^2}{2}\left\{[J_\nu'(\lambda z)]^2 + \left(1 - \frac{\nu^2}{\lambda^2z^2}\right)[J_\nu(\lambda z)]^2\right\}.$$

(d) Finally, let $\lambda = x_{\nu n}/a$, where $x_{\nu n}$ is the $n$th root of $J_\nu$, and use Equation (14.47) to arrive at

$$\int_0^{a}zJ_\nu^2\!\left(\frac{x_{\nu n}}{a}z\right)dz = \frac{a^2}{2}J_{\nu+1}^2(x_{\nu n}).$$

14.40. The generating function for Bessel functions of integer order is $\exp[\tfrac12z(t - 1/t)]$. To see this, rewrite the generating function as $e^{zt/2}e^{-z/2t}$, expand both factors, and write the product as powers of $t^{n}$. Now show that the coefficient of $t^{n}$ is simply $J_n(z)$. Finally, use $J_{-n}(z) = (-1)^{n}J_n(z)$ to derive the formula $\exp[\tfrac12z(t - 1/t)] = \sum_{n=-\infty}^{\infty}J_n(z)t^{n}$.

14.41. Make the substitutions $z = \beta t^{\gamma}$ and $w = t^{\alpha}u$ to transform the Bessel DE into

$$t^2\frac{d^2u}{dt^2} + (2\alpha+1)t\frac{du}{dt} + (\beta^2\gamma^2t^{2\gamma} + \alpha^2 - \nu^2\gamma^2)u = 0.$$

Now show that Airy's differential equation $\ddot{u} - tu = 0$ has solutions of the form $J_{1/3}(\tfrac23it^{3/2})$ and $J_{-1/3}(\tfrac23it^{3/2})$.

• d 2 w e 2t 一 v 2 

14.42. Show that the general solution of - ~1 - — w — 0 is w = 

14.43. Transform $dw/dz + w^2 + z^{m} = 0$ by making the substitution $w = (d/dz)\ln u$. Now make the further substitutions

$$v = \frac{u}{\sqrt{z}} \quad\text{and}\quad t = \frac{2}{m+2}\,z^{1+m/2}$$

to show that the new DE can be transformed into a Bessel equation of order $1/(m+2)$.

14.44. Starting with the relation

$$\exp[\tfrac12x(t - 1/t)]\exp[\tfrac12y(t - 1/t)] = \exp[\tfrac12(x+y)(t - 1/t)]$$

and the fact that the exponential function is the generating function for $J_n(z)$, prove the "addition theorem" for Bessel functions:

$$J_n(x+y) = \sum_{k=-\infty}^{\infty}J_k(x)J_{n-k}(y).$$





Additional Reading 

1. Birkhoff, G. and Rota, G.-C. Ordinary Differential Equations, 3rd ed., Wiley, 1978. The first two sections of this chapter closely follow their presentation.

2. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row, 1967.

3. Watson, G. A Treatise on the Theory of Bessel Functions, 2nd ed., Cambridge University Press, 1952. As the name suggests, the definitive text and reference on Bessel functions.




Integral Transforms and Differential 
Equations 




The discussion in Chapter 14 introduced a general method of solving differential equations by power series (also called the Frobenius method), which gives a solution that converges within a circle of convergence. In general, this circle of convergence may be small; however, the function represented by the power series can be analytically continued using methods presented in Chapter 11.

This chapter, which is a bridge between differential equations and operators on Hilbert spaces (to be developed in the next part), introduces another method of solving DEs, which uses integral transforms and incorporates the analytic continuation automatically. The integral transform of a function $v$ is another function $u$ given by

$$u(z) = \int_C K(z,t)v(t)\,dt, \tag{15.1}$$

where $C$ is a convenient contour, and $K(z,t)$, called the kernel of the integral transform, is an appropriate function of two complex variables.

15.0.1. Example. Let us consider some examples of integral transforms.

(a) The Fourier transform is familiar from the discussion of Chapter 8. The kernel is $K(x,y) = e^{ixy}$.

(b) The Laplace transform is used frequently in electrical engineering. Its kernel is $K(x,y) = e^{-xy}$.

(c) The Euler transform has the kernel $K(x,y) = (x-y)^{\nu}$.

(d) The Mellin transform has the kernel $K(x,y) = G(x^{y})$, where $G$ is an arbitrary function. Most of the time $K(x,y)$ is taken to be simply $x^{y}$.

(e) The Hankel transform has the kernel $K(x,y) = yJ_n(xy)$, where $J_n$ is the $n$th-order Bessel function.

(f) A transform that is useful in connection with the Bessel equation has the kernel

$$K(x,y) = \left(\frac{x}{2}\right)^{\nu}e^{y - x^2/4y}.$$
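To make the definition (15.1) concrete, the following sketch evaluates an integral transform numerically along a real interval, using the Laplace kernel of part (b) as the example. It is only an illustration: the function name `integral_transform`, the use of composite Simpson's rule, the truncation of the upper limit at $t = 40$, and the test function $v(t) = e^{-t}$ are all assumptions made here.

```python
import math


def integral_transform(kernel, v, z, a, b, n=8000):
    """Approximate u(z) = int_a^b K(z, t) v(t) dt with composite Simpson's rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = kernel(z, a) * v(a) + kernel(z, b) * v(b)
    for i in range(1, n):
        t = a + i * h
        weight = 4 if i % 2 else 2
        total += weight * kernel(z, t) * v(t)
    return total * h / 3


def laplace_kernel(z, t):
    return math.exp(-z * t)


def v(t):
    return math.exp(-t)


# The Laplace transform of exp(-t) is 1/(z + 1); the interval [0, 40] stands in for [0, infinity).
u2 = integral_transform(laplace_kernel, v, 2.0, 0.0, 40.0)
print(u2, 1 / 3)
```

Swapping in a different kernel function gives the other transforms in the list, provided the contour is a real interval on which the integrand is well behaved.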

The idea behind using integral transforms is to write the solution $u(z)$ of a DE in $z$ in terms of an integral such as Equation (15.1) and choose $v$ and the kernel in such a way as to render the DE more manageable. Let $L_z$ be a differential operator (DO) in the variable $z$. We want to determine $u(z)$ such that $L_z[u] = 0$, or equivalently, such that $\int_C L_z[K(z,t)]v(t)\,dt = 0$. Suppose that we can find $M_t$, a DO in the variable $t$, such that $L_z[K(z,t)] = M_t[K(z,t)]$. Then the DE becomes $\int_C (M_t[K(z,t)])v(t)\,dt = 0$. If $C$ has $a$ and $b$ as initial and final points ($a$ and $b$ may be equal), then the Lagrange identity [see Equation (13.24)] yields

$$0 = L_z[u] = \int_a^b K(z,t)M_t^{\dagger}[v(t)]\,dt + Q[K,v]\Big|_a^b,$$

where $Q[K,v]$ is the "surface term." If $v(t)$ and the contour $C$ (or $a$ and $b$) are chosen in such a way that

$$Q[K,v]\Big|_a^b = 0 \quad\text{and}\quad M_t^{\dagger}[v(t)] = 0, \tag{15.2}$$

the problem is solved. The trick is to find an $M_t$ such that Equation (15.2) is easier to solve than the original equation, $L_z[u] = 0$. This in turn demands a clever choice of the kernel, $K(z,t)$. This chapter discusses how to solve some common differential equations of mathematical physics using the general idea presented above.


15.1 Integral Representation of the Hypergeometric 
Function 

Recall that for the hypergeometric function, the differential operator is

$$L_z = z(1-z)\frac{d^2}{dz^2} + [\gamma - (\alpha+\beta+1)z]\frac{d}{dz} - \alpha\beta.$$

For such operators, whose coefficient functions are polynomials, the proper choice for $K(z,t)$ is the Euler kernel, $(z-t)^s$. Applying $L_z$ to this kernel and rearranging terms, we obtain

$$L_z[K(z,t)] = \bigl\{z^2[-s(s-1) - s(\alpha+\beta+1) - \alpha\beta] + z[s(s-1) + s\gamma + st(\alpha+\beta+1) + 2\alpha\beta t] - \gamma st - \alpha\beta t^2\bigr\}(z-t)^{s-2}. \qquad (15.3)$$

Note that except for a multiplicative constant, $K(z,t)$ is symmetric in $z$ and $t$. This suggests that the general form of $M_t$ may be chosen to be the same as that of $L_z$ except for the interchange of $z$ and $t$. If we can manipulate the parameters in such a way that $M_t$ becomes simple, then we have a chance of solving the problem. For instance, if $M_t$ has the form of $L_z$ with the constant term absent, then the hypergeometric DE effectively reduces to a FODE (in $dv/dt$). Let us exploit this possibility.

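The general form of the $M_t$ that we are interested in is

$$M_t = p_2(t)\frac{d^2}{dt^2} + p_1(t)\frac{d}{dt},$$

i.e., with no $p_0$ term. By applying $M_t$ to $K(z,t) = (z-t)^s$ and setting the result equal to the RHS of Equation (15.3), we obtain

$$s(s-1)p_2 - p_1 s z + p_1 s t = z^2[-s(s-1) - s(\alpha+\beta+1) - \alpha\beta] + z[s(s-1) + s\gamma + st(\alpha+\beta+1) + 2\alpha\beta t] - \gamma st - \alpha\beta t^2,$$

for which the coefficients of equal powers of $z$ on both sides must be equal:

$$-s(s-1) - s(\alpha+\beta+1) - \alpha\beta = 0 \;\Rightarrow\; s = -\alpha \ \text{or}\ s = -\beta,$$
$$-p_1 s = s(s-1) + s\gamma + st(\alpha+\beta+1) + 2\alpha\beta t,$$
$$s(s-1)p_2 + p_1 s t = -\gamma st - \alpha\beta t^2.$$

If we choose $s = -\alpha$ ($s = -\beta$ leads to an equivalent representation), the coefficient functions of $M_t$ will be completely determined. In fact, the second equation gives $p_1(t)$, and the third determines $p_2(t)$. We finally obtain

$$p_1(t) = \alpha + 1 - \gamma + t(\beta - \alpha - 1), \qquad p_2(t) = t - t^2,$$

and

$$M_t = (t - t^2)\frac{d^2}{dt^2} + [\alpha + 1 - \gamma + t(\beta - \alpha - 1)]\frac{d}{dt}, \qquad (15.4)$$

which, according to Equation (13.20), yields the following DE for the adjoint:

$$M_t^{\dagger}[v] = \frac{d^2}{dt^2}\bigl[(t - t^2)v\bigr] - \frac{d}{dt}\bigl\{[\alpha - \gamma + 1 + t(\beta - \alpha - 1)]v\bigr\} = 0. \qquad (15.5)$$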




The solution to this equation is $v(t) = C t^{\alpha-\gamma}(t-1)^{\gamma-\beta-1}$ (see Problem 15.5). We also need the surface term, $Q[K, v]$, in the Lagrange identity (see Problem 15.6 for details):

$$Q[K, v](t) = C\alpha\, t^{\alpha-\gamma+1}(t-1)^{\gamma-\beta}(z-t)^{-\alpha-1}.$$

Finally, we need a specification of the contour. For different contours we will get different solutions. The contour chosen must, of course, have the property that $Q[K, v]$ vanishes as a result of the integration. There are two possibilities: Either the contour is closed [$a = b$ in (15.2)] or $a \neq b$ but $Q[K, v]$ takes on the same value at $a$ and at $b$.

Let us consider the second of these possibilities. Clearly, $Q[K, v](t)$ vanishes at $t = 1$ if $\mathrm{Re}(\gamma) > \mathrm{Re}(\beta)$. Also, as $t \to \infty$,

$$Q[K, v](t) \to (-1)^{-\alpha-1} C\alpha\, t^{\alpha-\gamma+1}\, t^{\gamma-\beta}\, t^{-\alpha-1} = (-1)^{-\alpha-1} C\alpha\, t^{-\beta},$$

which vanishes if $\mathrm{Re}(\beta) > 0$. We thus take $a = 1$ and $b = \infty$, and assume that $\mathrm{Re}(\gamma) > \mathrm{Re}(\beta) > 0$. It then follows that

$$u(z) = \int_a^b K(z,t)\,v(t)\,dt = C' \int_1^{\infty} (t-z)^{-\alpha}\, t^{\alpha-\gamma}(t-1)^{\gamma-\beta-1}\,dt. \qquad (15.6)$$

The constant $C'$ can be determined to be $\Gamma(\gamma)/[\Gamma(\beta)\Gamma(\gamma-\beta)]$ (see Problem 15.7). Therefore,

$$u(z) \equiv F(\alpha, \beta; \gamma; z) = \frac{\Gamma(\gamma)}{\Gamma(\beta)\Gamma(\gamma-\beta)} \int_1^{\infty} (t-z)^{-\alpha}\, t^{\alpha-\gamma}(t-1)^{\gamma-\beta-1}\,dt.$$

It is customary to change the variable of integration from $t$ to $1/t$. The resulting expression is called the Euler formula for the hypergeometric function:

$$F(\alpha, \beta; \gamma; z) = \frac{\Gamma(\gamma)}{\Gamma(\beta)\Gamma(\gamma-\beta)} \int_0^1 (1 - tz)^{-\alpha}\, t^{\beta-1}(1-t)^{\gamma-\beta-1}\,dt. \qquad (15.7)$$

Note that the term $(1 - tz)^{-\alpha}$ in the integral has two branch points in the $z$-plane, one at $z = 1/t$ and the other at $z = \infty$. Therefore, we cut the $z$-plane from $z_1 = 1/t$, a point on the positive real axis, to $z_2 = \infty$. Since $0 \le t \le 1$, $z_1$ is somewhere in the interval $[1, \infty)$. To ensure that the cut is applicable for all values of $t$, we take $z_1 = 1$ and cut the plane along the positive real axis. It follows that Equation (15.7) is well behaved as long as

$$0 < \arg(1 - z) < 2\pi. \qquad (15.8)$$

We could choose a different contour, which, in general, would lead to a different solution. The following example illustrates one such choice.

15.1.1. Example. First note that $Q[K, v]$ vanishes at $t = 0$ and $t = 1$ as long as $\mathrm{Re}(\gamma) > \mathrm{Re}(\beta)$ and $\mathrm{Re}(\alpha) > \mathrm{Re}(\gamma) - 1$. Hence, we can choose the contour to start at $t = 0$ and end at $t = 1$. We then have

$$w(z) = C'' \int_0^1 (z-t)^{-\alpha}\, t^{\alpha-\gamma}(1-t)^{\gamma-\beta-1}\,dt = C'' z^{-\alpha} \int_0^1 \left(1 - \frac{t}{z}\right)^{-\alpha} t^{\alpha-\gamma}(1-t)^{\gamma-\beta-1}\,dt. \qquad (15.9)$$


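To see the relation between $w(z)$ and the hypergeometric function, expand $(1 - t/z)^{-\alpha}$ in the integral to get

$$w(z) = C'' z^{-\alpha} \sum_{n=0}^{\infty} \frac{\Gamma(\alpha+n)}{\Gamma(\alpha)\Gamma(n+1)} \left(\frac{1}{z}\right)^{n} \int_0^1 t^{\alpha+n-\gamma}(1-t)^{\gamma-\beta-1}\,dt. \qquad (15.10)$$

Now evaluate the integral by changing $t$ to $1/t$ and using Equations (11.19) and (11.17). This changes the integral to

$$\int_1^{\infty} t^{-\alpha-n-1+\beta}(t-1)^{\gamma-\beta-1}\,dt = \frac{\Gamma(\alpha+n+1-\gamma)\,\Gamma(\gamma-\beta)}{\Gamma(\alpha+n+1-\beta)}.$$

Substituting this in Equation (15.10), we obtain

$$w(z) = \frac{C''}{\Gamma(\alpha)}\,\Gamma(\gamma-\beta)\, z^{-\alpha} \sum_{n=0}^{\infty} \frac{\Gamma(\alpha+n)\,\Gamma(\alpha+n+1-\gamma)}{\Gamma(\alpha+n+1-\beta)\,\Gamma(n+1)} \left(\frac{1}{z}\right)^{n}$$
$$\phantom{w(z)} = \frac{C''}{\Gamma(\alpha)}\,\Gamma(\gamma-\beta)\, z^{-\alpha}\, \frac{\Gamma(\alpha)\,\Gamma(\alpha+1-\gamma)}{\Gamma(\alpha+1-\beta)}\, F(\alpha, \alpha-\gamma+1; \alpha-\beta+1; 1/z),$$

where we have used the hypergeometric series of Chapter 14. Choosing

$$C'' = \frac{\Gamma(\alpha+1-\beta)}{\Gamma(\gamma-\beta)\,\Gamma(\alpha+1-\gamma)}$$

yields $w(z) = z^{-\alpha} F(\alpha, \alpha-\gamma+1; \alpha-\beta+1; 1/z)$, which is one of the solutions of the hypergeometric DE [Equation (14.30)]. ■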
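The Euler formula (15.7) can be checked numerically against the hypergeometric series. The sketch below is only an illustration: the function names, the Simpson quadrature, and the sample parameters are assumptions made here, not part of the text, and the parameters are chosen so that $\mathrm{Re}(\gamma) > \mathrm{Re}(\beta) > 0$ and the integrand has no endpoint singularity.

```python
import math

def hyp2f1_series(a, b, c, z, terms=200):
    """Sum the hypergeometric series F(a, b; c; z) for |z| < 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        # ratio of consecutive terms: (a+n)(b+n) z / ((c+n)(n+1))
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return total

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(lo + k * h)
    return s * h / 3

def hyp2f1_euler(a, b, c, z):
    """Euler integral (15.7); assumes Re(c) > Re(b) > 0 and z off the cut."""
    pref = math.gamma(c) / (math.gamma(b) * math.gamma(c - b))
    integrand = lambda t: (1 - t * z) ** (-a) * t ** (b - 1) * (1 - t) ** (c - b - 1)
    return pref * simpson(integrand, 0.0, 1.0)

# Illustrative parameters (hypothetical): alpha = 0.5, beta = 2, gamma = 4.5, z = 0.3
print(hyp2f1_euler(0.5, 2.0, 4.5, 0.3), hyp2f1_series(0.5, 2.0, 4.5, 0.3))
```

The two printed numbers should agree to within the quadrature error; a different quadrature rule would be needed if $\beta < 1$ or $\gamma - \beta < 1$, where the integrand is singular at an endpoint.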


15.2 Integral Representation of the Confluent Hypergeometric Function

Having obtained the integral representation of the hypergeometric function, we can readily get the integral representation of the confluent hypergeometric function by taking the proper limit. It was shown in Chapter 14 that $\Phi(\alpha, \gamma; z) = \lim_{\beta\to\infty} F(\alpha, \beta; \gamma; z/\beta)$. This suggests taking the limit of Equation (15.7). The presence of the gamma functions with $\beta$ as their arguments complicates things, but on the other hand, the symmetry of the hypergeometric function can be utilized to our advantage. Thus, we may write

$$\Phi(\alpha, \gamma; z) = \lim_{\beta\to\infty} F\!\left(\beta, \alpha; \gamma; \frac{z}{\beta}\right) = \lim_{\beta\to\infty} \frac{\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\gamma-\alpha)} \int_0^1 \left(1 - \frac{tz}{\beta}\right)^{-\beta} t^{\alpha-1}(1-t)^{\gamma-\alpha-1}\,dt,$$

so that

$$\Phi(\alpha, \gamma; z) = \frac{\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\gamma-\alpha)} \int_0^1 e^{zt}\, t^{\alpha-1}(1-t)^{\gamma-\alpha-1}\,dt, \qquad (15.11)$$

because the limit of the first term in the integral is simply $e^{tz}$. Note that the condition $\mathrm{Re}(\gamma) > \mathrm{Re}(\alpha) > 0$ must still hold here.

Integral transforms are particularly useful in determining the asymptotic behavior of functions. We shall use them in deriving asymptotic formulas for Bessel functions later on, and Problem 15.10 derives the asymptotic formula for the confluent hypergeometric function.


15.3 Integral Representation of Bessel Functions 


Choosing the kernel, the contour, and the function $v(t)$ that lead to an integral representation of a function is an art, and the nineteenth century produced many masters of it. A particularly popular theme in such endeavors was the Bessel equation and Bessel functions. This section considers the integral representations of Bessel functions.

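The most effective kernel for the Bessel DE is

$$K(z,t) = \left(\frac{z}{2}\right)^{\nu} \exp\!\left(t - \frac{z^2}{4t}\right).$$

When the Bessel DO $L_z \equiv \dfrac{d^2}{dz^2} + \dfrac{1}{z}\dfrac{d}{dz} + \left(1 - \dfrac{\nu^2}{z^2}\right)$ acts on $K(z,t)$, it yields

$$L_z K(z,t) = \left(1 - \frac{\nu+1}{t} + \frac{z^2}{4t^2}\right)\left(\frac{z}{2}\right)^{\nu} e^{\,t - z^2/4t} = \left(\frac{d}{dt} - \frac{\nu+1}{t}\right) K(z,t).$$

Thus, $M_t = d/dt - (\nu+1)/t$, and Equation (13.20) gives

$$M_t^{\dagger}[v(t)] = -\frac{dv}{dt} - \frac{\nu+1}{t}\,v = 0,$$

whose solution, including the arbitrary constant of integration $k$, is $v(t) = k t^{-\nu-1}$. When we substitute this solution and the kernel in the surface term of the Lagrange identity, Equation (13.24), we obtain

$$Q[K, v](t) = p_1 K(z,t)\,v(t) = k\left(\frac{z}{2}\right)^{\nu} t^{-\nu-1} e^{\,t - z^2/4t}.$$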






Figure 15.1 The contour $C$ in the $t$-plane used in evaluating $J_\nu(z)$.

A contour in the $t$-plane that ensures the vanishing of $Q[K, v]$ for all values of $\nu$ starts at $t = -\infty$, comes to the origin, orbits it on an arbitrary circle, and finally goes back to $t = -\infty$ (see Figure 15.1). Such a contour is possible because of the factor $e^t$ in the expression for $Q[K, v]$. We thus can write

$$J_\nu(z) = k\left(\frac{z}{2}\right)^{\nu} \int_C t^{-\nu-1} e^{\,t - z^2/4t}\,dt. \qquad (15.12)$$

Note that the integrand has a cut along the negative real axis due to the factor $t^{-\nu-1}$. If $\nu$ is an integer, the cut shrinks to a pole at $t = 0$.

The constant $k$ must be determined in such a way that the above expression for $J_\nu(z)$ agrees with the series representation obtained in Chapter 14. It can be shown (see Problem 15.11) that $k = 1/(2\pi i)$. Thus, we have

$$J_\nu(z) = \frac{1}{2\pi i}\left(\frac{z}{2}\right)^{\nu} \int_C t^{-\nu-1} e^{\,t - z^2/4t}\,dt.$$

It is more convenient to take the factor $(z/2)^{\nu}$ into the integral, introduce a new integration variable $u = 2t/z$, and rewrite the preceding equation as

$$J_\nu(z) = \frac{1}{2\pi i} \int_C u^{-\nu-1} \exp\!\left[\frac{z}{2}\left(u - \frac{1}{u}\right)\right] du. \qquad (15.13)$$

This result is valid as long as $\mathrm{Re}(zu) < 0$ when $u \to -\infty$ on the negative real axis; that is, $\mathrm{Re}(z)$ must be positive for Equation (15.13) to work.

An interesting result can be obtained from Equation (15.13) when $\nu$ is an integer. In that case the only singularity will be at the origin, so the contour can be taken to be a circle about the origin. This yields

$$J_n(z) = \frac{1}{2\pi i} \oint u^{-n-1} e^{(z/2)(u - 1/u)}\,du,$$

which is the $n$th coefficient of the Laurent series expansion of $\exp[(z/2)(u - 1/u)]$ about the origin. We thus have this important result:

$$\exp\!\left[\frac{z}{2}\left(t - \frac{1}{t}\right)\right] = \sum_{n=-\infty}^{\infty} J_n(z)\,t^n. \qquad (15.14)$$

The function $\exp[(z/2)(t - 1/t)]$ is therefore appropriately called the generating function for Bessel functions of integer order (see also Problem 14.40). Equation (15.14) can be useful in deriving relations for such Bessel functions, as the following example shows.

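15.3.1. Example. Let us rewrite the LHS of (15.14) as $e^{zt/2} e^{-z/2t}$, expand the exponentials, and collect terms to obtain

$$e^{(z/2)(t - 1/t)} = e^{zt/2} e^{-z/2t} = \sum_{m=0}^{\infty} \frac{1}{m!}\left(\frac{zt}{2}\right)^{m} \sum_{n=0}^{\infty} \frac{1}{n!}\left(-\frac{z}{2t}\right)^{n} = \sum_{m=0}^{\infty}\sum_{n=0}^{\infty} \frac{(-1)^n}{m!\,n!} \left(\frac{z}{2}\right)^{m+n} t^{m-n}.$$

If we let $m - n = k$, change the $m$ summation to $k$, and note that $k$ goes from $-\infty$ to $\infty$, we get

$$e^{(z/2)(t - 1/t)} = \sum_{k=-\infty}^{\infty}\sum_{n=0}^{\infty} \frac{(-1)^n}{(n+k)!\,n!} \left(\frac{z}{2}\right)^{2n+k} t^{k} = \sum_{k=-\infty}^{\infty} \left[\left(\frac{z}{2}\right)^{k} \sum_{n=0}^{\infty} \frac{(-1)^n}{\Gamma(n+k+1)\,\Gamma(n+1)} \left(\frac{z}{2}\right)^{2n}\right] t^{k}.$$

Comparing this equation with Equation (15.14) yields the familiar expansion for the Bessel function:

$$J_k(z) = \left(\frac{z}{2}\right)^{k} \sum_{n=0}^{\infty} \frac{(-1)^n}{\Gamma(n+k+1)\,\Gamma(n+1)} \left(\frac{z}{2}\right)^{2n}.$$

We can also obtain a recurrence relation for $J_n(z)$. Differentiating both sides of Equation (15.14) with respect to $t$ yields

$$\frac{z}{2}\left(1 + \frac{1}{t^2}\right) e^{(z/2)(t - 1/t)} = \sum_{n=-\infty}^{\infty} n J_n(z)\,t^{n-1}. \qquad (15.15)$$

Using Equation (15.14) on the LHS gives

$$\frac{z}{2}\left(1 + \frac{1}{t^2}\right) \sum_{n=-\infty}^{\infty} J_n(z)\,t^n = \frac{z}{2}\sum_{n=-\infty}^{\infty} J_n(z)\,t^n + \frac{z}{2}\sum_{n=-\infty}^{\infty} J_n(z)\,t^{n-2} = \frac{z}{2}\sum_{n=-\infty}^{\infty} J_{n-1}(z)\,t^{n-1} + \frac{z}{2}\sum_{n=-\infty}^{\infty} J_{n+1}(z)\,t^{n-1}, \qquad (15.16)$$

where we substituted $n - 1$ for $n$ in the first sum and $n + 1$ for $n$ in the second. Equating the coefficients of equal powers of $t$ on the LHS and the RHS of Equations (15.15) and (15.16), we get

$$n J_n(z) = \frac{z}{2}\,[J_{n-1}(z) + J_{n+1}(z)],$$

which was obtained by a different method in Chapter 14 [see Eq. (14.48)]. ■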
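The generating function (15.14) and the recurrence relation derived from it lend themselves to a quick numerical check. The following is a minimal sketch, assuming the power series for integer order and the symmetry $J_{-n} = (-1)^n J_n$; the function names and the truncation limits are choices made here for illustration.

```python
import math

def bessel_j(n, z, terms=60):
    """J_n(z) for integer n from the power series; uses J_{-n} = (-1)^n J_n."""
    if n < 0:
        return (-1) ** (-n) * bessel_j(-n, z, terms)
    return sum((-1) ** k / (math.factorial(k) * math.factorial(k + n))
               * (z / 2) ** (2 * k + n) for k in range(terms))

def generating_sum(z, t, nmax=40):
    """Truncated right-hand side of (15.14): sum of J_n(z) t^n for |n| <= nmax."""
    return sum(bessel_j(n, z) * t ** n for n in range(-nmax, nmax + 1))

z, t = 1.7, 0.8  # illustrative values
print(generating_sum(z, t), math.exp(0.5 * z * (t - 1 / t)))
print(3 * bessel_j(3, z), 0.5 * z * (bessel_j(2, z) + bessel_j(4, z)))
```

Each printed pair should agree to roughly machine precision for moderate $z$ and $t$; for large $|z|$ the alternating series loses accuracy and a different evaluation method would be preferable.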









Figure 15.2 The contour $C'$ in the $w$-plane used in evaluating $J_\nu(z)$. (The figure labels the four segments $C_1$, $C_2$, $C_3$, $C_4$; axes: Re $w$, Im $w$.)

We can start with Equation (15.13) and obtain other integral representations of Bessel functions by making appropriate substitutions. For instance, we can let $u = e^w$ and assume that the circle of the contour $C$ has unit radius. The contour $C'$ in the $w$-plane is determined as follows. Write $u = re^{i\theta}$ and $w \equiv x + iy$,¹ so $re^{i\theta} = e^x e^{iy}$, yielding $r = e^x$ and $e^{i\theta} = e^{iy}$. Along the first part of $C$, $\theta = -\pi$ and $r$ goes from $\infty$ to 1. Thus, along the corresponding part of $C'$, $y = -\pi$ and $x$ goes from $\infty$ to 0. On the circular part of $C$, $r = 1$ and $\theta$ goes from $-\pi$ to $+\pi$. Thus, along the corresponding part of $C'$, $x = 0$ and $y$ goes from $-\pi$ to $+\pi$. Finally, on the last part of $C'$, $y = \pi$ and $x$ goes from 0 to $\infty$. Therefore, the contour $C'$ in the $w$-plane is as shown in Figure 15.2.

Substituting $u = e^w$ in Equation (15.13) yields

$$J_\nu(z) = \frac{1}{2\pi i} \int_{C'} e^{z\sinh w - \nu w}\,dw, \qquad \mathrm{Re}(z) > 0, \qquad (15.17)$$

which can be transformed into (see Problem 15.12)

$$J_\nu(z) = \frac{1}{\pi}\int_0^{\pi} \cos(\nu\theta - z\sin\theta)\,d\theta - \frac{\sin\nu\pi}{\pi}\int_0^{\infty} e^{-\nu t - z\sinh t}\,dt. \qquad (15.18)$$

For the special case of integer $\nu$, we obtain

$$J_n(z) = \frac{1}{\pi}\int_0^{\pi} \cos(n\theta - z\sin\theta)\,d\theta.$$

In particular,

$$J_0(z) = \frac{1}{\pi}\int_0^{\pi} \cos(z\sin\theta)\,d\theta.$$

¹ Do not confuse $x$ and $y$ with the real and imaginary parts of $z$.

Figure 15.3 The contour $C''$ in the $w$-plane used in evaluating $H_\nu^{(1)}(z)$.

We can use the integral representation for $J_\nu(z)$ to find the integral representation for Bessel functions of other kinds. For instance, to obtain the integral representation for the Neumann function $Y_\nu(z)$, we use Equation (14.44):


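$$Y_\nu(z) = (\cot\nu\pi)\,J_\nu(z) - \frac{1}{\sin\nu\pi}\,J_{-\nu}(z)$$
$$\phantom{Y_\nu(z)} = \frac{\cot\nu\pi}{\pi}\int_0^{\pi}\cos(\nu\theta - z\sin\theta)\,d\theta - \frac{\cos\nu\pi}{\pi}\int_0^{\infty} e^{-\nu t - z\sinh t}\,dt - \frac{1}{\pi\sin\nu\pi}\int_0^{\pi}\cos(\nu\theta + z\sin\theta)\,d\theta - \frac{1}{\pi}\int_0^{\infty} e^{\nu t - z\sinh t}\,dt,$$

with $\mathrm{Re}(z) > 0$. Substitute $\pi - \theta$ for $\theta$ in the third integral on the RHS. Then insert the resulting integrals plus Equation (15.18) in $H_\nu^{(1)}(z) = J_\nu(z) + iY_\nu(z)$ to obtain

$$H_\nu^{(1)}(z) = \frac{1}{\pi}\int_0^{\pi} e^{i(z\sin\theta - \nu\theta)}\,d\theta + \frac{1}{i\pi}\int_0^{\infty} e^{\nu t - z\sinh t}\,dt + \frac{e^{-i\nu\pi}}{i\pi}\int_0^{\infty} e^{-\nu t - z\sinh t}\,dt, \qquad \mathrm{Re}(z) > 0.$$

These integrals can easily be shown to result from integrating along the contour $C''$ of Figure 15.3. Thus, we have

$$H_\nu^{(1)}(z) = \frac{1}{i\pi}\int_{C''} e^{z\sinh w - \nu w}\,dw, \qquad \mathrm{Re}(z) > 0.$$

By changing $i$ to $-i$, we can show that

$$H_\nu^{(2)}(z) = -\frac{1}{i\pi}\int_{C'''} e^{z\sinh w - \nu w}\,dw, \qquad \mathrm{Re}(z) > 0,$$

where $C'''$ is the mirror image of $C''$ about the real axis.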





15.4 Asymptotic Behavior of Bessel Functions 


As mentioned before, integral representations are particularly useful for determining the asymptotic behavior of functions. For Bessel functions we can consider two kinds of limits. Assuming that both $\nu$ and $z = x$ are real, we can consider $\nu \to \infty$ or $x \to \infty$. First, let us consider the behavior of $J_\nu(x)$ of large order. The appropriate method for calculating the asymptotic form is the method of steepest descent discussed in Chapter 11, for which $\nu$ takes the place of the large parameter $\alpha$. We use Equation (15.17) because its integrand is simpler than that of Equation (15.13). The form of the integrand in Equation (15.17) may suggest $f(w) = -w$ and $g(w) = e^{x\sinh w}$. However, this choice does not allow setting $f'(w)$ equal to zero. To proceed, we write the exponent as $\nu\left(\dfrac{x}{\nu}\sinh w - w\right)$, and conveniently introduce $x/\nu \equiv 1/\cosh w_0$, with $w_0$ a real number, which we take to be positive. Substituting this in the equation above, we can read off

$$f(w) = \frac{\sinh w}{\cosh w_0} - w, \qquad g(w) = 1.$$

The saddle point is obtained from $df/dw = 0$ or $\cosh w = \cosh w_0$. Thus, $w = \pm w_0 + 2in\pi$, for $n = 0, 1, 2, \ldots$. Since the contour $C'$ lies in the right half-plane, we choose $w_0$ as the saddle point. The second derivative $f''(w_0)$ is simply $\tanh w_0$, which is real, making $\theta_2 = 0$, and $\theta_1 = \pi/2$ or $3\pi/2$. The convention of Chapter 11 suggests taking $\theta_1 = \pi/2$ (see Figure 15.4). The rest is a matter of substitution. We are interested in the approximation to $w$ up to the third order in $t$: $w - w_0 = b_1 t + b_2 t^2 + b_3 t^3$. Using Equations (11.31), (11.37), and (11.38), we can easily find the three coefficients:

$$b_1 = \frac{\sqrt{2}\,e^{i\theta_1}}{|f''(w_0)|^{1/2}} = i\,\frac{\sqrt{2}}{\sqrt{\tanh w_0}},$$
$$b_2 = \frac{f'''(w_0)}{3|f''(w_0)|^{2}}\,e^{4i\theta_1} = \frac{\cosh^2 w_0}{3\sinh^2 w_0},$$
$$b_3 = \left\{\frac{5[f'''(w_0)]^2}{3[f''(w_0)]^2} - \frac{f^{(4)}(w_0)}{f''(w_0)}\right\}\frac{\sqrt{2}\,e^{3i\theta_1}}{12\,|f''(w_0)|^{3/2}} = -i\,\frac{\sqrt{2}}{12(\tanh w_0)^{3/2}}\left(\frac{5}{3}\coth^2 w_0 - 1\right).$$

If we substitute the above in Equation (11.36), we obtain the following asymptotic formula valid for $\nu \to \infty$:

$$J_\nu(x) \approx \frac{e^{x(\sinh w_0 - w_0\cosh w_0)}}{(2\pi x\sinh w_0)^{1/2}}\left[1 + \frac{1}{8x\sinh w_0}\left(1 - \frac{5}{3}\coth^2 w_0\right) + \cdots\right],$$

where $\nu$ is related to $w_0$ via $\nu = x\cosh w_0$.
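A quick numerical comparison shows how the large-order formula behaves. The sketch below evaluates $J_\nu(x)$ from its power series and compares it with the leading term and with the first correction; the function names and the sample values $\nu = 20$, $x = 10$ are assumptions chosen for illustration, and the formula requires $\nu > x > 0$.

```python
import math

def bessel_series(nu, x, terms=60):
    """J_nu(x) from the power series (adequate for moderate x)."""
    return sum((-1) ** k / (math.gamma(k + 1) * math.gamma(nu + k + 1))
               * (x / 2) ** (2 * k + nu) for k in range(terms))

def large_order_asymptotic(nu, x):
    """Return (leading term, leading term with first correction) for nu > x > 0."""
    w0 = math.acosh(nu / x)              # defined by nu = x cosh(w0)
    s = x * math.sinh(w0)
    leading = math.exp(x * (math.sinh(w0) - w0 * math.cosh(w0))) / math.sqrt(2 * math.pi * s)
    coth2 = (math.cosh(w0) / math.sinh(w0)) ** 2
    corrected = leading * (1 + (1 - 5 * coth2 / 3) / (8 * s))
    return leading, corrected

exact = bessel_series(20, 10.0)
leading, corrected = large_order_asymptotic(20, 10.0)
print(exact, leading, corrected)
```

In this example the correction term moves the estimate closer to the series value, as expected for an asymptotic expansion in inverse powers of $\nu$.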


Figure 15.4 The contour $C_0$ in the $w$-plane used in evaluating $J_\nu(z)$ for large values of $\nu$.


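Let us now consider the asymptotic behavior for large $x$. It is convenient to consider the Hankel functions $H_\nu^{(1)}(x)$ and $H_\nu^{(2)}(x)$. The contours $C''$ and $C'''$ involve both the positive and the negative real axis; therefore, it is convenient, assuming that $x > \nu$, to write $\nu = x\cos\beta$ so that

$$H_\nu^{(1)}(x) = \frac{1}{i\pi}\int_{C''} e^{x(\sinh w - w\cos\beta)}\,dw.$$

The saddle points are given by the solutions to $\cosh w = \cos\beta$, which are $w_0 = \pm i\beta$. Choosing $w_0 = +i\beta$, and writing $w = u + iv$, we note that the contour along which

$$\mathrm{Im}(\sinh w - w\cos\beta) = \mathrm{Im}(\sinh w_0 - w_0\cos\beta)$$

is given by $\cosh u = [\sin\beta + (v - \beta)\cos\beta]/\sin v$. This contour is shown in Figure 15.5. The rest of the procedure is exactly the same as for $J_\nu(x)$ described above. In fact, to obtain the expansion for $H_\nu^{(1)}(x)$, we simply replace $w_0$ by $i\beta$. The result is

$$H_\nu^{(1)}(x) \approx \left(\frac{2}{i\pi x\sin\beta}\right)^{1/2} e^{i(x\sin\beta - \nu\beta)}\left[1 + \frac{1}{8ix\sin\beta}\left(1 + \frac{5}{3}\cot^2\beta\right) + \cdots\right].$$

When $x$ is much larger than $\nu$, $\beta$ will be close to $\pi/2$, and we have

$$H_\nu^{(1)}(x) \approx \sqrt{\frac{2}{\pi x}}\; e^{i(x - \nu\pi/2 - \pi/4)}\left(1 + \frac{1}{8ix}\right),$$

which, with $1/x \to 0$, is what we obtained in Example 11.5.2.

Figure 15.5 The contour in the $w$-plane used in evaluating $H_\nu^{(1)}(x)$ in the limit of large values of $x$.

The other saddle point, at $-i\beta$, gives the other Hankel function, with the asymptotic limit

$$H_\nu^{(2)}(x) \approx \sqrt{\frac{2}{\pi x}}\; e^{-i(x - \nu\pi/2 - \pi/4)}\left(1 - \frac{1}{8ix}\right).$$

We can now use the expressions for the asymptotic forms of the two Hankel functions to write the asymptotic forms of $J_\nu(x)$ and $Y_\nu(x)$ for large $x$:

$$J_\nu(x) = \tfrac{1}{2}\bigl[H_\nu^{(1)}(x) + H_\nu^{(2)}(x)\bigr] \approx \sqrt{\frac{2}{\pi x}}\left[\cos\!\left(x - \nu\frac{\pi}{2} - \frac{\pi}{4}\right) + \frac{1}{8x}\sin\!\left(x - \nu\frac{\pi}{2} - \frac{\pi}{4}\right) + \cdots\right],$$
$$Y_\nu(x) = \frac{1}{2i}\bigl[H_\nu^{(1)}(x) - H_\nu^{(2)}(x)\bigr] \approx \sqrt{\frac{2}{\pi x}}\left[\sin\!\left(x - \nu\frac{\pi}{2} - \frac{\pi}{4}\right) - \frac{1}{8x}\cos\!\left(x - \nu\frac{\pi}{2} - \frac{\pi}{4}\right) + \cdots\right].$$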
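The large-$x$ form of $J_\nu(x)$ can be compared with a direct evaluation for $\nu = 0$, using the integral representation $J_0(x) = \frac{1}{\pi}\int_0^\pi \cos(x\sin\theta)\,d\theta$ of Section 15.3. This is a sketch only; the function names and the sample point $x = 30$ are chosen here for illustration.

```python
import math

def j0_integral(x, n=2000):
    """J_0(x) from (1/pi) * integral over [0, pi] of cos(x sin(theta)), midpoint rule."""
    h = math.pi / n
    return sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(n)) * h / math.pi

def j0_large_x(x):
    """Two-term large-x form with nu = 0: sqrt(2/(pi x)) [cos(phi) + sin(phi)/(8x)]."""
    phi = x - math.pi / 4
    return math.sqrt(2 / (math.pi * x)) * (math.cos(phi) + math.sin(phi) / (8 * x))

x = 30.0
print(j0_integral(x), j0_large_x(x))
```

The midpoint rule converges very quickly here because the integrand is smooth and periodic in $\theta$; the remaining difference between the two printed values reflects the neglected higher-order terms of the asymptotic series.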


15.5 Problems 

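15.1. Use the change of variables $k = \ln t$ and $ix = \omega - \alpha$ (where $k$ and $x$ are the common variables used in Fourier transform equations) to show that the Fourier transform changes into a Mellin transform,

$$G(t) = \frac{1}{2\pi i}\int_{-i\infty+\alpha}^{i\infty+\alpha} F(\omega)\,t^{-\omega}\,d\omega, \qquad\text{where}\qquad F(\omega) = \int_0^{\infty} G(t)\,t^{\omega-1}\,dt.$$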

15.2. The Laplace transform $L[f]$ of a function $f(t)$ is defined as

$$L[f](s) \equiv \int_0^{\infty} e^{-st} f(t)\,dt.$$

Show that the Laplace transform of

(a) $f(t) = 1$ is $\dfrac{1}{s}$, where $s > 0$.

(b) $f(t) = \cosh\omega t$ is $\dfrac{s}{s^2 - \omega^2}$, where $s^2 > \omega^2$.

(c) $f(t) = \sinh\omega t$ is $\dfrac{\omega}{s^2 - \omega^2}$, where $s^2 > \omega^2$.

(d) $f(t) = \cos\omega t$ is $\dfrac{s}{s^2 + \omega^2}$.

(e) $f(t) = \sin\omega t$ is $\dfrac{\omega}{s^2 + \omega^2}$.

(f) $f(t) = e^{\omega t}$ for $t > 0$ is $\dfrac{1}{s - \omega}$, where $s > \omega$.

(g) $f(t) = t^n$ is $\dfrac{\Gamma(n+1)}{s^{n+1}}$, where $s > 0$, $n > -1$.

15.3. Evaluate the integral

$$f(t) = \int_0^{\infty} \frac{\sin\omega t}{\omega}\,d\omega$$

by finding the Laplace transform and changing the order of integration. Express the result for both $t > 0$ and $t < 0$ in terms of the theta function. (You will need some results from Problem 15.2.)


15.4. Show that the Laplace transform of the derivative of a function is given by $L[F'](s) = sL[F](s) - F(0)$. Similarly, show that for the second derivative the transform is

$$L[F''](s) = s^2 L[F](s) - sF(0) - F'(0).$$

Use these results to solve the differential equation $u''(t) + \omega^2 u(t) = 0$ subject to the boundary conditions $u(0) = a$, $u'(0) = 0$.

15.5. Solve the DE of Equation (15.5). 

15.6. Calculate the surface term for the hypergeometric DE. 

15.7. Determine the constant $C'$ in Equation (15.6), the solution to the hypergeometric DE. Hint: Expand $(t - z)^{-\alpha}$ inside the integral, use Equations (11.19) and (11.17), and compare the ensuing series with the hypergeometric series of Chapter 14.


15.8. Derive the Euler formula [Equation (15.7)].

15.9. Show that

$$F(\alpha, \beta; \gamma; 1) = \frac{\Gamma(\gamma)\,\Gamma(\gamma - \alpha - \beta)}{\Gamma(\gamma - \alpha)\,\Gamma(\gamma - \beta)}. \qquad (15.19)$$

Hint: Use Equation (11.19). Equation (15.19) was obtained by Gauss using only hypergeometric series.

15.10. We determine the asymptotic behavior of $\Phi(\alpha, \gamma; z)$ for $z \to \infty$ in this problem. Break up the integral in Equation (15.11) into two parts, one from 0 to $-\infty$ and the other from $-\infty$ to 1. Substitute $-t/z$ for $t$ in the first integral, and $1 - t/z$ for $t$ in the second. Assuming that $z \to \infty$ along the positive real axis, show that the second integral will dominate, and that

$$\Phi(\alpha, \gamma; z) \to \frac{\Gamma(\gamma)}{\Gamma(\alpha)}\,z^{\alpha-\gamma} e^{z} \qquad\text{as } z \to \infty.$$


15.11. In this problem, we determine the constant $k$ of Equation (15.12).

(a) Write the contour integral of Equation (15.12) for each of the three pieces of the contour. Note that $\arg(t) = -\pi$ as $t$ comes from $-\infty$ and $\arg(t) = \pi$ as $t$ goes to $-\infty$. Obtain a real integral from 0 to $\infty$.

(b) Use the relation $\Gamma(z)\Gamma(1 - z) = \pi/\sin\pi z$, obtained in Chapter 11, to show that

$$\Gamma(-z) = -\frac{\pi}{\Gamma(z+1)\sin\pi z}.$$

(c) Expand the function $\exp(-z^2/4t)$ in the integral of part (a), and show that the contour integral reduces to

$$-2i\sin\nu\pi \sum_{n=0}^{\infty} \left(\frac{z}{2}\right)^{2n} \frac{\Gamma(-n-\nu)}{\Gamma(n+1)}.$$

(d) Use the result of part (c) in part (b), and compare the result with the series expansion of $J_\nu(z)$ in Chapter 14 to arrive finally at $k = 1/(2\pi i)$.

15.12. By integrating along $C_1$, $C_2$, $C_3$, and $C_4$ of Figure 15.2, derive Equation (15.18).

15.13. By substituting $t = \exp(i\theta)$ in Equation (15.14), show that

$$e^{iz\sin\theta} = J_0(z) + 2\sum_{n=1}^{\infty} J_{2n}(z)\cos(2n\theta) + 2i\sum_{n=0}^{\infty} J_{2n+1}(z)\sin[(2n+1)\theta].$$

In particular, show that

$$J_0(z) = \frac{1}{2\pi}\int_0^{2\pi} e^{iz\sin\theta}\,d\theta.$$

15.14. Derive the integral representations of $H_\nu^{(1)}(x)$ and $H_\nu^{(2)}(x)$ given in Section 15.3.



Additional Reading 

1. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row, 
1967. 

2. Mathews, J. and Walker, R. Mathematical Methods of Physics, 2nd ed., Benjamin, 1970.











An Introduction to Operator Theory 


The first two parts of the book dealt almost exclusively with algebraic techniques. 
The third and fourth parts were devoted to analytic methods. In this introductory
chapter, we shall try to unite these two branches of mathematics to gain insight 
into the nature of some of the important equations in physics and their solutions. 
Let us start with a familiar problem. 

16.1 From Abstract to Integral and Differential Operators

Let's say we want to solve an abstract vector-operator equation $A|u\rangle = |v\rangle$ in an $N$-dimensional vector space $V$. To this end, we select a basis $B = \{|a_i\rangle\}_{i=1}^{N}$, write the equation in matrix form, and solve the resulting system of linear equations. This produces the components of the solution $|u\rangle$ in $B$. If components in another basis $B'$ are desired, they can be obtained using the similarity transformation connecting the two bases (see Chapter 3).

There is a standard formal procedure for obtaining the matrix equation. It is convenient to choose an orthonormal basis $B = \{|e_i\rangle\}_{i=1}^{N}$ for $V$ and refer all components to this basis. The procedure involves contracting both sides of the equation with $\langle e_i|$ and inserting $1 = \sum_{j=1}^{N} |e_j\rangle\langle e_j|$ between $A$ and $|u\rangle$: $\sum_{j=1}^{N} \langle e_i|A|e_j\rangle\langle e_j|u\rangle = \langle e_i|v\rangle$ for $i = 1, 2, \ldots, N$, or

$$\sum_{j=1}^{N} A_{ij}u_j = v_i \qquad\text{for } i = 1, 2, \ldots, N, \qquad (16.1)$$

where $A_{ij} \equiv \langle e_i|A|e_j\rangle$, $u_j \equiv \langle e_j|u\rangle$, and $v_i \equiv \langle e_i|v\rangle$. Equation (16.1) is a system of $N$ linear equations in $N$ unknowns $\{u_j\}_{j=1}^{N}$, which can be solved to obtain the solution(s) of the original equation in $B$.

A convenient basis is that in which $A$ is represented by a diagonal matrix $\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$. Then the operator equation takes the simple form $\lambda_i u_i = v_i$, and the solution becomes immediate.

Let us now apply the procedure just described to infinite-dimensional vector spaces, in particular, for the case of a continuous index. We want to find the solutions of $K|u\rangle = |f\rangle$. Following the procedure used above, we obtain

$$\langle x|K\left(\int_a^b |y\rangle\,w(y)\,\langle y|\,dy\right)|u\rangle = \int_a^b \langle x|K|y\rangle\,w(y)\,\langle y|u\rangle\,dy = \langle x|f\rangle,$$

where we have used the results obtained in Chapter 6. Writing this in functional notation, we have

$$\int_a^b K(x, y)\,w(y)\,u(y)\,dy = f(x), \qquad (16.2)$$

which is the continuous analogue of Equation (16.1). Here $(a, b)$ is the interval on which the functions are defined. We note that the indices have turned into continuous arguments, and the sum has turned into an integral. The operator $K$ that leads to an equation such as (16.2) is called an integral operator (IO), and the "matrix element" $K(x, y)$ is said to be its kernel.
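The passage from the sum in (16.1) to the integral in (16.2) can be made concrete by discretizing the integral: on a grid of points $y_j$, the kernel becomes a matrix with entries $K(x_i, y_j)$ and the integral becomes a weighted sum. The sketch below assumes a midpoint-rule grid; the function name and the sample kernel $K(x, y) = xy$ with $w = 1$ are hypothetical choices made here for illustration.

```python
def apply_integral_operator(kernel, weight, u, a, b, n=1000):
    """Return the function x -> h * sum_j K(x, y_j) w(y_j) u(y_j), a midpoint-rule
    discretization of the left-hand side of (16.2) on the interval (a, b)."""
    h = (b - a) / n
    ys = [a + (j + 0.5) * h for j in range(n)]
    def Ku(x):
        return h * sum(kernel(x, y) * weight(y) * u(y) for y in ys)
    return Ku

# Illustrative choice: K(x, y) = x*y, w(y) = 1, u(y) = y on (0, 1),
# for which the exact result is the integral of x*y^2 over (0, 1), i.e. x/3.
Ku = apply_integral_operator(lambda x, y: x * y, lambda y: 1.0, lambda y: y, 0.0, 1.0)
print(Ku(0.6))  # close to 0.2
```

In this discretized picture the continuous argument $y$ plays exactly the role of the summation index $j$ in (16.1).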

The discussion of the discrete case mentioned the possibility of the operator $A$ being diagonal in the given basis $B$. Let us do the same with (16.2); that is, noting that $x$ and $y$ are indices for $K$, let us assume that $K(x, y) = 0$ for $x \neq y$. Such operators are called local operators. For local operators, the contribution to the integral comes only at the point where $x = y$ (hence, their name). If $K(x, y)$ is finite at this point, and the functions $w(y)$ and $u(y)$ are well behaved there, the LHS of (16.2) will vanish, and we will get inconsistencies. To avoid this, we need to have

$$K(x, y) \to \begin{cases} 0 & \text{if } x \neq y, \\ \infty & \text{if } x = y. \end{cases}$$

Thus, $K(x, y)$ has the behavior of a delta function. Letting $K(x, y) \equiv L(x)\,\delta(x - y)/w(x)$ and substituting in Equation (16.2) yields $L(x)u(x) = f(x)$.

In the discrete case, $\lambda_i$ was merely an indexed number; its continuous analogue, $L(x)$, may represent merely a function. However, the fact that $x$ is a continuous variable (index) gives rise to other possibilities for $L(x)$ that do not exist for the discrete case. For instance, $L(x)$ could be a differential operator. The derivative, although defined by a limiting process involving neighboring points, is a local operator. Thus, we can speak of the derivative of a function at a point. For the discrete case, $u_i$ can only "hop" from $i$ to $i + 1$ and then back to $i$. Such a difference (as opposed to differential) process is not local; it involves not only $i$ but also $i + 1$. The "point" $i$ does not have an (infinitesimally close) neighbor.

This essential difference between discrete and continuous operators makes the 
latter far richer in possibilities for applications. In particular, if L{x) is considered 
a differential operator, the equation L{x)u(x) = f(x) leads directly to the fruitful 
area of differential equation theory. 

16.2 Bounded Operators in Hilbert Spaces 

The concept of an operator on a Hilbert space is extremely subtle. Even the elementary characteristics of operators, such as the operation of hermitian conjugation, cannot generally be defined on the whole Hilbert space.

In finite-dimensional vector spaces there is a one-to-one correspondence between operators and matrices. So, in some sense, the study of operators reduces to a study of matrices, which are collections of real or complex numbers. Although we have already noted an analogy between matrices and kernels, a whole new realm of questions arises when $A_{ij}$ is replaced by $K(x, y)$: questions about the continuity of $K(x, y)$ in both its arguments, about the limit of $K(x, y)$ as $x$ and/or $y$ approach the "end points" of the interval on which $K$ is defined, about the boundedness and "compactness" of $K$, and so on. Such subtleties are not unexpected. After all, when we tried to generalize concepts of finite-dimensional vector spaces to infinite dimensions in Chapter 5, we encountered difficulties. There we were concerned about vectors only; the generalization of operators is even more complicated.

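16.2.1. Example. Recall that $\mathbb{C}^{\infty}$ is the set of sequences $|a\rangle = \{\alpha_i\}_{i=1}^{\infty}$, or of $\infty$-tuples $(\alpha_1, \alpha_2, \ldots)$, that satisfy the convergence requirement $\sum_{j=1}^{\infty} |\alpha_j|^2 < \infty$ (see Example 1.1.2). It is a Hilbert space with inner product defined by $\langle a|b\rangle = \sum_{j=1}^{\infty} \alpha_j^{*}\beta_j$. The standard (orthonormal) basis for $\mathbb{C}^{\infty}$ is $\{|e_i\rangle\}_{i=1}^{\infty}$, where $|e_i\rangle$ has all components equal to zero except the $i$th one, which is 1. Then one has $|a\rangle = \sum_{j=1}^{\infty} \alpha_j|e_j\rangle$.

One can introduce an operator $T_r$, called the right-shift operator, by

$$T_r|a\rangle = T_r\left(\sum_{j=1}^{\infty} \alpha_j|e_j\rangle\right) = \sum_{j=1}^{\infty} \alpha_j|e_{j+1}\rangle.$$

In other words, $T_r$ transforms $(\alpha_1, \alpha_2, \ldots)$ to $(0, \alpha_1, \alpha_2, \ldots)$. It is straightforward to show that $T_r$ is indeed a linear operator. ■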
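The action of the right-shift operator is easy to mimic on finitely supported sequences, that is, sequences with only finitely many nonzero entries, stored as Python lists. This restriction and the helper names are assumptions made for illustration; the sketch also shows that the shift leaves the norm unchanged.

```python
import math

def right_shift(a):
    """Right-shift a finitely supported sequence: (a1, a2, ...) -> (0, a1, a2, ...)."""
    return [0.0] + list(a)

def norm(a):
    """Norm induced by the inner product <a|b> = sum_j conj(a_j) b_j."""
    return math.sqrt(sum(abs(x) ** 2 for x in a))

a = [1.0, 2.0, 2.0]
print(right_shift(a))                  # [0.0, 1.0, 2.0, 2.0]
print(norm(a), norm(right_shift(a)))   # 3.0 3.0
```

Linearity can be checked in the same spirit by shifting a linear combination of two lists of equal length and comparing with the combination of the shifted lists.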

The first step in our study of vector spaces of infinite dimensions was getting a handle on the convergence of infinite sums. This entailed defining a norm for vectors and a distance between them. In addition, we noted that the set of linear transformations $\mathcal{L}(V, W)$ was a vector space in its own right. Since operators are "vectors" in this space, the study of operators requires constructing a norm in $\mathcal{L}(V, W)$ when $V$ and $W$ are infinite-dimensional.



16.2.2. Definition. Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be two Hilbert spaces with norms $\|\cdot\|_1$ and $\|\cdot\|_2$. For any $T \in \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$, the number

$$\max\left\{\frac{\|Tx\|_2}{\|x\|_1}\ \Big|\ |x\rangle \neq 0\right\}$$

(if it exists)¹ is called the operator norm of $T$ and is denoted by $\|T\|$. A linear transformation whose norm is finite is called a bounded linear transformation. A bounded linear transformation from a Hilbert space to itself is called a bounded operator. The collection of all bounded linear transformations, which is a subset of $\mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$, will be denoted by $\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$, and if $\mathcal{H}_1 = \mathcal{H}_2 \equiv \mathcal{H}$, it will be denoted by $\mathcal{B}(\mathcal{H})$.

Note that $\|\cdot\|_1$ and $\|\cdot\|_2$ are the norms induced by the inner products of $\mathcal{H}_1$ and $\mathcal{H}_2$. Also note that by dividing by $\|x\|_1$ we eliminate the possibility of dilating the norm $\|T\|$ by choosing a "long" vector. By restricting the length of $|x\rangle$, one can eliminate the necessity for dividing by the length. In fact, the norm can equivalently be defined as

$$\|T\| = \max\{\|Tx\|_2 \mid \|x\|_1 = 1\} = \max\{\|Tx\|_2 \mid \|x\|_1 \le 1\}. \qquad (16.3)$$

It is straightforward to show that the three definitions are equivalent and that they indeed define a norm.

16.2.3. Proposition. An operator $T$ is bounded if and only if it maps vectors of finite norm to vectors of finite norm.

Proof. Clearly, if $T$ is bounded, then $\|Tx\|$ has finite norm. Conversely, if $\|Tx\|_2$ is finite for all $|x\rangle$ (of unit length), $\max\{\|Tx\|_2 \mid \|x\|_1 = 1\}$ is also finite, and $T$ is bounded. □

An immediate consequence of the definition is

$$\|Tx\|_2 \le \|T\|\,\|x\|_1 \qquad \forall\,|x\rangle \in \mathcal{H}_1. \qquad (16.4)$$

If we choose $|x\rangle - |y\rangle$ instead of $|x\rangle$, it will follow from (16.4) that as $|x\rangle$ approaches $|y\rangle$, $T|x\rangle$ approaches $T|y\rangle$. This is the property that characterizes continuous functions:

16.2.4. Proposition. The bounded operator $T \in \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ is a continuous function from $\mathcal{H}_1$ to $\mathcal{H}_2$.

Another consequence of the definition is that

16.2.5. Box. $\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ is a vector subspace of $\mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$, and for $\mathcal{H}_1 = \mathcal{H}_2 = \mathcal{H}$, we have $1 \in \mathcal{B}(\mathcal{H})$ and $\|1\| = 1$.

¹ The precise definition uses "supremum" instead of "maximum." Rather than spending a lot of effort explaining the difference between the two concepts, we use the less precise, but more intuitively familiar, concept of "maximum."

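16.2.6. Example. We have seen that in an inner product space, one can associate a linear operator (linear functional) to every vector. Thus, associated with the vector $|x\rangle$ in a Hilbert space $\mathcal{H}$ is the linear operator $f_x : \mathcal{H} \to \mathbb{C}$ defined by $f_x(|y\rangle) \equiv \langle x|y\rangle$. We want to compare the operator norm of $f_x$ with the norm of $|x\rangle$. First note that by using the Schwarz inequality, we get

$$\|f_x\| = \max\left\{\frac{|f_x(|y\rangle)|}{\|y\|}\ \Big|\ |y\rangle \neq 0\right\} = \max\left\{\frac{|\langle x|y\rangle|}{\|y\|}\ \Big|\ |y\rangle \neq 0\right\} \le \|x\|.$$

On the other hand, from $\|x\|^2 = f_x(|x\rangle)$, we obtain

$$\|x\| = \frac{f_x(|x\rangle)}{\|x\|} \le \max\left\{\frac{|f_x(|y\rangle)|}{\|y\|}\ \Big|\ |y\rangle \neq 0\right\} = \|f_x\|.$$

These two inequalities imply that $\|f_x\| = \|x\|$. ■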


16.2.7. Example. The derivative operator $D = d/dx$ is not a bounded operator on the Hilbert space² $\mathcal{L}^2(a, b)$ of square-integrable functions. With a function like $f(x) = \sqrt{x - a}$, one gets

$$\|f\|^2 = \int_a^b (x - a)\,dx = \tfrac{1}{2}(b - a)^2 \;\Rightarrow\; \|f\| = \frac{b - a}{\sqrt{2}},$$

while $df/dx = 1/(2\sqrt{x - a})$ gives $\|Df\|^2 = \frac{1}{4}\int_a^b dx/(x - a) = \infty$. We conclude that $\|D\| = \infty$. ■
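Unboundedness of the derivative can also be seen without any divergent integral, by exhibiting functions of fixed norm whose derivatives have arbitrarily large norm. The sketch below uses a different family from the one in the example, namely $f_n(x) = \sin(n\pi x)$ on $(0, 1)$, for which $\|Df_n\|/\|f_n\| = n\pi$; the family, the interval, and the function names are choices made here for illustration.

```python
import math

def l2_norm(f, a, b, n=20000):
    """Approximate the L^2(a, b) norm of f with the midpoint rule."""
    h = (b - a) / n
    return math.sqrt(h * sum(f(a + (k + 0.5) * h) ** 2 for k in range(n)))

def derivative_ratio(n):
    """Ratio ||D f_n|| / ||f_n|| for f_n(x) = sin(n pi x) on (0, 1)."""
    f = lambda x: math.sin(n * math.pi * x)
    df = lambda x: n * math.pi * math.cos(n * math.pi * x)
    return l2_norm(df, 0.0, 1.0) / l2_norm(f, 0.0, 1.0)

for n in (1, 10, 100):
    print(n, derivative_ratio(n), n * math.pi)
```

Since the ratio grows without limit as $n$ increases, no finite number can serve as $\|D\|$.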


16.2.8. Example. Since $\mathcal{L}(\mathcal{H})$ is an algebra as well as a vector space, one may be interested in the relation between the product of operators and their norms. More specifically, one may want to know how $\|ST\|$ is related to $\|S\|$ and $\|T\|$. In this example we show that

$$\|ST\| \le \|S\|\,\|T\|. \qquad (16.5)$$

To do so, we use the definition of operator norm for the product $ST$:

$$\|ST\| = \max\left\{\frac{\|STx\|}{\|x\|}\ \Big|\ |x\rangle \neq 0\right\} = \max\left\{\frac{\|STx\|}{\|Tx\|}\,\frac{\|Tx\|}{\|x\|}\ \Big|\ |x\rangle \neq 0 \neq T|x\rangle\right\}$$
$$\phantom{\|ST\|} \le \max\left\{\frac{\|S(T|x\rangle)\|}{\|Tx\|}\ \Big|\ T|x\rangle \neq 0\right\}\ \underbrace{\max\left\{\frac{\|Tx\|}{\|x\|}\ \Big|\ |x\rangle \neq 0\right\}}_{=\|T\|}.$$

Now note that the first term on the RHS does not scan all the vectors for maximality: It scans only the vectors in the image of $T$. If we include all vectors, we may obtain a larger number. Therefore,

$$\max\left\{\frac{\|S(T|x\rangle)\|}{\|Tx\|}\ \Big|\ T|x\rangle \neq 0\right\} \le \max\left\{\frac{\|Sx\|}{\|x\|}\ \Big|\ |x\rangle \neq 0\right\} = \|S\|,$$

and the desired inequality is established. A useful consequence of this result is $\|T^n\| \le \|T\|^n$, which we shall use frequently. ■

² Here the two Hilbert spaces coincide, so that the derivative operator acts on a single Hilbert space.

We can put Equation (16.5) to immediate good use. 

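16.2.9. Proposition. Let $\mathcal{H}$ be a Hilbert space and $T \in \mathcal{L}(\mathcal{H})$. If $\|T\| < 1$, then $\mathbf{1} - T$ is invertible and $(\mathbf{1} - T)^{-1} = \sum_{n=0}^{\infty} T^n$.

Proof. First note that the series converges, because

$$\left\|\sum_{n=0}^{\infty} T^n\right\| \le \sum_{n=0}^{\infty} \|T^n\| \le \sum_{n=0}^{\infty} \|T\|^n = \frac{1}{1 - \|T\|}$$

and the sum has a finite norm. Furthermore,

$$(\mathbf{1} - T)\sum_{n=0}^{\infty} T^n = (\mathbf{1} - T)\lim_{k\to\infty}\sum_{n=0}^{k} T^n = \lim_{k\to\infty}(\mathbf{1} - T)\sum_{n=0}^{k} T^n$$

$$= \lim_{k\to\infty}\left(\sum_{n=0}^{k} T^n - \sum_{n=0}^{k} T^{n+1}\right) = \lim_{k\to\infty}\left(\mathbf{1} - T^{k+1}\right) = \mathbf{1},$$

because $0 \le \lim_{k\to\infty}\|T^{k+1}\| \le \lim_{k\to\infty}\|T\|^{k+1} = 0$ for $\|T\| < 1$, and the vanishing of the norm implies the vanishing of the operator itself. One can similarly show that $\left(\sum_{n=0}^{\infty} T^n\right)(\mathbf{1} - T) = \mathbf{1}$. □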
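The telescoping argument in the proof can be watched at work on a small matrix. The sketch below assumes a hypothetical $2 \times 2$ matrix `T` whose operator norm is below 1 (roughly 0.54); the helper names are illustrative.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(a, b):
    """Entrywise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
T = [[0.2, 0.3], [0.1, 0.4]]     # hypothetical operator with ||T|| < 1

def neumann_sum(t, terms):
    """Partial sum 1 + T + T^2 + ... + T^(terms - 1)."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    power = [row[:] for row in IDENTITY]   # T^0
    for _ in range(terms):
        total = matadd(total, power)
        power = matmul(power, t)
    return total

partial = neumann_sum(T, 60)
one_minus_t = [[IDENTITY[i][j] - T[i][j] for j in range(2)] for i in range(2)]
product = matmul(one_minus_t, partial)   # equals 1 - T^60 up to rounding
print(product)
```

The product differs from the unit matrix only by $T^{60}$, whose entries are far below double-precision resolution here, in line with $\|T^{k+1}\| \le \|T\|^{k+1} \to 0$.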


A corollary of this proposition is that operators that are “close enough” to an 
invertible operator are invertible (see Problem 16.1). Another corollary, whose 
proof is left as a straightforward exercise, is the following: 

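16.2.10. Corollary. Let $T \in \mathcal{B}(\mathcal{H})$ and $\lambda$ a complex number such that $\|T\| < |\lambda|$. Then $T - \lambda\mathbf{1}$ is an invertible operator, and

$$(T - \lambda\mathbf{1})^{-1} = -\frac{1}{\lambda}\sum_{n=0}^{\infty}\left(\frac{T}{\lambda}\right)^n.$$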



Adjoints play an important role in the study of operators. We recall that the adjoint of $T$ is defined as $\langle y|T|x\rangle^* = \langle x|T^\dagger|y\rangle$ or $\langle Tx|y\rangle = \langle x|T^\dagger y\rangle$. In the finite-dimensional case, we could calculate the matrix representation of the adjoint in
a particular basis using this definition and generalize to all bases by similarity 
transformations. That is why we never raised the question of the existence of the 
adjoint of an operator. In the infinite-dimensional case, one must prove such an 
existence. We state the following theorem without proof: 



16.3 SPECTRA OF LINEAR OPERATORS 457 


$T$ and $T^\dagger$ have equal norms

16.2.11. Theorem. Let $T \in \mathcal{B}(\mathcal{H})$. Then the adjoint of $T$, defined by

$$\langle Tx|y\rangle = \langle x|T^\dagger y\rangle,$$

exists. Furthermore, $\|T\| = \|T^\dagger\|$.

Another useful theorem that we shall use later is the following, 

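16.2.12. Theorem. Let $\mathcal{N}(T)$ and $\mathcal{R}(T)$ denote the null space (kernel) and the range of $T \in \mathcal{B}(\mathcal{H})$. We have

$$\mathcal{N}(T^\dagger) = \mathcal{R}(T)^\perp \quad\text{and}\quad \mathcal{N}(T) = \mathcal{R}(T^\dagger)^\perp.$$

Proof. $|x\rangle$ is in $\mathcal{N}(T^\dagger)$ iff $T^\dagger|x\rangle = 0$ iff $\langle y|T^\dagger x\rangle = 0$ for all $|y\rangle \in \mathcal{H}$. This holds if and only if $\langle Ty|x\rangle = 0$ for all $|y\rangle \in \mathcal{H}$. This is equivalent to the statement that $|x\rangle$ is in $\mathcal{R}(T)^\perp$. This chain of argument proves that $\mathcal{N}(T^\dagger) = \mathcal{R}(T)^\perp$. The second part of the theorem follows from the fact that $(T^\dagger)^\dagger = T$. □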


regular point of an 
operator 

resolvent set and 
spectrum of an 
operator 


every eigenvalue of 
an operator on a 
vector space of finite 
dimension is in its 
spectrum and vice 
versa 


16.3 Spectra of Linear Operators

One of the most important results of the theory of finite-dimensional vector 
spaces is the spectral decomposition theorem developed in Chapter 4. The infinite-dimensional
analogue of that theorem is far more encompassing and difficult to
prove. It is beyond the scope of this book to develop all the machinery needed for 
a thorough discussion of the infinite-dimensional spectral theory. Instead, we shall 
present the central results, and occasionally introduce the reader to the peripheral 
arguments when they seem to have their own merits. 

16.3.1. Definition. Let $T \in \mathcal{L}(\mathcal{H})$. A complex number $\lambda$ is called a regular point of $T$ if the operator $T - \lambda\mathbf{1}$ is bounded and invertible.$^{3}$ The set of all regular points of $T$ is called the resolvent set of $T$, and is denoted by $\rho(T)$. The complement of $\rho(T)$ in the complex plane is called the spectrum of $T$ and is denoted by $\sigma(T)$.

Corollary 16.2.10 implies$^{4}$ that if $T$ is bounded, then $\rho(T)$ is not empty, and that the spectrum of a bounded linear operator on a Hilbert space is a bounded set. In fact, an immediate consequence of the corollary is that $|\lambda| \le \|T\|$ for all $\lambda \in \sigma(T)$.

It is instructive to contrast the finite-dimensional case against the implications 
of the above definition. Recall that because of the dimension theorem, a linear 
operator on a finite-dimensional vector space V is invertible if and only if it is 
either onto or one-to-one. Now, $\lambda \in \sigma(T)$ if and only if $T - \lambda\mathbf{1}$ is not invertible. For finite dimensions, this implies that$^{5}$ $\ker(T - \lambda\mathbf{1}) \neq 0$. Thus, in finite dimensions,


3 If $T$ is bounded, then $T - \lambda\mathbf{1}$ is automatically bounded.

4 One can simply choose a $\lambda$ whose absolute value is greater than $\|T\|$.

5 Note how critical finite-dimensionality is for this implication. In infinite dimensions, an operator can be one-to-one (thus having a zero kernel) without being onto.




not all points of a(T) 
are eigenvalues 


$\lambda \in \sigma(T)$ if and only if there is a nonzero vector $|a\rangle$ in $V$ such that $(T - \lambda\mathbf{1})|a\rangle = 0$. This is the combined definition of eigenvalue and eigenvector, and is the definition we will have to use to define eigenvalues in infinite dimensions. It follows that in the finite-dimensional case, $\sigma(T)$ coincides with the set of all eigenvalues of $T$. This is not true for infinite dimensions, as the following example shows.

16.3.2. Example. Consider the right-shift operator $T_r$ acting on $\mathbb{C}^\infty$. It is easy to see that $\|T_r a\| = \|a\|$ for all $|a\rangle$. This yields $\|T_r\| = 1$, so that any $\lambda$ that belongs to $\sigma(T_r)$ must be such that $|\lambda| \le 1$. We now show that the converse is also true, i.e., that if $|\lambda| \le 1$, then $\lambda \in \sigma(T_r)$. For $\lambda = 0$ this is immediate, because $T_r$ itself is not onto: $|e_1\rangle$ is not in its range. It is therefore sufficient to show that if $0 < |\lambda| \le 1$, then $T_r - \lambda\mathbf{1}$ is not invertible. To establish this, we shall show that $T_r - \lambda\mathbf{1}$ is not onto.

Suppose that $T_r - \lambda\mathbf{1}$ is onto. Then there must be a vector $|a\rangle$ such that $(T_r - \lambda\mathbf{1})|a\rangle = |e_1\rangle$, where $|e_1\rangle$ is the first standard basis vector of $\mathbb{C}^\infty$. Equating components on both sides yields the recursion relations $\alpha_1 = -1/\lambda$, and $\alpha_{j-1} = \lambda\alpha_j$ for all $j \ge 2$. One can readily solve this recursion relation to obtain $\alpha_j = -1/\lambda^j$ for all $j$. This is a contradiction, because

$$\sum_{j=1}^{\infty} |\alpha_j|^2 = \sum_{j=1}^{\infty} \frac{1}{|\lambda|^{2j}}$$

will not converge if $0 < |\lambda| \le 1$, i.e., $|a\rangle \notin \mathbb{C}^\infty$, and therefore $T_r - \lambda\mathbf{1}$ is not onto.

We conclude that $\sigma(T_r) = \{\lambda \in \mathbb{C} \mid |\lambda| \le 1\}$. If we could generalize the result of the finite-dimensional case to $\mathbb{C}^\infty$, we would conclude that all complex numbers whose magnitude is at most 1 are eigenvalues of $T_r$. Quite to our surprise, the following argument shows that $T_r$ has no eigenvalues at all!

Suppose that $\lambda$ is an eigenvalue of $T_r$. Let $|a\rangle$ be any eigenvector for $\lambda$. Since $T_r$ preserves the length of a vector, we have $\langle a|a\rangle = \langle T_r a|T_r a\rangle = \langle \lambda a|\lambda a\rangle = |\lambda|^2\langle a|a\rangle$. It follows that $|\lambda| = 1$. Now write $|a\rangle = \{\alpha_j\}_{j=1}^{\infty}$ and let $\alpha_m$ be the first nonzero term of this sequence. Then $0 = \langle T_r a|e_m\rangle = \langle \lambda a|e_m\rangle = \lambda^*\alpha_m^*$. The first equality comes about because $T_r|a\rangle$ has its first nonzero term in the $(m + 1)$st position. Since $\lambda \neq 0$, we must have $\alpha_m = 0$, which contradicts the choice of this number. ■
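Two ingredients of this example lend themselves to a quick numerical look: the right shift preserves norms, and the partial sums of $\sum_j 1/|\lambda|^{2j}$ grow without bound when $0 < |\lambda| \le 1$. The sketch below works with finitely supported sequences only, and the function names are illustrative assumptions.

```python
import math

def right_shift(seq):
    """Right shift of a finitely supported sequence: (a1, a2, ...) -> (0, a1, a2, ...)."""
    return [0.0] + list(seq)

def norm(seq):
    """Square root of the sum of squared moduli."""
    return math.sqrt(sum(abs(x) ** 2 for x in seq))

def partial_norm_sq(lam, terms):
    """Partial sum of |alpha_j|^2 for alpha_j = -1/lam**j, j = 1..terms."""
    return sum(1.0 / abs(lam) ** (2 * j) for j in range(1, terms + 1))

a = [3.0, -1.0, 2.0]                       # hypothetical vector
print(norm(a), norm(right_shift(a)))       # the two norms agree

# The partial sums keep growing for |lam| = 1 and explode for |lam| < 1,
# so the would-be solution |a> is not square-summable.
for terms in (10, 100, 1000):
    print(terms, partial_norm_sq(1.0, terms), partial_norm_sq(0.9, terms))
```

For $|\lambda| = 1$ every term equals 1, so the partial sum is just the number of terms; for $|\lambda| < 1$ the terms themselves grow geometrically.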


16.4 Compact Sets

This section deals with some technical concepts, and as such will be rather formal. 
The central concept of this section is compactness. Although we shall be using 
compactness sparingly in the sequel, the notion has sufficient application in higher 
analysis and algebra that it warrants an introductory exposure. 

Let us start with the familiar case of the real line, and the intuitive notion 
of “compactness.” Clearly, we do not want to call the entire real line “compact,” 
because intuitively, it is not. The next candidate seems to be a “finite” interval. 
So, first consider the open interval $(a, b)$. Can we call it compact? Intuition says “yes,” but the following argument shows that it would not be appropriate to call the open interval compact.

Consider the map $\varphi : \mathbb{R} \to (a, b)$ given by $\varphi(t) = \frac{b-a}{2}\tanh t + \frac{b+a}{2}$. The reader may check that this map is continuous and bijective. Thus, we can continuously map all of $\mathbb{R}$ in a one-to-one manner onto $(a, b)$. This makes $(a, b)$ “look” very much$^{6}$ like $\mathbb{R}$. How can we modify the interval to make it compact? We do not want to alter its finiteness. So, the obvious thing to do is to add the end points. Thus, the interval $[a, b]$ seems to be a good candidate; and indeed it is.

The next step is to generalize the notion of a closed, finite interval and eventually come up with a definition that can be applied to all spaces. First we need some terminology.

open ball

16.4.1. Definition. An open ball $B_r(x)$ of radius $r$ and center $|x\rangle$ in a normed vector space $V$ is the set of all vectors in $V$ whose distance from $|x\rangle$ is strictly less than $r$:

$$B_r(x) \equiv \{\, |y\rangle \in V \mid \|y - x\| < r \,\}.$$

open round neighborhood

We call $B_r(x)$ an open round neighborhood of $|x\rangle$.

This is a generalization of open interval because

$$(a, b) = \left\{\, y \in \mathbb{R} \;\Big|\; \left|y - \frac{a + b}{2}\right| < \frac{b - a}{2} \,\right\}.$$

16.4.2. Example. A prototype of finite-dimensional normed spaces is $\mathbb{R}^n$. An open ball of radius $r$ centered at $x$ is

$$B_r(x) = \{\, y \in \mathbb{R}^n \mid (y_1 - x_1)^2 + (y_2 - x_2)^2 + \cdots + (y_n - x_n)^2 < r^2 \,\}.$$

Thus, all points inside a circle form an open ball in the $xy$-plane, and all interior points of a solid sphere form an open ball in space. ■
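The membership condition for an open ball in $\mathbb{R}^n$ translates directly into a strict inequality on the Euclidean distance. The following sketch uses a hypothetical helper name, `in_open_ball`, and a few sample points chosen for illustration.

```python
import math

def in_open_ball(y, x, r):
    """True if the point y lies in the open ball B_r(x) of R^n (Euclidean norm)."""
    return math.dist(y, x) < r      # strict inequality: the boundary is excluded

# A point inside the unit disk, and a point on its boundary circle.
inside = in_open_ball((0.5, 0.5), (0.0, 0.0), 1.0)
on_boundary = in_open_ball((1.0, 0.0), (0.0, 0.0), 1.0)

# The open interval (1, 3) is the ball of radius 1 centered at 2 in R^1.
in_interval = in_open_ball((2.5,), (2.0,), 1.0)
endpoint = in_open_ball((3.0,), (2.0,), 1.0)
print(inside, on_boundary, in_interval, endpoint)
```

The one-dimensional case reproduces the description of the interval $(a, b)$ as a ball of radius $(b - a)/2$ centered at $(a + b)/2$.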

bounded subset

16.4.3. Definition. A bounded subset of a normed vector space is a subset that can be enclosed in an open ball of finite radius.

For example, any region drawn on a piece of paper is a bounded subset of $\mathbb{R}^2$, and any “visible” part of our environment is a bounded subset of $\mathbb{R}^3$ because we can always find a big enough circle or sphere to enclose these subsets.

open subset
boundary point
closed subset and closure

16.4.4. Definition. A subset $\mathcal{O}$ of a normed vector space $V$ is called open if each of its points (vectors) has an open round neighborhood lying entirely in $\mathcal{O}$. A boundary point of $\mathcal{O}$ is a point (vector) in $V$ all of whose open round neighborhoods contain points inside and outside $\mathcal{O}$. A closed subset $\mathcal{C}$ of $V$ is a subset that contains all of its boundary points. The closure of a subset $\mathcal{S}$ is the union of $\mathcal{S}$ and all of its boundary points, and is denoted by $\bar{\mathcal{S}}$.

For example, the boundary of a region drawn on paper consists of all its boundary points. A curve drawn on paper has nothing but boundary points. Every point is also its own boundary. A boundary is always a closed set. In particular, a point is a closed set. In general, an open set cannot contain any boundary points. A frequently used property of a closed set $\mathcal{C}$ is that a convergent sequence of points of $\mathcal{C}$ converges to a point in $\mathcal{C}$.

6 In mathematical jargon one says that (a, b) and R are homeomorphic. 



dense subset

16.4.5. Definition. A subset $W$ of a normed vector space $V$ is dense in $V$ if the closure of $W$ is the entire space $V$. Equivalently, $W$ is dense if each vector in $V$ is infinitesimally close to at least one vector in $W$. In other words, given any $|u\rangle \in V$ and any $\epsilon > 0$, there is a $|w\rangle \in W$ such that $\|u - w\| < \epsilon$, i.e., any vector in $V$ can be approximated, with arbitrary accuracy, by a vector in $W$.
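For $V = \mathbb{R}$ and $W$ the rational numbers, the $\epsilon$-condition of the definition can be met by truncating a decimal expansion. The sketch below is only an illustration: the function name `rational_within` is hypothetical, and floating-point numbers stand in for real numbers.

```python
from fractions import Fraction
import math

def rational_within(u, eps):
    """Return a rational w with |u - w| < eps by truncating the decimal expansion of u."""
    digits = 0
    while 10.0 ** (-digits) >= eps:    # find a power of ten finer than eps
        digits += 1
    scale = 10 ** digits
    return Fraction(math.floor(u * scale), scale)

u = math.sqrt(2)                       # stand-in for an irrational number
for eps in (1e-1, 1e-3, 1e-6):
    w = rational_within(u, eps)
    print(eps, w, abs(u - w))
```

Each call returns an exact fraction, and shrinking `eps` simply keeps more decimal digits, which is the sense in which a decimal representation is itself an approximation by rationals.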

rational numbers are dense in the real numbers

A paradigm of dense spaces is the set of rational numbers in the normed vector space of real numbers. It is a well-known fact that any real number can be approximated by a rational number with arbitrary accuracy: The decimal (or binary) representation of real numbers is precisely such an approximation. An intuitive way of imagining denseness is that the (necessarily) infinite subset is equal to almost all of the set, and its members are scattered “densely” everywhere in the set. The embedding of the rational numbers in the set of real numbers, and how they densely populate that set, is a good mental picture of all dense subsets.

A useful property involving the concept of closure and openness has to do with continuous maps between normed vector spaces. Let $f : \mathcal{H}_1 \to \mathcal{H}_2$ be a continuous map. Let $\mathcal{O}_2$ be an open set in $\mathcal{H}_2$. Let $f^{-1}(\mathcal{O}_2)$ denote the inverse image of $\mathcal{O}_2$, i.e., all points of $\mathcal{H}_1$ that are mapped to $\mathcal{O}_2$. Let $|x_1\rangle$ be a vector in $f^{-1}(\mathcal{O}_2)$, $|x_2\rangle = f(|x_1\rangle)$, and let $B_\epsilon(x_2)$ be a ball contained entirely in $\mathcal{O}_2$. Then $f^{-1}(B_\epsilon(x_2))$ contains $|x_1\rangle$ and lies entirely in $f^{-1}(\mathcal{O}_2)$. Because of the continuity of $f$, one can now construct an open ball centered at $|x_1\rangle$ lying entirely in $f^{-1}(B_\epsilon(x_2))$, and by inclusion, in $f^{-1}(\mathcal{O}_2)$. This shows that every point of $f^{-1}(\mathcal{O}_2)$ has a round open neighborhood lying entirely in $f^{-1}(\mathcal{O}_2)$. Thus, $f^{-1}(\mathcal{O}_2)$ is an open subset. One can similarly show the corresponding property for closed subsets. We can summarize this in the following:

16.4.6. Proposition. Let $f : \mathcal{H}_1 \to \mathcal{H}_2$ be continuous. Then the inverse image of an open (closed) subset of $\mathcal{H}_2$ is an open (closed) subset of $\mathcal{H}_1$.

Consider the resolvent set of a bounded operator $T$. We claim that this set is open in $\mathbb{C}$. To see this, note that if $\lambda \in \rho(T)$, then $T - \lambda\mathbf{1}$ is invertible. On the other hand, Problem 16.1 shows that operators close to an invertible operator are invertible. Thus, if we choose a sufficiently small positive number $\epsilon$ and consider all complex numbers $\mu$ within a distance $\epsilon$ from $\lambda$, then all operators of the form $T - \mu\mathbf{1}$ are invertible, i.e., $\mu \in \rho(T)$. Therefore, any $\lambda \in \rho(T)$ has an open round neighborhood in the complex plane all points of which are in the resolvent set. This shows that the resolvent set is open. In particular, it cannot contain any boundary points. However, $\rho(T)$ and $\sigma(T)$ have to be separated by a common boundary.$^{7}$ Since $\rho(T)$ cannot contain any boundary point, $\sigma(T)$ must carry the entire boundary. This shows that $\sigma(T)$ is a closed subset of $\mathbb{C}$. Recalling that $\sigma(T)$ is also bounded, we have the following result.


7 The spectrum of a bounded operator need not occupy any “area” in the complex plane. It may consist of isolated points or 
line segments, etc., in which case the spectrum will constitute the entire boundary. 


$\rho(T)$ is open, and $\sigma(T)$ is closed and bounded in $\mathbb{C}$.

16.4.7. Proposition. For any $T \in \mathcal{B}(\mathcal{H})$, the set $\rho(T)$ is an open subset of $\mathbb{C}$ and $\sigma(T)$ is a closed, bounded subset of $\mathbb{C}$.

Let us go back to the notion of compactness. It turns out that the feature of 
the closed interval [a, b] most appropriate for generalization is the behavior of 
infinite sequences of numbers lying in the interval. More specifically, let $\{\alpha_i\}_{i=1}^{\infty}$
be a sequence of infinitely many real numbers all lying in the interval [a, b]. It is 
intuitively clear that since there is not enough room for these points to stay away 
from each other, they will have to crowd around a number of points in the interval. 
For example, the sequence 

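$$\left\{(-1)^n\,\frac{2n+1}{4n}\right\}_{n=1}^{\infty} = \left\{-\tfrac{3}{4},\ +\tfrac{5}{8},\ -\tfrac{7}{12},\ +\tfrac{9}{16},\ \dots\right\}$$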

in the interval $[-1, +1]$ crowds around the two points $-\tfrac{1}{2}$ and $+\tfrac{1}{2}$. In fact, the points with even $n$ accumulate around $+\tfrac{1}{2}$ and those with odd $n$ crowd around $-\tfrac{1}{2}$. It turns out that all closed intervals of $\mathbb{R}$ have this property, namely, all sequences crowd around some points. To see that open intervals do not share this property consider the open interval $(0, 1)$. The sequence $\left\{\frac{1}{2n+1}\right\}_{n=1}^{\infty} = \left\{\tfrac{1}{3}, \tfrac{1}{5}, \dots\right\}$ clearly crowds only around zero, which is not a point of the interval. But we already know that open intervals are not compact.

compact subset

16.4.8. Definition. (Bolzano-Weierstrass property) A subset $\mathcal{K}$ of a normed vector space is called compact if every (infinite) sequence in $\mathcal{K}$ has a subsequence that converges to a point of $\mathcal{K}$.

The reason for the introduction of a subsequence in the definition is that a sequence may have many points around which it accumulates. But no matter how many of these points there may exist, one can always obtain a convergent subsequence by choosing from among the points in the sequence. For instance, in the example above, one can choose the subsequence consisting of elements for which $n$ is even. This subsequence converges to the single point $+\tfrac{1}{2}$.
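The idea of extracting a convergent subsequence can be illustrated with a sequence in $[-1, +1]$ whose even-indexed terms approach $+\tfrac{1}{2}$ and whose odd-indexed terms approach $-\tfrac{1}{2}$. The specific formula used below, $a_n = (-1)^n (2n+1)/(4n)$, is an assumption chosen to have exactly these properties.

```python
def a(n):
    """Illustrative sequence in [-1, 1]: a_n = (-1)^n (2n + 1) / (4n)."""
    return (-1) ** n * (2 * n + 1) / (4 * n)

terms = [a(n) for n in range(1, 2001)]
even_terms = [a(n) for n in range(2, 2001, 2)]   # subsequence with n even
odd_terms = [a(n) for n in range(1, 2001, 2)]    # subsequence with n odd

# The full sequence keeps jumping between neighborhoods of -1/2 and +1/2,
# while each subsequence settles down near a single point.
print(terms[:4])
print(even_terms[-1], odd_terms[-1])
```

The full sequence does not converge, yet either subsequence does, and both limits lie in the closed interval, which is what the definition asks for.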

An important theorem in real analysis characterizes all compact sets in $\mathbb{R}^n$:$^{8}$

16.4.9. Theorem. (BWHB theorem) A subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded.

$\sigma(T)$ is compact

We showed earlier that the spectrum of a bounded linear operator is closed and bounded. Identifying $\mathbb{C}$ with $\mathbb{R}^2$, the BWHB theorem implies that


8 BWHB stands for Bolzano, Weierstrass, Heine, and Borel. Bolzano and Weierstrass proved that any closed and bounded 
subset of R has the Bolzano-Weierstrass property. Heine and Borel abstracted the notion of compactness in terms of open sets, 
and showed that a closed bounded subset of R is compact. The BWHB theorem as applied to R is usually called the Heine-
Borel theorem (although some authors call it the Bolzano-Weierstrass theorem). Since the Bolzano-Weierstrass property and 
compactness are equivalent, we have decided to choose BWHB as the name of our theorem. 




16.4.10. Box. The spectrum of a bounded linear operator is a compact subset of $\mathbb{C}$.


An immediate consequence of the BWHB Theorem is that every bounded subset of $\mathbb{R}^n$ has a compact closure. Since $\mathbb{R}^n$ is a prototype of all finite-dimensional (normed) vector spaces, the same statement is true for all such vector spaces. What is interesting is that the statement indeed characterizes the normed space:

criterion for finite-dimensionality

16.4.11. Theorem. A normed vector space is finite-dimensional if and only if every bounded subset has a compact closure.

This result can also be applied to subspaces of a normed vector space: A subspace $W$ of a normed vector space $V$ is finite-dimensional if and only if every bounded subset of $W$ has a compact closure in $W$. A useful version of this property is stated in terms of sequences of points (vectors):

16.4.12. Theorem. A subspace $W$ of a normed vector space $V$ is finite-dimensional if and only if every bounded sequence in $W$ has a convergent subsequence in $W$.


Karl Theodor Wilhelm Weierstrass (1815-1897) was both the greatest analyst and the world’s foremost teacher of advanced mathematics of the last third of the nineteenth century. His career was also remarkable in another way (and a consolation to all “late starters”), for he began the solid part of his professional life at the age of almost 40, when most mathematicians are long past their creative years.

His father sent him to the University of Bonn to qualify for the higher ranks of the Prussian civil service by studying law and commerce. But Karl had no interest in these subjects. He infuriated his father by rarely attending lectures, getting poor grades, and instead, becoming a champion beer drinker. He did manage to become a superb fencer, but when he returned home, he had no degree.

In order to earn his living, he made a fresh start by teaching mathematics, physics, 
botany, German, penmanship, and gymnastics to the children of several small Prussian 
towns during the day. During the nights, however, he mingled with the intellectuals of 
the past, particularly the great Norwegian mathematician Abel. His remarkable research on 
Abelian functions was carried on for years without the knowledge of another living soul; he 
didn’t discuss it with anyone at all, or submit it for publication in the mathematical journals
of the day. 

All this changed in 1854 when Weierstrass at last published an account of his research on Abelian functions. This paper caught the attention of an alert professor at the University of Königsberg who persuaded his university to award Weierstrass an honorary doctor’s degree. The Ministry of Education granted Weierstrass a year’s leave of absence with pay to continue his research, and the next year he was appointed to the University of Berlin, where he remained the rest of his life.

Weierstrass’s great creative talents were evenly divided between his thinking and his
teaching. The student notes of his lectures, and copies of these notes, and copies of copies, 
were passed from hand to hand throughout Europe and even America. Like Gauss he was 
indifferent to fame, but unlike Gauss he endeared himself to generations of students by 
the generosity with which he encouraged them to develop and publish, and receive credit 
for, ideas and theorems that he essentially originated himself. Among Weierstrass’s students 
and followers were Cantor, Schwarz, Hölder, Mittag-Leffler, Sonja Kovalevskaya (Weierstrass’s
favorite student), Hilbert, Max Planck, Willard Gibbs, and many others. 

In 1885 he published the famous theorem now called the Weierstrass approximation theorem (see Theorems 5.2.3 and 8.1.1), which was given a far-reaching generalization, with many applications, by the modern American mathematician M. H. Stone.

The quality that came to be known as “Weierstrassian rigor” was particularly visible in
his contributions to the foundations of real analysis. He refused to accept any statement as 
“intuitively obvious,” but instead demanded ironclad proof based on explicit properties of 
the real numbers. The careful reasoning required for these proofs was founded on a crucial 
property of the real numbers now known as the BWHB theorem. 


We shall need the following proposition in our study of compact operators: 

16.4.13. Proposition. Let $W$ be a closed proper subspace of $\mathcal{H}$ and $\delta$ an arbitrary nonnegative number with $0 \le \delta < 1$. Then there exists a unit vector $|v_0\rangle \in \mathcal{H}$ such that

$$\|x - v_0\| \ge \delta \quad \forall\, |x\rangle \in W.$$

Proof. Choose a vector $|v\rangle$ in $\mathcal{H}$ but not in $W$ and let

$$d = \min\{\, \|v - x\| \;\big|\; |x\rangle \in W \,\}.$$

We claim that $d > 0$. To show this, assume otherwise. Then for each (large) $n$ and (sufficiently small) $\epsilon$, we could find distinct vectors $\{|x_n\rangle\}$ whose distance from $|v\rangle$ would be $\epsilon/n$ and for which the sequence $\{|x_n\rangle\}$ would have $|v\rangle$ as a limit. Closure of $W$ would then imply that $|v\rangle$ is in $W$, a contradiction. So, $d > 0$.

Now, for any $|x_0\rangle \in W$, let

$$|u\rangle \equiv |x\rangle - \frac{|v\rangle - |x_0\rangle}{\|v - x_0\|} = \frac{\overbrace{\left(\|v - x_0\|\,|x\rangle + |x_0\rangle\right)}^{\in W} - |v\rangle}{\|v - x_0\|}$$

and note that by the definition of $d$, the norm of the numerator is at least $d$. Therefore, $\|u\| \ge d/\|v - x_0\|$ for every $|x\rangle, |x_0\rangle \in W$. If we choose $|x_0\rangle$ such that $\|v - x_0\| < d\delta^{-1}$, which is possible because $d\delta^{-1} > d$, then $\|u\| > \delta$ for all $|x\rangle \in W$. Now let $|v_0\rangle = (|v\rangle - |x_0\rangle)/\|v - x_0\|$. □







compact operator 


product of two 
compact operators is 
compact 


finite rank operators 


16.5 Compact Operators 

It is straightforward to show that if $\mathcal{K}$ is a compact set in $\mathcal{H}_1$ and $f : \mathcal{H}_1 \to \mathcal{H}_2$ is continuous, then $f(\mathcal{K})$ (the image of $\mathcal{K}$) is compact in $\mathcal{H}_2$. Since all bounded operators are continuous, we conclude that all bounded operators map compact subsets onto compact subsets. There is a special subset of $\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ that deserves particular attention.

16.5.1. Definition. An operator $K \in \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ is called a compact operator if it maps a bounded subset of $\mathcal{H}_1$ onto a subset of $\mathcal{H}_2$ with compact closure.

Since we will be dealing with function spaces, and since it is easier to deal
with sequences of functions than with subsets of the space of functions, we find it 
more useful to have a definition of compact operators in terms of sequences rather 
than subsets. Thus, instead of a bounded subset, we take a subset of it consisting of 
a (necessarily) bounded sequence. The image of this sequence will be a sequence 
in a compact set, which, by definition, must have a convergent subsequence. We 
therefore have the following: 

16.5.2. Theorem. An operator $K \in \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ is compact if and only if for any bounded sequence $\{|x_n\rangle\}$ in $\mathcal{H}_1$, the sequence $\{K|x_n\rangle\}$ has a convergent subsequence in $\mathcal{H}_2$.

16.5.3. Example. Consider $\mathcal{B}(\mathcal{H})$, the set of bounded operators on the Hilbert space $\mathcal{H}$. If $K$ is a compact operator and $T$ a bounded operator, then $KT$ and $TK$ are compact. This is because $\{T|x_n\rangle \equiv |y_n\rangle\}$ is a bounded sequence if $\{|x_n\rangle\}$ is, and $\{K|y_n\rangle \equiv KT|x_n\rangle\}$ has a convergent subsequence, because $K$ is compact. For the second part, use the first definition of the compact operator and note that $K$ maps bounded sets onto sets with compact closure, which $T$ (being continuous) maps onto sets with compact closure. As a special case of this property we note that the product of two compact operators is compact. Similarly, one can show that any linear combination of compact operators is compact. Thus, any polynomial of a compact operator with no constant term is compact. In particular,

$$(\mathbf{1} - K)^n = \mathbf{1} - K_n,$$

where $K_n$ is a compact operator. ■

16.5.4. Definition. An operator $T \in \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$ is called a finite rank operator if its range is finite-dimensional.

The following is clear from Theorem 16.4.12. 

16.5.5. Proposition. A finite rank operator is compact.

In particular, every linear transformation of a finite-dimensional vector space 
is compact. 



16.5 COMPACT OPERATORS * 465 


linear 

transformations of 
finite-dimensional 
vector spaces are 
compact 


$K$ is compact iff $K^\dagger$ is


Hilbert-Schmidt 
operators 


16.5.6. Theorem. If $\{K_n\} \subset \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$ are compact and $K \in \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$ is such that $\|K - K_n\| \to 0$, then $K$ is compact.

Proof. Let $\{|x_m\rangle\}$ be a bounded sequence in $\mathcal{H}_1$. Let $\{K_1|x_{m_1}\rangle\}$ be the convergent subsequence guaranteed by the compactness of $K_1$. Now, $\{|x_{m_1}\rangle\}$ is a bounded sequence in $\mathcal{H}_1$. It therefore has a subsequence $\{|x_{m_2}\rangle\}$ such that $\{K_2|x_{m_2}\rangle\}$ is convergent. Note that $\{K_1|x_{m_2}\rangle\}$ is also convergent. Continuing this process, we construct the sequence of sequences

$$\{|x_{m_1}\rangle\},\ \{|x_{m_2}\rangle\},\ \dots,\ \{|x_{m_k}\rangle\},\ \dots$$

where each sequence is a subsequence of all sequences preceding it. Furthermore, all the sequences $\{K_l|x_{m_k}\rangle\}$ for $l = 1, \dots, k$ are convergent. In particular, if we pick the diagonal sequence $\{|y_m\rangle\} \equiv \{|x_{m_m}\rangle\}$, then for any $l \in \mathbb{N}$, the sequence $\{K_l|y_m\rangle\}$ converges in $\mathcal{H}_2$. To show that $K$ is compact, we shall establish that $\{|y_m\rangle\}$ is the subsequence of $\{|x_m\rangle\}$ such that $\{K|y_m\rangle\}$ is convergent. Since $\mathcal{H}_2$ is complete, it is sufficient to show that $\{K|y_m\rangle\}$ is Cauchy. We use the so-called “$\epsilon/3$ trick.” Write

$$K|y_m\rangle - K|y_n\rangle = K|y_m\rangle - K_l|y_m\rangle + K_l|y_m\rangle - K_l|y_n\rangle + K_l|y_n\rangle - K|y_n\rangle$$

and use the triangle inequality to obtain

$$\|Ky_m - Ky_n\| \le \|Ky_m - K_l y_m\| + \|K_l y_m - K_l y_n\| + \|K_l y_n - Ky_n\|.$$

By choosing $m$, $n$, and $l$ large enough, we can make each of the three terms on the RHS smaller than $\epsilon/3$; the first and the third ones because $K_l \to K$, the second one because $\{K_l|y_n\rangle\}$ is a convergent sequence. □
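One way to make the order of the choices in the $\epsilon/3$ trick explicit is sketched below. The bound $M$ on the norms $\|y_m\|$ and the index $N$ are symbols introduced here for illustration; they do not appear in the proof above.

```latex
% Assumption: the sequence is bounded, \|y_m\| \le M for all m.
\begin{align*}
&\text{Fix } l \text{ with } \|K - K_l\| < \frac{\epsilon}{3M}
  \;\Longrightarrow\;
  \|K y_m - K_l y_m\| \le \|K - K_l\|\,\|y_m\| < \frac{\epsilon}{3}
  \quad\text{for all } m, \\
&\text{for this fixed } l,\ \{K_l |y_m\rangle\} \text{ converges, hence is Cauchy: }
  \exists N \text{ such that } m, n > N \Rightarrow \|K_l y_m - K_l y_n\| < \frac{\epsilon}{3}, \\
&\text{therefore }
  \|K y_m - K y_n\| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon
  \quad\text{for all } m, n > N .
\end{align*}
```

The index $l$ is chosen first and depends only on $\epsilon$ and $M$; the threshold $N$ is chosen afterwards and depends on $l$.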

Recall that given an orthonormal basis $\{|e_i\rangle\}_{i=1}^{\infty}$, any operator $T$ on a Hilbert space $\mathcal{H}$ can be written as $\sum_{i,j=1}^{\infty} c_{ij}\,|e_i\rangle\langle e_j|$, where $c_{ij} = \langle e_i|T|e_j\rangle$. Now let $K$ be a compact operator and consider the finite rank operators

$$K_n \equiv \sum_{i,j=1}^{n} c_{ij}\,|e_i\rangle\langle e_j|, \qquad c_{ij} = \langle e_i|K|e_j\rangle.$$

Clearly, $\|K - K_n\| \to 0$. The hermitian adjoints $\{K_n^\dagger\}$ are also of finite rank (therefore, compact). Barring some convergence technicality, we see that $K^\dagger$, which is the limit of the sequence of these compact operators, is also compact.

16.5.7. Theorem. $K$ is a compact operator if and only if $K^\dagger$ is.

A particular type of operator occurs frequently in integral equation theory. 
These are called Hilbert-Schmidt operators and defined as follows: 

16.5.8. Definition. Let $\mathcal{H}$ be a Hilbert space, and $\{|e_i\rangle\}_{i=1}^{\infty}$ an orthonormal basis. An operator $T \in \mathcal{L}(\mathcal{H})$ is called Hilbert-Schmidt if





Hilbert-Schmidt 
operators are 
compact. 


Hilbert-Schmidt 

kernel 


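$$\operatorname{tr}(T^\dagger T) \equiv \sum_{i=1}^{\infty} \langle e_i|T^\dagger T|e_i\rangle = \sum_{i=1}^{\infty} \langle Te_i|Te_i\rangle = \sum_{i=1}^{\infty} \|Te_i\|^2 < \infty.$$

16.5.9. Theorem. Hilbert-Schmidt operators are compact.

For a proof, see [Rich 78, pp. 242-246].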

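16.5.10. Example. It is time to give a concrete example of a compact (Hilbert-Schmidt) operator. For this, we return to Equation (16.2) with $w(y) = 1$, and assume that $|u\rangle \in \mathcal{L}^2(a, b)$. Suppose further that the function $K(x, y)$ is continuous on the closed rectangle $[a, b] \times [a, b]$ in the $xy$-plane (or $\mathbb{R}^2$). Under such conditions, $K(x, y)$ is called a Hilbert-Schmidt kernel. We now show that $K$ is compact. First note that due to the continuity of $K(x, y)$, $\int_a^b\!\int_a^b |K(x, y)|^2\,dx\,dy < \infty$. Next, we calculate the trace of $K^\dagger K$. Let $\{|e_i\rangle\}_{i=1}^{\infty}$ be any orthonormal basis of $\mathcal{L}^2(a, b)$. Then

$$\operatorname{tr} K^\dagger K = \sum_{i=1}^{\infty} \langle e_i|K^\dagger K|e_i\rangle = \sum_{i=1}^{\infty} \iiint \langle e_i|x\rangle \langle x|K^\dagger|y\rangle \langle y|K|z\rangle \langle z|e_i\rangle \,dx\,dy\,dz$$

$$= \iiint \langle y|K|x\rangle^* \langle y|K|z\rangle \underbrace{\sum_{i=1}^{\infty} \langle z|e_i\rangle \langle e_i|x\rangle}_{=\delta(x - z)} \,dx\,dy\,dz$$

$$= \int_a^b\!\!\int_a^b \langle y|K|x\rangle^* \langle y|K|x\rangle \,dx\,dy = \int_a^b\!\!\int_a^b |K(y, x)|^2 \,dx\,dy < \infty,$$

where the last integral equals $\int_a^b\!\int_a^b |K(x, y)|^2\,dx\,dy$ after relabeling the integration variables. ■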
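For a concrete continuous kernel, the double integral that bounds the trace can be estimated numerically. The sketch below uses a midpoint rule; the kernel $K(x, y) = xy$ on $[0, 1] \times [0, 1]$, the grid size, and the function name `hs_norm_sq` are assumptions made for illustration.

```python
def hs_norm_sq(kernel, a, b, n=200):
    """Midpoint-rule estimate of the double integral of |K(x, y)|^2 over [a, b] x [a, b]."""
    h = (b - a) / n
    pts = [a + (i + 0.5) * h for i in range(n)]    # midpoints of the n subintervals
    return sum(abs(kernel(x, y)) ** 2 for x in pts for y in pts) * h * h

# Hypothetical kernel K(x, y) = x * y, for which the exact value is (1/3)^2 = 1/9.
estimate = hs_norm_sq(lambda x, y: x * y, 0.0, 1.0)
print(estimate, 1.0 / 9.0)
```

A finite value of this integral is precisely the Hilbert-Schmidt condition of the example; the numerical estimate only illustrates it for one kernel.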


Bernard Bolzano (1781-1848) was a Czech philosopher, mathematician, and theologian who made significant contributions to both mathematics and the theory of knowledge. He entered the Philosophy Faculty of the University of Prague in 1796, studying philosophy and mathematics. He wrote “My special pleasure in mathematics rested therefore particularly on its purely speculative parts, in other words I prized only that part of mathematics which was at the same time philosophy.”

In the autumn of 1800 he began three years of theological 
study while he was preparing a doctoral thesis on geometry. He 
received his doctorate in 1804 for a thesis in which he gave his view of mathematics and 
what constitutes a correct mathematical proof. In the preface he wrote: 

I could not be satisfied with a completely strict proof if it were not derived 
from concepts which the thesis to be proved contained, but rather made use 
of some fortuitous, alien, intermediate concept, which is always an erroneous 
transition to another kind. 







16.6 SPECTRUM OF COMPACT OPERATORS 467 


Two days after receiving his doctorate Bolzano was ordained a Roman Catholic priest. 
However, he came to realize that teaching and not ministering defined his true vocation. 
In the same year, Bolzano was appointed to the chair of philosophy and religion at the 
University of Prague. Because of his pacifist beliefs and his concern for economic justice, 
he was suspended from his position in 1819 after pressure from the Austrian government. 
Bolzano had not given up without a fight but once he was suspended on a charge of heresy 
he was put under house arrest and forbidden to publish.

Although some of his books had to be published outside Austria because of government 
censorship, he continued to write and to play an important role in the intellectual life of his 
country. Bolzano intended to write a series of papers on the foundations of mathematics. 
He wrote two, the first of which was published. Instead of publishing the second one he 
decided to “... make myself better known to the learned world by publishing some papers
which, by their titles, would be more suited to arouse attention.” 

Pursuing this strategy he published Der binomische Lehrsatz … (1816) and Rein 
analytischer Beweis... (1817), which contain an attempt to free calculus from the concept 
of the infinitesimal. He is clear in his intention stating in the preface of the first that the 
work is “a sample of a new way of developing analysis.” The paper gives a proof of the 
intermediate value theorem with Bolzano’s new approach and in the work he defined what 
is now called a Cauchy sequence. The concept appears in Cauchy’s work four years later 
but it is unlikely that Cauchy had read Bolzano’s work. 

After 1817, Bolzano published no further mathematical works for many years. Between 
the late 1820s and the 1840s, he worked on a major work Grössenlehre. This attempt to put
the whole of mathematics on a logical foundation was published in parts, while Bolzano 
hoped that his students would finish and publish the complete work. 

His work Paradoxien des Unendlichen, a study of paradoxes of the infinite, was published
in 1851, three years after his death, by one of his students. The word “set” appears
here for the first time. In this work Bolzano gives examples of 1-1 correspondences between 
the elements of an infinite set and the elements of a proper subset. 

Bolzano’s theories of mathematical infinity anticipated Georg Cantor’s theory of infinite
sets. It is also remarkable that he gave a function which is nowhere differentiable yet 
everywhere continuous. 


16.6 Spectrum of Compact Operators 

Our next task is to investigate the spectrum $\sigma(K)$ of a compact operator $K$ on a Hilbert space $\mathcal{H}$. We are particularly interested in the set of eigenvalues and
eigenvectors of compact operators. Recall that every eigenvalue of an operator on 
a vector space of finite dimension is in its spectrum, and that every point of the 
spectrum is an eigenvalue (see page 457). In general, the second statement is not 
true. In fact, we saw that the right-shift operator had no eigenvalue at all, yet its 
spectrum was the entire unit disk of the complex plane. 

We first observe that $0 \in \sigma(\mathbf{K})$, because otherwise $0 \in \rho(\mathbf{K})$, which implies
that $\mathbf{K} = \mathbf{K} - 0\mathbf{1}$ is invertible with inverse $\mathbf{K}^{-1}$. The product of two compact
operators (in fact, the product of a compact and a bounded operator) is compact (see
Example 16.5.3), so $\mathbf{K}\mathbf{K}^{-1} = \mathbf{1}$ would be compact. This yields a contradiction,$^{9}$ because the unit operator cannot be
compact: It maps a bounded sequence to itself, not to a sequence with a convergent
subsequence.

16.6.1. Proposition. For any compact operator $\mathbf{K} \in \mathcal{B}(\mathcal{H})$ on an infinite-dimensional
Hilbert space, we have $0 \in \sigma(\mathbf{K})$.

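To proceed, we note that eigenvectors of $\mathbf{K}$ corresponding to the eigenvalue $\lambda$
belong to the null space of $\mathbf{K} - \lambda\mathbf{1}$. So, let$^{10}$

\[ \mathcal{N}_\lambda \equiv \ker(\mathbf{K} - \lambda\mathbf{1}), \qquad \mathcal{R}_\lambda \equiv \mathrm{Range}(\mathbf{K} - \lambda\mathbf{1}), \]

\[ \mathcal{N}_\lambda^\dagger \equiv \ker(\mathbf{K}^\dagger - \lambda^*\mathbf{1}), \qquad \mathcal{R}_\lambda^\dagger \equiv \mathrm{Range}(\mathbf{K}^\dagger - \lambda^*\mathbf{1}). \]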

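16.6.2. Theorem. $\mathcal{N}_\lambda$ and $\mathcal{N}_\lambda^\dagger$ are finite-dimensional subspaces of $\mathcal{H}$. Furthermore,
$\mathcal{R}_\lambda^\perp = \mathcal{N}_\lambda^\dagger$.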

Proof. We use Theorem 16.4.12. Let $\{|x_n\rangle\}$ be a bounded sequence in $\mathcal{N}_\lambda$. Since
$\mathbf{K}$ is compact, $\{\mathbf{K}|x_n\rangle = \lambda|x_n\rangle\}$ has a convergent subsequence. So $\{|x_n\rangle\}$ has a
convergent subsequence. This subsequence will converge to a vector in $\mathcal{N}_\lambda$ if the
latter is closed. But this follows from Proposition 16.4.6, continuity of $\mathbf{K} - \lambda\mathbf{1}$, the
fact that $\mathcal{N}_\lambda$ is the inverse image of the zero vector, and the fact that any single
point of a space, such as the zero vector, is a closed subset. Finite-dimensionality
of $\mathcal{N}_\lambda^\dagger$ follows from the compactness of $\mathbf{K}^\dagger$ and a similar argument as above.

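To show the second statement, we observe that for any bounded operator $\mathbf{T}$,
we have$^{11}$ $|u\rangle \in \mathbf{T}(\mathcal{H})^\perp$ iff $\langle u|v\rangle = 0$ for all $|v\rangle \in \mathbf{T}(\mathcal{H})$ iff $\langle u|\mathbf{T}x\rangle = 0$ for all
$|x\rangle \in \mathcal{H}$ iff $\langle \mathbf{T}^\dagger u|x\rangle = 0$ for all $|x\rangle \in \mathcal{H}$ iff $\mathbf{T}^\dagger|u\rangle = 0$ iff $|u\rangle \in \ker\mathbf{T}^\dagger$. This
shows that $\mathbf{T}(\mathcal{H})^\perp = \ker\mathbf{T}^\dagger$. The desired result is obtained by letting $\mathbf{T} = \mathbf{K} - \lambda\mathbf{1}$
and noting that $(\mathcal{W}^\perp)^\perp = \mathcal{W}$ for any subspace $\mathcal{W}$ of a Hilbert space. □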

We note that $\mathcal{N}_\lambda$ is the eigenspace of $\mathbf{K}$ corresponding to the eigenvalue $\lambda$.
However, it may well happen that zero is the only number in $\sigma(\mathbf{K})$. In the
finite-dimensional case, this corresponds to the case where the matrix representation of
the operator is not diagonalizable. In such a case, the standard procedure is to look 
at generalized eigenvectors. We do the same in the case of compact operators. 

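16.6.3. Definition. A vector $|u\rangle$ is a generalized eigenvector of $\mathbf{K}$ of order $m$ if
$(\mathbf{K} - \lambda\mathbf{1})^{m-1}|u\rangle \neq 0$ but $(\mathbf{K} - \lambda\mathbf{1})^{m}|u\rangle = 0$. The set of such vectors, i.e., the null
space of $(\mathbf{K} - \lambda\mathbf{1})^{m}$, will be denoted by $\mathcal{N}_\lambda^{(m)}$.

9 Our conclusion is valid only in infinite dimensions. In finite dimensions, all operators, including $\mathbf{1}$, are compact.
10 In what follows, we assume that $\lambda \neq 0$.
11 Recall that $\mathbf{T}(\mathcal{H})$ is the range of the operator $\mathbf{T}$.

It is clear that

\[ \{0\} = \mathcal{N}_\lambda^{(0)} \subseteq \mathcal{N}_\lambda^{(1)} \subseteq \mathcal{N}_\lambda^{(2)} \subseteq \cdots \subseteq \mathcal{N}_\lambda^{(m)} \subseteq \mathcal{N}_\lambda^{(m+1)} \subseteq \cdots \tag{16.6} \]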


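and each $\mathcal{N}_\lambda^{(n)}$ is a subspace of $\mathcal{H}$. In general, a subspace with higher index is
larger than those with lower index. If there happens to be an equality at one link of
the above chain, then the equality continues all the way to the right ad infinitum.
To see this, let $p$ be the first integer for which the equality occurs, and let $n > p$
be arbitrary. Suppose $|u\rangle \in \mathcal{N}_\lambda^{(n+1)}$. Then

\[ (\mathbf{K} - \lambda\mathbf{1})^{p+1}\big[(\mathbf{K} - \lambda\mathbf{1})^{n-p}|u\rangle\big] = (\mathbf{K} - \lambda\mathbf{1})^{n+1}|u\rangle = 0. \]

It follows that $(\mathbf{K} - \lambda\mathbf{1})^{n-p}|u\rangle$ is in $\mathcal{N}_\lambda^{(p+1)}$. But $\mathcal{N}_\lambda^{(p)} = \mathcal{N}_\lambda^{(p+1)}$. So

\[ (\mathbf{K} - \lambda\mathbf{1})^{n}|u\rangle = (\mathbf{K} - \lambda\mathbf{1})^{p}\big[(\mathbf{K} - \lambda\mathbf{1})^{n-p}|u\rangle\big] = 0. \]

Thus every vector in $\mathcal{N}_\lambda^{(n+1)}$ is also in $\mathcal{N}_\lambda^{(n)}$. This fact and the above chain imply
that $\mathcal{N}_\lambda^{(n+1)} = \mathcal{N}_\lambda^{(n)}$ for all $n > p$.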
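The stabilization of the chain (16.6) can be seen numerically in finite dimensions. The following is a minimal sketch, assuming NumPy; the 5x5 matrix `K` (one 3x3 Jordan block and one 1x1 block for the eigenvalue 2, plus an unrelated eigenvalue 5) and the helper `nullity` are hypothetical choices made only for illustration.

```python
import numpy as np

def nullity(matrix, tol=1e-10):
    """Dimension of the null space: size minus numerical rank."""
    return matrix.shape[0] - np.linalg.matrix_rank(matrix, tol=tol)

# Hypothetical example: Jordan blocks of sizes 3 and 1 for lam = 2,
# and a separate eigenvalue 5.
lam = 2.0
K = np.array([
    [2.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 5.0],
])
shifted = K - lam * np.eye(5)

# dims[m] = dim ker (K - lam 1)^m for m = 0, 1, ..., 6
dims = [int(nullity(np.linalg.matrix_power(shifted, m))) for m in range(7)]

# p = first index at which two consecutive null spaces have equal dimension
p = next(m for m in range(6) if dims[m] == dims[m + 1])
print(dims, p)
```

In this example the dimensions grow strictly up to the index `p` and stay constant afterwards, which is the behavior described in the paragraph above.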

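16.6.4. Theorem. The subspace$^{12}$ $\mathcal{N}_\lambda^{(n)}$ is finite-dimensional for each $n$. Moreover,
there is an integer $p$ such that

\[ \mathcal{N}_\lambda^{(n)} \neq \mathcal{N}_\lambda^{(n+1)} \quad \text{for } n = 0, 1, 2, \ldots, p-1 \]

but $\mathcal{N}_\lambda^{(n)} = \mathcal{N}_\lambda^{(n+1)}$ for all $n \geq p$.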

Proof. For the first part, use the result of Example 16.5.3 to show that $(\mathbf{K} - \lambda\mathbf{1})^{n} =
\mathbf{K}_n + (-\lambda)^{n}\mathbf{1}$, where $\mathbf{K}_n$ is compact. Now repeat the proof of Theorem 16.6.2 for $\mathbf{K}_n$.

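If the integer $p$ exists, the second part of the theorem follows from the discussion
preceding the theorem. To show the existence of $p$, suppose, to the contrary,
that $\mathcal{N}_\lambda^{(n)} \neq \mathcal{N}_\lambda^{(n+1)}$ for every positive integer $n$. This means that for every $n$, we
can find a (unit) vector $|v_n\rangle \in \mathcal{N}_\lambda^{(n+1)}$ that is not in $\mathcal{N}_\lambda^{(n)}$ and that by Proposition
16.4.13 has the property

\[ \|v_n - u\| \geq \tfrac{1}{2} \quad \forall\, |u\rangle \in \mathcal{N}_\lambda^{(n)}. \]

We thus obtain a bounded sequence $\{|v_n\rangle\}$. Let us apply $\mathbf{K}$ to this sequence. If
$j > l$, then $(|v_j\rangle - |v_l\rangle) \in \mathcal{N}_\lambda^{(j+1)}$ by the construction of $|v_j\rangle$ and the fact that
$\mathcal{N}_\lambda^{(l+1)} \subseteq \mathcal{N}_\lambda^{(j+1)}$. Furthermore,

\[ (\mathbf{K} - \lambda\mathbf{1})^{j}\big\{(\mathbf{K} - \lambda\mathbf{1})(|v_j\rangle - |v_l\rangle)\big\} = (\mathbf{K} - \lambda\mathbf{1})^{j+1}(|v_j\rangle - |v_l\rangle) = 0, \]

but

\[ (\mathbf{K} - \lambda\mathbf{1})^{j-1}\big\{(\mathbf{K} - \lambda\mathbf{1})(|v_j\rangle - |v_l\rangle)\big\} = (\mathbf{K} - \lambda\mathbf{1})^{j}(|v_j\rangle - |v_l\rangle) \neq 0 \]

by the definition of $\mathcal{N}_\lambda^{(j)}$. Therefore, $(\mathbf{K} - \lambda\mathbf{1})(|v_j\rangle - |v_l\rangle) \in \mathcal{N}_\lambda^{(j)}$. Now note
that

\[ \mathbf{K}|v_j\rangle - \mathbf{K}|v_l\rangle = \lambda\Big\{ \tfrac{1}{\lambda}(\mathbf{K} - \lambda\mathbf{1})|v_j\rangle - \tfrac{1}{\lambda}(\mathbf{K} - \lambda\mathbf{1})|v_l\rangle + |v_j\rangle - |v_l\rangle \Big\}, \]

where every term in the curly brackets other than $|v_j\rangle$ belongs to $\mathcal{N}_\lambda^{(j)}$.
It follows from Proposition 16.4.13 that the norm of the vector in curly brackets
is larger than $1/2$. Hence, $\|\mathbf{K}|v_j\rangle - \mathbf{K}|v_l\rangle\| \geq |\lambda|/2$, i.e., since $j$ and $l$ are arbitrary,
the sequence $\{\mathbf{K}|v_n\rangle\}$ does not have a convergent subsequence. This contradicts
the fact that $\mathbf{K}$ is compact. So, there must exist a $p$ such that $\mathcal{N}_\lambda^{(p)} = \mathcal{N}_\lambda^{(p+1)}$. □

12 For infinite dimensions, the fact that linear combinations of a subset belong to the subset is not sufficient to make that subset
into a subspace. The subset must also be closed. We normally leave out the rather technical proof of closure.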

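We also need the range of various powers of $\mathbf{K} - \lambda\mathbf{1}$. Thus, let $\mathcal{R}_\lambda^{(n)} \equiv \mathrm{Range}(\mathbf{K} - \lambda\mathbf{1})^{n}$.
One can show that

\[ \mathcal{H} = \mathcal{R}_\lambda^{(0)} \supseteq \mathcal{R}_\lambda^{(1)} \supseteq \cdots \supseteq \mathcal{R}_\lambda^{(n)} \supseteq \mathcal{R}_\lambda^{(n+1)} \supseteq \cdots \]

16.6.5. Theorem. Each $\mathcal{R}_\lambda^{(n)}$ is a subspace of $\mathcal{H}$. Moreover, there is an integer $q$
such that $\mathcal{R}_\lambda^{(n)} \neq \mathcal{R}_\lambda^{(n+1)}$ for $n = 0, 1, \ldots, q-1$, but $\mathcal{R}_\lambda^{(n)} = \mathcal{R}_\lambda^{(n+1)}$ for all $n \geq q$.

Proof. The proof is similar to that of Theorem 16.6.4. The only extra step needed
is to show that $\mathcal{R}_\lambda^{(n)}$ is closed. We shall not reproduce this step. □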

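16.6.6. Theorem. Let $q$ be the integer of Theorem 16.6.5. Then

1. $\mathcal{H} = \mathcal{N}_\lambda^{(q)} \oplus \mathcal{R}_\lambda^{(q)}$.

2. $\mathcal{N}_\lambda^{(q)}$ and $\mathcal{R}_\lambda^{(q)}$ are invariant subspaces of $\mathbf{K}$.

3. The only vector in $\mathcal{R}_\lambda^{(q)}$ that $\mathbf{K} - \lambda\mathbf{1}$ maps to zero is the zero vector. In fact,
when restricted to $\mathcal{R}_\lambda^{(q)}$, the operator $\mathbf{K} - \lambda\mathbf{1}$ is invertible.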

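Proof. (1) Recall that $\mathcal{H} = \mathcal{N}_\lambda^{(q)} \oplus \mathcal{R}_\lambda^{(q)}$ means that every vector of $\mathcal{H}$ can be
written as the sum of a vector in $\mathcal{N}_\lambda^{(q)}$ and a vector in $\mathcal{R}_\lambda^{(q)}$, and the only vector
common to both subspaces is zero. We show the latter property first. In fact, we
show that $\mathcal{N}_\lambda^{(m)} \cap \mathcal{R}_\lambda^{(q)} = 0$ for any integer $m$. Suppose $|x\rangle$ is in this intersection.
For each $n \geq q$, there must be a vector $|x_n\rangle$ in $\mathcal{H}$ such that $|x\rangle = (\mathbf{K} - \lambda\mathbf{1})^{n}|x_n\rangle$
because $\mathcal{R}_\lambda^{(n)} = \mathcal{R}_\lambda^{(q)}$ for $n \geq q$. If $|x\rangle \neq 0$, then $|x_n\rangle \notin \mathcal{N}_\lambda^{(n)}$ for each $n$. Now let $r$
be the larger of the two integers $(p, q)$, where $p$ is the integer of Theorem 16.6.4.
Then

\[ |x_r\rangle \notin \mathcal{N}_\lambda^{(p)}. \tag{16.7} \]

From

\[ 0 = (\mathbf{K} - \lambda\mathbf{1})^{m}|x\rangle = (\mathbf{K} - \lambda\mathbf{1})^{m+r}|x_r\rangle \]

and

\[ 0 \neq (\mathbf{K} - \lambda\mathbf{1})^{m-1}|x\rangle = (\mathbf{K} - \lambda\mathbf{1})^{m+r-1}|x_r\rangle \]

it follows that $|x_r\rangle \in \mathcal{N}_\lambda^{(m+r)}$. But $\mathcal{N}_\lambda^{(m+r)} = \mathcal{N}_\lambda^{(r)} = \mathcal{N}_\lambda^{(p)}$, contradicting Equation
(16.7). We conclude that $|x\rangle$ must be zero. By the definition of $\mathcal{R}_\lambda^{(q)}$, for any vector
$|z\rangle$ in $\mathcal{H}$, we have that $(\mathbf{K} - \lambda\mathbf{1})^{q}|z\rangle \in \mathcal{R}_\lambda^{(q)}$. Since $\mathcal{R}_\lambda^{(q)} = \mathcal{R}_\lambda^{(2q)}$, there must be a
vector $|y\rangle \in \mathcal{H}$ such that $(\mathbf{K} - \lambda\mathbf{1})^{q}|z\rangle = (\mathbf{K} - \lambda\mathbf{1})^{2q}|y\rangle$ or $(\mathbf{K} - \lambda\mathbf{1})^{q}\big[|z\rangle - (\mathbf{K} - \lambda\mathbf{1})^{q}|y\rangle\big] = 0$.
This shows that $|z\rangle - (\mathbf{K} - \lambda\mathbf{1})^{q}|y\rangle$ is in $\mathcal{N}_\lambda^{(q)}$. On the other hand,

\[ |z\rangle = \underbrace{\big[|z\rangle - (\mathbf{K} - \lambda\mathbf{1})^{q}|y\rangle\big]}_{\in\, \mathcal{N}_\lambda^{(q)}} + \underbrace{(\mathbf{K} - \lambda\mathbf{1})^{q}|y\rangle}_{\in\, \mathcal{R}_\lambda^{(q)}}, \]

and the first part of the theorem is done.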

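(2) For the second part, we simply note that $(\mathbf{K} - \lambda\mathbf{1})(\mathcal{N}_\lambda^{(q)}) \subseteq \mathcal{N}_\lambda^{(q-1)} \subseteq \mathcal{N}_\lambda^{(q)}$,
and that

\[ \mathbf{K}(\mathcal{N}_\lambda^{(q)}) = (\mathbf{K} - \lambda\mathbf{1} + \lambda\mathbf{1})(\mathcal{N}_\lambda^{(q)}) = \underbrace{(\mathbf{K} - \lambda\mathbf{1})(\mathcal{N}_\lambda^{(q)})}_{\subseteq\, \mathcal{N}_\lambda^{(q)}} + \underbrace{\lambda\mathbf{1}(\mathcal{N}_\lambda^{(q)})}_{\subseteq\, \mathcal{N}_\lambda^{(q)}} \subseteq \mathcal{N}_\lambda^{(q)}. \]

Similarly,

\[ \mathbf{K}(\mathcal{R}_\lambda^{(q)}) = (\mathbf{K} - \lambda\mathbf{1} + \lambda\mathbf{1})(\mathcal{R}_\lambda^{(q)}) = \underbrace{(\mathbf{K} - \lambda\mathbf{1})(\mathcal{R}_\lambda^{(q)})}_{\subseteq\, \mathcal{R}_\lambda^{(q+1)}} + \underbrace{\lambda\mathbf{1}(\mathcal{R}_\lambda^{(q)})}_{\subseteq\, \mathcal{R}_\lambda^{(q)}} \subseteq \mathcal{R}_\lambda^{(q)}. \]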

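(3) Suppose $|z\rangle \in \mathcal{R}_\lambda^{(q)}$ and $(\mathbf{K} - \lambda\mathbf{1})|z\rangle = 0$. Then $|z\rangle = (\mathbf{K} - \lambda\mathbf{1})^{q}|y\rangle$ for some
$|y\rangle$ in $\mathcal{H}$, and $0 = (\mathbf{K} - \lambda\mathbf{1})|z\rangle = (\mathbf{K} - \lambda\mathbf{1})^{q+1}|y\rangle$, or $|y\rangle \in \mathcal{N}_\lambda^{(q+1)}$. From part
(1), with $m = q + 1$, we conclude that $|z\rangle = 0$. It follows that $\mathbf{K} - \lambda\mathbf{1}$ is injective
(or 1-1). We also have

\[ (\mathbf{K} - \lambda\mathbf{1})\mathcal{R}_\lambda^{(q)} = (\mathbf{K} - \lambda\mathbf{1})(\mathbf{K} - \lambda\mathbf{1})^{q}(\mathcal{H}) = (\mathbf{K} - \lambda\mathbf{1})^{q+1}(\mathcal{H}) = \mathcal{R}_\lambda^{(q+1)} = \mathcal{R}_\lambda^{(q)}. \]

Therefore, when restricted to $\mathcal{R}_\lambda^{(q)}$, the operator $\mathbf{K} - \lambda\mathbf{1}$ is surjective (or onto) as
well. Thus, $(\mathbf{K} - \lambda\mathbf{1}) : \mathcal{R}_\lambda^{(q)} \to \mathcal{R}_\lambda^{(q)}$ is bijective, and therefore has an inverse. □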

16.6.7. Corollary. The two integers $p$ and $q$ introduced in Theorems 16.6.4 and
16.6.5 are equal.

Proof. The proof is left as a problem for the reader (see Problem 16.5). □

The next theorem characterizes the spectrum of a compact operator completely. 
In order to prove it, we need the following lemma. 

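16.6.8. Lemma. Let $\mathbf{K}_\lambda : \mathcal{R}_\lambda^{(q)} \to \mathcal{R}_\lambda^{(q)}$ be the restriction of $\mathbf{K}$ to $\mathcal{R}_\lambda^{(q)}$. Then:

1. Each nonzero point of $\sigma(\mathbf{K})$ is an eigenvalue of $\mathbf{K}$ whose eigenspace is
finite-dimensional.

2. $\sigma(\mathbf{K}_\lambda) = \sigma(\mathbf{K})$.

3. Every infinite sequence in $\sigma(\mathbf{K})$ converges to zero.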

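Proof. (1) If $\lambda \neq 0$ is not an eigenvalue of $\mathbf{K}$, the null space of $\mathbf{K} - \lambda\mathbf{1}$ is zero.
This says that $\{0\} = \mathcal{N}_\lambda^{(0)} = \mathcal{N}_\lambda^{(1)} = \cdots$, i.e., $p = q = 0$. From Theorem 16.6.6,
we conclude that $\mathcal{H} = \mathcal{N}_\lambda^{(0)} \oplus \mathcal{R}_\lambda^{(0)} = \mathcal{R}_\lambda^{(0)}$. Therefore, $\mathbf{K} - \lambda\mathbf{1}$ is onto. Part (3) of
Theorem 16.6.6 shows that $\mathbf{K} - \lambda\mathbf{1}$ is one-to-one. Thus, $\mathbf{K} - \lambda\mathbf{1}$ is invertible, and
$\lambda \in \rho(\mathbf{K})$. So, $\lambda \notin \sigma(\mathbf{K})$.

(2) Clearly $\sigma(\mathbf{K}_\lambda) \subseteq \sigma(\mathbf{K})$. To show the reverse inclusion, first note that $\mathcal{R}_\lambda^{(q)}$
is infinite-dimensional because $\mathcal{N}_\lambda^{(q)}$ has finite dimension. Thus by Proposition
16.6.1, $0 \in \sigma(\mathbf{K}_\lambda)$. Now let $\mu$, nonzero and distinct from $\lambda$, be in $\sigma(\mathbf{K})$. By part
(1) $\mu$ is an eigenvalue of $\mathbf{K}$, so there is a vector $|u\rangle \in \mathcal{H}$ such that $\mathbf{K}|u\rangle = \mu|u\rangle$.
We also have $(\mathbf{K} - \lambda\mathbf{1})|u\rangle = (\mu - \lambda)|u\rangle$, or $(\mathbf{K} - \lambda\mathbf{1})^{q}|u\rangle = (\mu - \lambda)^{q}|u\rangle$.
Thus, $(\mu - \lambda)^{q}|u\rangle$ (and, therefore, $|u\rangle$) is in $\mathcal{R}_\lambda^{(q)}$. Therefore, we can restrict $\mathbf{K}$ to
$\mathcal{R}_\lambda^{(q)}$, i.e., we can write $\mathbf{K}|u\rangle = \mu|u\rangle$ as $\mathbf{K}_\lambda|u\rangle = \mu|u\rangle$, or $(\mathbf{K}_\lambda - \mu\mathbf{1})|u\rangle = 0$.
Hence, $\mu \in \sigma(\mathbf{K}_\lambda)$. We conclude that every point of $\sigma(\mathbf{K})$ is a point of $\sigma(\mathbf{K}_\lambda)$ and
$\sigma(\mathbf{K}) \subseteq \sigma(\mathbf{K}_\lambda)$.

(3) Let $\lambda$ be the limit of an infinite sequence in $\sigma(\mathbf{K}) = \sigma(\mathbf{K}_\lambda)$. If $\lambda \neq 0$,
$\mathbf{K}_\lambda - \lambda\mathbf{1}$ will be invertible (Theorem 16.6.6 part 3), indicating that $\lambda \in \rho(\mathbf{K}_\lambda)$.
Since $\rho(\mathbf{K}_\lambda)$ is open, we can find an open round neighborhood of $\lambda$ entirely in
$\rho(\mathbf{K}_\lambda)$. This contradicts the property of a limit of an infinite sequence whereby any
neighborhood of the limit contains (infinitely many) other points of the sequence.
Therefore, we must conclude that no nonzero $\lambda$ can be the limit of an infinite
sequence in $\sigma(\mathbf{K})$. □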

16.6.9. Theorem. Let $\mathbf{K}$ be a compact operator on an infinite-dimensional Hilbert
space $\mathcal{H}$. Then

1. $0 \in \sigma(\mathbf{K})$.

2. Each nonzero point of $\sigma(\mathbf{K})$ is an eigenvalue of $\mathbf{K}$ whose eigenspace is
finite-dimensional.

3. $\sigma(\mathbf{K})$ is either a finite set or it is a sequence that converges to zero.






Figure 16.1 The shaded area represents a convex subset of the vector space. It consists 
of vectors whose tips lie in the shaded region. It is clear that there is a (unique) vector 
belonging to the subset whose length is minimum. 

Proof. (1) was proved in Proposition 16.6.1. (2) was shown in the lemma above.

(3) Let $\sigma_n(\mathbf{K}) \equiv \{\lambda \in \sigma(\mathbf{K}) \mid |\lambda| \geq 1/n\}$. Clearly, $\sigma_n(\mathbf{K})$ must be a finite set,
because otherwise the infinite set would constitute a sequence that by compactness
of $\sigma_n(\mathbf{K})$ would have to have (at least) a limit point. By part (2), this limit must be
zero, which is not included in $\sigma_n(\mathbf{K})$. Let $\sigma_1(\mathbf{K}) = \{\lambda_1, \lambda_2, \ldots, \lambda_k\}$, arranged in order of
decreasing absolute value. Next, let $\lambda_{k+1}, \lambda_{k+2}, \ldots$ label the elements of $\sigma_2(\mathbf{K})$
not accounted for in $\sigma_1(\mathbf{K})$, again arranged in decreasing absolute value. If this
process stops after a finite number of steps, $\sigma(\mathbf{K})$ is finite. Otherwise, continue the
process to construct a sequence whose limit by necessity is zero. □
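The accumulation of eigenvalues at zero can be illustrated with a discretized integral operator. The following is a rough sketch, assuming NumPy; the kernel $\min(x, t)$ on $[0, 1]$ is a hypothetical example not taken from the text, chosen because its exact eigenvalues $1/[(n - \tfrac{1}{2})^2\pi^2]$ are known, and the midpoint (Nystrom) discretization is only an approximation of the operator.

```python
import numpy as np

# Hypothetical compact operator: (K f)(x) = integral_0^1 min(x, t) f(t) dt.
n = 400
h = 1.0 / n
x = (np.arange(n) + 0.5) * h              # midpoints of the n subintervals
kernel = np.minimum.outer(x, x)           # matrix of min(x_i, x_j)

# Midpoint-rule discretization; the matrix is real symmetric.
eigs = np.linalg.eigvalsh(h * kernel)
eigs = np.sort(np.abs(eigs))[::-1]        # decreasing absolute value

# Known eigenvalues of the continuous operator, for comparison.
exact = 1.0 / ((np.arange(1, 6) - 0.5) ** 2 * np.pi ** 2)

def count_at_least(bound):
    """Number of computed eigenvalues with absolute value >= bound."""
    return int(np.sum(eigs >= bound))

print(eigs[:5])
print(exact)
print([count_at_least(1.0 / m) for m in (10, 100, 1000)])
```

Only finitely many eigenvalues lie outside any disk around the origin, and the ordered list decreases toward zero, in line with part (3) of the theorem.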


16.7 Spectral Theorem for Compact Operators 

The finite-dimensional spectral decomposition theorem of Chapter 4 was based on 
the existence of eigenvalues, eigenspaces, and projection operators. Such existence 
was guaranteed by the existence of an inner product for any finite-dimensional vector
space. The task of establishing spectral decomposition for infinite-dimensional
vector spaces is complicated not only by the possibility of the absence of an inner 
product, but also by the questions of completeness, closure, and convergence. One 
can eliminate the first two hindrances by restricting oneself to a Hilbert space. 
However, even so, one has to deal with other complications of infinite dimensions. 

As an example, consider the relation $\mathcal{V} = \mathcal{W} \oplus \mathcal{W}^\perp$, which is trivially true for
any subspace $\mathcal{W}$ in finite dimensions once an orthonormal basis is chosen. Recall
that the procedure for establishing this relation is to complement a basis of W 
to produce a basis for the whole space. In an infinite-dimensional Hilbert space, 
we do not know a priori how to complement the basis of a subspace (which may 
be infinite-dimensional). Thus, one has to prove the existence of the orthogonal 
complement of a subspace. Without going into details, we sketch the proof. First 
a definition: 





16.7.1. Definition. A convex subset $E$ of a vector space is a collection of vectors
such that if $|u\rangle$ and $|v\rangle$ are in $E$, then $|u\rangle - t(|u\rangle - |v\rangle)$ is also in $E$ for all $0 \leq t \leq 1$.

Intuitively, any two points of a convex subset can be connected by a straight line 
segment lying entirely in the subset. 

Let $E$ be a convex subset (not a subspace) of a Hilbert space $\mathcal{H}$. One can show
that there exists a unique vector in $E$ with minimal norm (see Figure 16.1). Now
let $\mathcal{M}$ be a subspace of $\mathcal{H}$. For an arbitrary vector $|u\rangle$ in $\mathcal{H}$, consider the subset
$E = |u\rangle - \mathcal{M}$, i.e., all vectors of the form $|u\rangle - |m\rangle$ with $|m\rangle \in \mathcal{M}$. Denote
the unique vector of minimal norm of $|u\rangle - \mathcal{M}$ by $|u\rangle - |Pu\rangle$ with $|Pu\rangle \in \mathcal{M}$.
One can show that $|u\rangle - |Pu\rangle$ is orthogonal to $|m\rangle$, i.e., $(|u\rangle - |Pu\rangle) \in \mathcal{M}^\perp$ (see
Figure 16.2). Obviously, only the zero vector can be simultaneously in $\mathcal{M}$ and $\mathcal{M}^\perp$.
Furthermore, any vector $|u\rangle$ in $\mathcal{H}$ can be written as $|u\rangle = |Pu\rangle + (|u\rangle - |Pu\rangle)$ with
$|Pu\rangle \in \mathcal{M}$ and $(|u\rangle - |Pu\rangle) \in \mathcal{M}^\perp$. This shows that $\mathcal{H} = \mathcal{M} \oplus \mathcal{M}^\perp$. In words,
a Hilbert space is the direct sum of any one of its subspaces and the orthogonal
complement of that subspace. The vector $|Pu\rangle$ so constructed is the projection of
$|u\rangle$ in $\mathcal{M}$.
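In finite dimensions the minimal-norm construction above is an ordinary least-squares problem. The following is a minimal sketch, assuming NumPy; the random 6-dimensional real space, the two-dimensional subspace spanned by the columns of `basis`, and all variable names are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
basis = rng.standard_normal((6, 2))    # columns span the subspace M of R^6
u = rng.standard_normal(6)             # arbitrary vector |u>

# Minimizing ||u - m|| over all m in M is a least-squares problem.
coeffs, *_ = np.linalg.lstsq(basis, u, rcond=None)
Pu = basis @ coeffs                    # the vector |Pu> in M
residual = u - Pu                      # the minimal-norm element of u - M

# The same projection from an orthonormal basis of M (reduced QR).
q, _ = np.linalg.qr(basis)
P = q @ q.T

print(np.abs(basis.T @ residual).max())   # residual is orthogonal to M
print(np.abs(P @ u - Pu).max())           # both constructions agree
```

The residual `u - Pu` is orthogonal to every column of `basis`, so the decomposition `u = Pu + (u - Pu)` is the finite-dimensional instance of the direct sum described above.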

A projection operator P can be defined as a linear operator with the property 
that P 2 = P. One can then show the following. 

16.7.2. Theorem. The kernel $\ker\mathbf{P}$ of a projection operator is the orthogonal
complement of the range $\mathbf{P}(\mathcal{H})$ of $\mathbf{P}$ in $\mathcal{H}$ iff $\mathbf{P}$ is hermitian.

This is the reason for demanding hermiticity of the projection operators in our 
treatment of the finite-dimensional case. 

We now concentrate on the compact operators, and first look at hermitian 
compact operators. We need two lemmas: 

16.7.3. Lemma. Let $\mathbf{H} \in \mathcal{B}(\mathcal{H})$ be a bounded hermitian operator on the Hilbert
space $\mathcal{H}$. Then $\|\mathbf{H}\| = \max\{|\langle \mathbf{H}x|x\rangle| \;\big|\; \|x\| = 1\}$.

Proof. Let $M$ denote the positive number on the RHS. From the definition of the
norm of an operator, we easily obtain $|\langle \mathbf{H}x|x\rangle| \leq \|\mathbf{H}\|\,\|x\|^2 = \|\mathbf{H}\|$, or $M \leq \|\mathbf{H}\|$.
For the reverse inequality, see Problem 16.7. □

16.7.4. Lemma. Let $\mathbf{K} \in \mathcal{B}(\mathcal{H})$ be a hermitian compact operator. Then there is
an eigenvalue $\lambda$ of $\mathbf{K}$ such that $|\lambda| = \|\mathbf{K}\|$.

Proof. Let $\{|x_n\rangle\}$ be a sequence of unit vectors such that

\[ \|\mathbf{K}\| = \lim |\langle \mathbf{K}x_n|x_n\rangle|. \]

This is always possible, as the following argument shows. Let $\epsilon$ be a small positive
number. There must exist a unit vector $|x_1\rangle \in \mathcal{H}$ such that

\[ \|\mathbf{K}\| - \epsilon = |\langle \mathbf{K}x_1|x_1\rangle|, \]

because otherwise, $\|\mathbf{K}\| - \epsilon$ would be greater than or equal to the norm of the
operator (see Lemma 16.7.3). Similarly, there must exist another (different) unit
vector $|x_2\rangle \in \mathcal{H}$ such that $\|\mathbf{K}\| - \epsilon/2 = |\langle \mathbf{K}x_2|x_2\rangle|$. Continuing this way, we
construct an infinite sequence of unit vectors $\{|x_n\rangle\}$ with the property $\|\mathbf{K}\| - \epsilon/n =
|\langle \mathbf{K}x_n|x_n\rangle|$. This construction clearly produces the desired sequence. Note that the
argument holds for any hermitian bounded operator; compactness is not necessary.

Figure 16.2 The shaded area represents the subspace $\mathcal{M}$ of the vector space. The convex
subset $E$ consists of all vectors connecting points of $\mathcal{M}$ to the tip of $|u\rangle$. It is clear that there
is a (unique) vector belonging to $E$ whose length is minimum. The figure shows that this
vector is orthogonal to $\mathcal{M}$.

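Now define $\alpha_n = \langle \mathbf{K}x_n|x_n\rangle$ and let $\alpha = \lim \alpha_n$, so that $|\alpha| = \|\mathbf{K}\|$. Compactness
of $\mathbf{K}$ implies that $\{|\mathbf{K}x_n\rangle\}$ has a convergent subsequence; passing to that subsequence, we may
assume that $\{|\mathbf{K}x_n\rangle\}$ converges. Let $|y\rangle \in \mathcal{H}$ be the limit of $\{|\mathbf{K}x_n\rangle\}$. Then
$\|y\| = \lim \|\mathbf{K}x_n\| \leq \|\mathbf{K}\|\,\|x_n\| = \|\mathbf{K}\|$. On the other hand,

\[ 0 \leq \|\mathbf{K}x_n - \alpha x_n\|^2 = \|\mathbf{K}x_n\|^2 - 2\alpha\langle \mathbf{K}x_n|x_n\rangle + |\alpha|^2. \]

Taking the limit and noting that $\alpha_n$ and $\alpha$ are real, we get

\[ 0 \leq \lim\|\mathbf{K}x_n\|^2 - 2\alpha\lim\langle \mathbf{K}x_n|x_n\rangle + |\alpha|^2 = \|y\|^2 - 2\alpha^2 + \alpha^2 \;\Rightarrow\; \|y\|^2 \geq \|\mathbf{K}\|^2. \]

It follows from these two inequalities that $\|y\| = \|\mathbf{K}\|$ and that $\lim |x_n\rangle = |y\rangle/\alpha$.
Furthermore,

\[ (\mathbf{K} - \alpha\mathbf{1})(|y\rangle/\alpha) = (\mathbf{K} - \alpha\mathbf{1})(\lim |x_n\rangle) = \lim(\mathbf{K} - \alpha\mathbf{1})|x_n\rangle = 0. \]

Therefore, $\alpha$ is an eigenvalue of $\mathbf{K}$ with eigenvector $|y\rangle/\alpha$. □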
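Lemmas 16.7.3 and 16.7.4 are easy to check for a hermitian matrix, which is a compact operator on a finite-dimensional space. The following is a small sketch, assuming NumPy; the random 5x5 matrix `H` and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
H = (a + a.conj().T) / 2                 # hermitian by construction

eigvals, eigvecs = np.linalg.eigh(H)     # real eigenvalues, orthonormal eigenvectors
op_norm = np.linalg.norm(H, 2)           # operator norm = largest singular value

k = np.argmax(np.abs(eigvals))           # eigenvalue of largest absolute value
x = eigvecs[:, k]                        # corresponding unit eigenvector
rayleigh = np.vdot(x, H @ x)             # <x|Hx>, attains the maximum in Lemma 16.7.3

print(op_norm, np.abs(eigvals).max(), abs(rayleigh))
```

The three printed numbers agree to rounding error: the norm is attained by an eigenvalue, and the corresponding eigenvector maximizes $|\langle \mathbf{H}x|x\rangle|$ over unit vectors.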

Let us order all the eigenvalues of Theorem 16.6.9 in decreasing absolute value. 
Let $\mathcal{M}_n$ denote the (finite-dimensional) eigenspace corresponding to eigenvalue
$\lambda_n$, and $\mathbf{P}_n$ the projection to $\mathcal{M}_n$. The eigenspaces are pairwise orthogonal and
$\mathbf{P}_n\mathbf{P}_m = 0$ for $m \neq n$. This follows in exact analogy with the finite-dimensional
case.

First assume that K has only finitely many eigenvalues. 




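Let $\mathcal{M} \equiv \mathcal{M}_1 \oplus \mathcal{M}_2 \oplus \cdots \oplus \mathcal{M}_r = \sum_{j=1}^{r} \oplus\, \mathcal{M}_j$, and let $\mathcal{M}_0$ be the orthogonal
complement of $\mathcal{M}$. Since each eigenspace is invariant under $\mathbf{K}$, so is $\mathcal{M}$. Therefore,
by Theorem 4.2.3 (which holds for finite- as well as infinite-dimensional vector
spaces) and the fact that $\mathbf{K}$ is hermitian, $\mathcal{M}_0$ is also invariant. Let $\mathbf{K}_0$ be the restriction
of $\mathbf{K}$ to $\mathcal{M}_0$. By Lemma 16.7.4, $\mathbf{K}_0$ has an eigenvalue $\lambda$ such that $|\lambda| = \|\mathbf{K}_0\|$.
If $\lambda \neq 0$, it must be one of the eigenvalues already accounted for, because any
eigenvalue of $\mathbf{K}_0$ is also an eigenvalue of $\mathbf{K}$. This is impossible, because $\mathcal{M}_0$ is
orthogonal to all the eigenspaces. So, $\lambda = 0$, or $|\lambda| = \|\mathbf{K}_0\| = 0$, or $\mathbf{K}_0 = 0$, i.e.,
$\mathbf{K}$ acts as the zero operator on $\mathcal{M}_0$.

Let $\mathbf{P}_0$ be the orthogonal projection on $\mathcal{M}_0$. Then $\mathcal{H} = \sum_{j=0}^{r} \oplus\, \mathcal{M}_j$, and we
have $\mathbf{1} = \sum_{j=0}^{r} \mathbf{P}_j$, and for an arbitrary $|x\rangle \in \mathcal{H}$, we have

\[ \mathbf{K}|x\rangle = \mathbf{K}\Big(\sum_{j=0}^{r} \mathbf{P}_j|x\rangle\Big) = \sum_{j=0}^{r} \mathbf{K}(\mathbf{P}_j|x\rangle) = \sum_{j=1}^{r} \lambda_j(\mathbf{P}_j|x\rangle). \]

It follows that $\mathbf{K} = \sum_{j=1}^{r} \lambda_j\mathbf{P}_j$. Notice that the range of $\mathbf{K}$ is $\sum_{j=1}^{r} \oplus\, \mathcal{M}_j$, which is
finite-dimensional. Thus, $\mathbf{K}$ has finite rank. Barring some technical details, which
we shall not reproduce here, the case of a compact hermitian operator with infinitely
many eigenvalues goes through in the same way (see [DeVi 90, pp. 179-180]):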

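16.7.5. Theorem. (spectral theorem: compact hermitian operators) Let $\mathbf{K}$ be a
compact hermitian operator on a Hilbert space $\mathcal{H}$. Let $\{\lambda_j\}_{j=1}^{N}$ be the distinct
nonzero eigenvalues of $\mathbf{K}$ arranged in decreasing order of absolute values. For
each $j$ let $\mathcal{M}_j$ be the eigenspace of $\mathbf{K}$ corresponding to eigenvalue $\lambda_j$ and $\mathbf{P}_j$ its
projection operator with the property $\mathbf{P}_i\mathbf{P}_j = 0$ for $i \neq j$. Then:

1. If $N < \infty$, then $\mathbf{K}$ is an operator of finite rank, $\mathbf{K} = \sum_{j=1}^{N} \lambda_j\mathbf{P}_j$, and
$\mathcal{H} = \mathcal{M}_0 \oplus \mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_N$, or $\mathbf{1} = \sum_{j=0}^{N} \mathbf{P}_j$, where $\mathcal{M}_0$ is
infinite-dimensional.

2. If $N = \infty$, then $\lambda_j \to 0$ as $j \to \infty$, $\mathbf{K} = \sum_{j=1}^{\infty} \lambda_j\mathbf{P}_j$, and $\mathcal{H} =
\mathcal{M}_0 \oplus \sum_{j=1}^{\infty} \oplus\, \mathcal{M}_j$, or $\mathbf{1} = \sum_{j=0}^{\infty} \mathbf{P}_j$, where $\mathcal{M}_0$ could be finite- or
infinite-dimensional. Furthermore,

\[ \Big\|\mathbf{K} - \sum_{j=1}^{m} \lambda_j\mathbf{P}_j\Big\| = |\lambda_{m+1}| \quad \forall\, m, \]

which shows that the infinite series above converges for an operator norm.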
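The statements of the theorem can be checked on a hermitian matrix with prescribed spectrum. The following is a minimal sketch, assuming NumPy; the 6x6 real symmetric matrix `K`, its eigenvalues `lams`, and the helper `truncation_error` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
q, _ = np.linalg.qr(rng.standard_normal((6, 6)))    # orthonormal eigenvectors

# Nonzero eigenvalues in decreasing absolute value, then a 2-dimensional kernel.
lams = np.array([3.0, -2.0, 1.0, 0.5, 0.0, 0.0])
K = q @ np.diag(lams) @ q.T                          # real symmetric, hence hermitian

# P_1, ..., P_4 project onto the eigenspaces of the nonzero eigenvalues;
# P_0 projects onto the orthogonal complement, where K acts as zero.
projectors = [np.outer(q[:, j], q[:, j]) for j in range(4)]
P0 = np.eye(6) - sum(projectors)

def truncation_error(m):
    """Operator norm of K minus the first m terms of its spectral sum."""
    partial = sum(lams[j] * projectors[j] for j in range(m))
    return np.linalg.norm(K - partial, 2)

print([round(truncation_error(m), 12) for m in range(5)])
```

For each `m` the truncation error equals the absolute value of the next eigenvalue in the list, which is the norm identity stated at the end of the theorem.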

The eigenspaces of a compact hermitian operator are orthogonal and, by (2)
of Theorem 16.7.5, span the entire space. By the Gram-Schmidt process, one can
select an orthonormal basis for each eigenspace. We therefore have the following
corollary.

16.7.6. Corollary. If $\mathbf{K}$ is a compact hermitian operator on a Hilbert space $\mathcal{H}$,
then the eigenvectors of $\mathbf{K}$ constitute an orthonormal basis for $\mathcal{H}$.






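16.7.7. Theorem. Let $\mathbf{K}$ be a compact hermitian operator on a Hilbert space $\mathcal{H}$
and let $\mathbf{K} = \sum_{j=1}^{N} \lambda_j\mathbf{P}_j$, where $N$ could be infinite. A bounded linear operator on
$\mathcal{H}$ commutes with $\mathbf{K}$ if and only if it commutes with every $\mathbf{P}_j$.

Proof. The “if” part is straightforward. So assume that the bounded operator $\mathbf{T}$
commutes with $\mathbf{K}$. For $|x\rangle \in \mathcal{M}_j$, we have $(\mathbf{K} - \lambda_j)\mathbf{T}|x\rangle = \mathbf{T}(\mathbf{K} - \lambda_j)|x\rangle = 0$.
Similarly, $(\mathbf{K} - \lambda_j)\mathbf{T}^\dagger|x\rangle = \mathbf{T}^\dagger(\mathbf{K} - \lambda_j)|x\rangle = 0$, because $0 = [\mathbf{T}, \mathbf{K}]^\dagger = [\mathbf{T}^\dagger, \mathbf{K}]$.
These equations show that both $\mathbf{T}$ and $\mathbf{T}^\dagger$ leave $\mathcal{M}_j$ invariant. This means that $\mathcal{M}_j$
reduces $\mathbf{T}$, and by Theorem 4.2.5, $\mathbf{T}\mathbf{P}_j = \mathbf{P}_j\mathbf{T}$. □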

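Next we prove the spectral theorem for a normal operator. Recall that any
operator $\mathbf{T}$ can be written as $\mathbf{T} = \mathbf{T}_r + i\mathbf{T}_i$, where $\mathbf{T}_r = \frac{1}{2}(\mathbf{T} + \mathbf{T}^\dagger)$ and $\mathbf{T}_i = \frac{1}{2i}(\mathbf{T} - \mathbf{T}^\dagger)$
are hermitian, and since both $\mathbf{T}$ and $\mathbf{T}^\dagger$ are compact, $\mathbf{T}_r$ and $\mathbf{T}_i$ are compact as well.
For normal operators, we have the extra condition that $[\mathbf{T}_r, \mathbf{T}_i] = [\mathbf{T}, \mathbf{T}^\dagger] = 0$. Let
$\mathbf{T}_r = \sum_{j=1}^{N} \lambda_j\mathbf{P}_j$ and $\mathbf{T}_i = \sum_{k=1}^{N} \mu_k\mathbf{Q}_k$ be the spectral decompositions of $\mathbf{T}_r$ and
$\mathbf{T}_i$. Using Theorem 16.7.7, it is straightforward to show that if $[\mathbf{T}_r, \mathbf{T}_i] = 0$ then
$[\mathbf{P}_j, \mathbf{Q}_k] = 0$. Now, since $\mathcal{H} = \sum_{j=0}^{N} \oplus\, \mathcal{M}_j = \sum_{k=0}^{N} \oplus\, \mathcal{N}_k$, where $\mathcal{M}_j$ are the
eigenspaces of $\mathbf{T}_r$ and $\mathcal{N}_k$ those of $\mathbf{T}_i$, we have, for any $|x\rangle \in \mathcal{H}$,

\[ \mathbf{T}_r|x\rangle = \Big(\sum_{j=1}^{N} \lambda_j\mathbf{P}_j\Big)\Big(\sum_{k=0}^{N} \mathbf{Q}_k|x\rangle\Big) = \sum_{j=1}^{N}\sum_{k=0}^{N} \lambda_j\mathbf{P}_j\mathbf{Q}_k|x\rangle. \]

Similarly, $\mathbf{T}_i|x\rangle = \mathbf{T}_i\big(\sum_{j=0}^{N} \mathbf{P}_j|x\rangle\big) = \sum_{k=1}^{N}\sum_{j=0}^{N} \mu_k\mathbf{Q}_k\mathbf{P}_j|x\rangle$. Combining
these two relations and noting that $\mathbf{Q}_k\mathbf{P}_j = \mathbf{P}_j\mathbf{Q}_k$ gives

\[ \mathbf{T}|x\rangle = (\mathbf{T}_r + i\mathbf{T}_i)|x\rangle = \sum_{j=0}^{N}\sum_{k=0}^{N} (\lambda_j + i\mu_k)\mathbf{P}_j\mathbf{Q}_k|x\rangle. \]

The projection operators $\mathbf{P}_j\mathbf{Q}_k$ project onto the intersection of $\mathcal{M}_j$ and $\mathcal{N}_k$. Therefore,
$\mathcal{M}_j \cap \mathcal{N}_k$ are the eigenspaces of $\mathbf{T}$. Only those terms in the sum for which
$\mathcal{M}_j \cap \mathcal{N}_k \neq 0$ contribute. As before, we can order the eigenvalues according to
their absolute values.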

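16.7.8. Theorem. (spectral theorem: compact normal operators) Let $\mathbf{T}$ be a compact
normal operator on a Hilbert space $\mathcal{H}$. Let $\{\lambda_n\}_{n=1}^{N}$ (where $N$ can be $\infty$)
be the distinct nonzero eigenvalues of $\mathbf{T}$ arranged in decreasing order of absolute
values. For each $n$ let $\mathcal{M}_n$ be the eigenspace of $\mathbf{T}$ corresponding to eigenvalue $\lambda_n$
and $\mathbf{P}_n$ its projection operator with the property $\mathbf{P}_m\mathbf{P}_n = 0$ for $m \neq n$. Then:

1. If $N < \infty$, then $\mathbf{T}$ is an operator of finite rank, $\mathbf{T} = \sum_{n=1}^{N} \lambda_n\mathbf{P}_n$, and
$\mathcal{H} = \mathcal{M}_0 \oplus \mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_N$, or $\mathbf{1} = \sum_{n=0}^{N} \mathbf{P}_n$, where $\mathcal{M}_0$ is
infinite-dimensional.

2. If $N = \infty$, then $\lambda_n \to 0$ as $n \to \infty$, $\mathbf{T} = \sum_{n=1}^{\infty} \lambda_n\mathbf{P}_n$, and $\mathcal{H} =
\mathcal{M}_0 \oplus \sum_{n=1}^{\infty} \oplus\, \mathcal{M}_n$, or $\mathbf{1} = \sum_{n=0}^{\infty} \mathbf{P}_n$, where $\mathcal{M}_0$ could be finite- or
infinite-dimensional.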





As in the case of a compact hermitian operator, by the Gram-Schmidt process, 
one can select an orthonormal basis for each eigenspace of a normal operator, in 
which case we have the following: 

16.7.9. Corollary. If $\mathbf{T}$ is a compact normal operator on a Hilbert space $\mathcal{H}$, then
the eigenvectors of $\mathbf{T}$ constitute an orthonormal basis for $\mathcal{H}$.

One can use Theorem 16.7.8 to write any function of a normal operator $\mathbf{T}$ as
an expansion in terms of the projection operators of $\mathbf{T}$. First we note that $\mathbf{T}^k$ has
$\lambda_n^k$ as its expansion coefficients. Next, we add various powers of $\mathbf{T}$ in the form of
a polynomial and conclude that the expansion coefficients for a polynomial $p(\mathbf{T})$
are $p(\lambda_n)$. Finally, for any function $f(\mathbf{T})$ we have

\[ f(\mathbf{T}) = \sum_{n=1}^{\infty} f(\lambda_n)\mathbf{P}_n. \tag{16.8} \]
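Equation (16.8) can be tested on a normal matrix. The following is a minimal sketch, assuming NumPy; the 4x4 matrix `T`, its eigenvalues `lams`, and the helpers `apply_function` and `exp_series` are hypothetical names, and the exponential is compared against a truncated power series rather than a library routine.

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
u, _ = np.linalg.qr(a)                               # unitary matrix of eigenvectors
lams = np.array([1.0 + 1.0j, -0.5j, 0.3, -1.2 + 0.4j])
T = u @ np.diag(lams) @ u.conj().T                   # normal by construction

# Rank-one projectors onto the (here one-dimensional) eigenspaces.
projectors = [np.outer(u[:, n], u[:, n].conj()) for n in range(4)]

def apply_function(f):
    """Evaluate f(T) as the sum of f(lambda_n) P_n, following Equation (16.8)."""
    return sum(f(lam) * P for lam, P in zip(lams, projectors))

def exp_series(matrix, terms=60):
    """Truncated power series of the matrix exponential, used as a reference."""
    result = np.eye(matrix.shape[0], dtype=complex)
    term = np.eye(matrix.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ matrix / k
        result = result + term
    return result

print(np.abs(apply_function(np.exp) - exp_series(T)).max())
print(np.abs(apply_function(lambda z: z ** 3) - T @ T @ T).max())
```

Both differences are at the level of rounding error, for a polynomial and for a function defined by a power series.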


Johann (John) von Neumann, (1903-1957), the eldest of three 
sons of Max von Neumann, a well-to-do Jewish banker, was 
privately educated until he entered the gymnasium in 1914. His 
unusual mathematical abilities soon came to the attention of 
his teachers, who pointed out to his father that teaching him 
conventional school mathematics would be a waste of time; 
he was therefore tutored in mathematics under the guidance of 
university professors, and by the age of nineteen he was already 
recognized as a professional mathematician and had published 
his first paper. 

Von Neumann was Privatdozent at Berlin from 1927 to 1929
and at Hamburg in 1929-1930, then went to Princeton University for three years; in 1933
he was invited to join the newly opened Institute for Advanced Study, of which he was the
youngest permanent member at that time. At the outbreak of World War II, von Neumann
was called upon to participate in various scientific projects related to the war effort: In 
particular, from 1943 he was a consultant on the construction of the atomic bomb at Los 
Alamos. After the war he retained his membership on numerous government boards and 
committees, and in 1954 he became a member of the Atomic Energy Commission. His 
health began to fail in 1955, and he died of cancer two years later. 

It is only in comparison with the greatest mathematical geniuses of history that von 
Neumann’s scope in pure mathematics may appear somewhat restricted; it was far beyond 
the range of most of his contemporaries, and his extraordinary work in applied mathematics, 
in which he certainly equals Gauss, Cauchy, or Poincaré, more than compensates for its
limitations. Von Neumann’s work in pure mathematics was accomplished between 1925 
and 1940, when he seemed to be advancing at a breathless speed on all fronts of logic 
and analysis at once, not to speak of mathematical physics. The dominant theme in von 
Neumann’s work is by far his work on the spectral theory of operators in Hilbert spaces. 
For twenty years he was the undisputed master in this area, which contains what is now 
considered his most profound and most original creation, the theory of rings of operators. 











The first papers (1927) in which Hilbert space theory appears are those on the foundations
of quantum mechanics. These investigations later led von Neumann to a systematic study
of unbounded hermitian operators.

Von Neumann’s most famous work in theoretical physics is his axiomatization of quantum
mechanics. When he began work in that field in 1927, the methods used by its founders
were hard to formulate in precise mathematical terms: “Operators” on “functions” were
handled without much consideration of their domain of definition or their topological
properties, and it was blithely assumed that such “operators,” when self-adjoint, could always
be “diagonalized” (as in the finite dimensional case), at the expense of introducing Dirac
delta functions as “eigenvectors.” Von Neumann showed that mathematical rigor could be
restored by taking as basic axioms the assumptions that the states of a physical system
were points of a Hilbert space and that the measurable quantities were Hermitian (generally
unbounded) operators densely defined in that space.

After 1927 von Neumann also devoted much effort to more specific problems of quantum
mechanics, such as the problem of measurement and the foundation of quantum statistics
and quantum thermodynamics, proving in particular an ergodic theorem for quantum
systems. All this work was developed and expanded in Mathematische Grundlagen der
Quantenmechanik (1932), in which he also discussed the much-debated question of “causality”
versus “indeterminacy” and concluded that no introduction of “hidden parameters”
could keep the basic structure of quantum theory and restore “causality.”

Von Neumann’s uncommon grasp of applied mathematics, treated as a whole without 
divorcing theory from experimental realization, was nowhere more apparent than in his 
work on computers. He became interested in numerical computations in connection with 
the need for quick estimates and approximate results that developed with the technology 
used for the war effort (particularly the complex problems of hydrodynamics) and the
completely new problems presented by the harnessing of nuclear energy, for which no
ready-made theoretical solutions were available. Von Neumann’s extraordinary ability for
rapid mental calculation was legendary. The story is told of a friend who brought him a 
simple kinematics problem. Two trains, a certain given distance apart, move toward each 
other at a given speed. A fly, initially on the windshield of one of the trains, flies back and 
forth between them, again at a known constant speed. When the trains collide, how far has 
the fly traveled? One way to solve the problem is to add up all the successively smaller 
distances in each individual flight. (The easy way is to multiply the fly’s speed by the time
elapsed until the crash.) After a few seconds of thought, von Neumann quickly gave the 
correct answer. 

“That’s strange,” remarked his friend. “Most people try to sum the infinite series.”

“What’s strange about that?” von Neumann replied. “That’s what I did.” 


In closing this section, let us remark that the paradigm of compact operators,
namely the Hilbert-Schmidt operator, is such because it is defined on the finite
rectangle $[a, b] \times [a, b]$. If this rectangle grows beyond limit, or equivalently, if
the Hilbert space is $\mathcal{L}^2(R_\infty)$, where $R_\infty$ is some infinite region of the real line,
then the compactness property breaks down, as the following example illustrates.

16.7.10. Example. Consider the two kernels

\[ K_1(x, t) = e^{-|x-t|} \quad \text{and} \quad K_2(x, t) = \sin xt, \]

where the first one acts on $\mathcal{L}^2(-\infty, \infty)$ and the second one on $\mathcal{L}^2(0, \infty)$. One can show
(see Problem 16.8) that these two kernels have, respectively, the two eigenfunctions

\[ e^{i\alpha t}, \quad \alpha \in \mathbb{R}, \qquad \text{and} \qquad \sqrt{\frac{\pi}{2}}\, e^{-at} + \frac{t}{a^2 + t^2}, \quad a > 0, \]

corresponding to the two eigenvalues

\[ \lambda = \frac{2}{1 + \alpha^2}, \quad \alpha \in \mathbb{R}, \qquad \text{and} \qquad \lambda = \sqrt{\frac{\pi}{2}}. \]

We see that in the first case, all real numbers between 0 and 2 are eigenvalues, rendering
this set uncountable. In the second case, there are infinitely (in fact, uncountably) many
eigenvectors (one for each $a$) corresponding to the single eigenvalue $\sqrt{\pi/2}$. Note, however,
that in the first case the eigenfunctions and in the second case the kernel have infinite
norms. ■
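The first eigenvalue relation of the example can be checked by direct quadrature. The following is a rough sketch, assuming NumPy; the function name `apply_k1`, the integration window, and the step size are hypothetical choices, and the infinite integral is truncated where the kernel has decayed to a negligible size.

```python
import numpy as np

def apply_k1(alpha, x, half_width=40.0, step=1e-3):
    """Trapezoidal approximation of the integral of exp(-|x - t|) exp(i alpha t) dt
    over a window centered at x; outside the window the kernel is below exp(-40)."""
    t = x + np.arange(-half_width, half_width + step, step)
    values = np.exp(-np.abs(x - t)) * np.exp(1j * alpha * t)
    return step * (values.sum() - 0.5 * (values[0] + values[-1]))

x = 0.3
for alpha in (0.0, 0.7, 2.0):
    numeric = apply_k1(alpha, x)
    predicted = 2.0 / (1.0 + alpha ** 2) * np.exp(1j * alpha * x)
    print(alpha, abs(numeric - predicted))
```

Each value of `alpha` gives a different eigenvalue $2/(1 + \alpha^2)$ in the interval $(0, 2]$, which is how a continuum of eigenvalues arises once the domain is unbounded.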


16.8 Resolvents 

The discussion of the preceding section showed that the spectrum of a normal 
compact operator is countable. Removing the compactness property in general will 
remove countability, as shown in Example 16.7.10. We have also seen that the right- 
shift operator, a bounded operator, has uncountably many points in its spectrum. We 
therefore expect that the sums in Theorem 16.7.8 should be replaced by integrals 
in the spectral decomposition theorem for (noncompact) bounded operators. We 
shall not discuss the spectral theorem for general operators. However, one special 
class of noncompact operators is essential for the treatment of Sturm-Liouville 
theory (to be studied in Chapters 18 and 19). For these operators, the concept of 
resolvent will be used, which we develop in this section. This concept also makes a 
connection between the countable (algebraic) and the uncountable (analytic) cases. 

16.8.1. Definition. Let $\mathbf{T}$ be an operator and $\lambda \in \rho(\mathbf{T})$. The operator $\mathbf{R}_\lambda(\mathbf{T}) \equiv
(\mathbf{T} - \lambda\mathbf{1})^{-1}$ is called the resolvent of $\mathbf{T}$ at $\lambda$.

There are two important properties of the resolvent that are useful in analyzing
the spectrum of operators. Let us assume that $\lambda, \mu \in \rho(\mathbf{T})$, $\lambda \neq \mu$, and take
the difference between their resolvents. Problem 16.9 shows how to obtain the
following relation:

\[ \mathbf{R}_\lambda(\mathbf{T}) - \mathbf{R}_\mu(\mathbf{T}) = (\lambda - \mu)\mathbf{R}_\lambda(\mathbf{T})\mathbf{R}_\mu(\mathbf{T}). \tag{16.9} \]

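To obtain the second property of the resolvent, we formally (and indefinitely)
differentiate $\mathbf{R}_\lambda(\mathbf{T})$ with respect to $\lambda$ and evaluate the result at $\lambda = \mu$:

\[ \frac{d}{d\lambda}\mathbf{R}_\lambda(\mathbf{T}) = \frac{d}{d\lambda}\big[(\mathbf{T} - \lambda\mathbf{1})^{-1}\big] = (\mathbf{T} - \lambda\mathbf{1})^{-2} = \mathbf{R}_\lambda^{2}(\mathbf{T}). \]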




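Differentiating both sides of this equation, we get $2\mathbf{R}_\lambda^{3}(\mathbf{T})$, and in general,

\[ \frac{d^n}{d\lambda^n}\mathbf{R}_\lambda(\mathbf{T}) = n!\,\mathbf{R}_\lambda^{n+1}(\mathbf{T}) \quad \Rightarrow \quad \frac{d^n}{d\lambda^n}\mathbf{R}_\lambda(\mathbf{T})\bigg|_{\lambda=\mu} = n!\,\mathbf{R}_\mu^{n+1}(\mathbf{T}). \]

Assuming that the Taylor series expansion exists, we may write

\[ \mathbf{R}_\lambda(\mathbf{T}) = \sum_{n=0}^{\infty} \frac{(\lambda - \mu)^n}{n!}\, \frac{d^n}{d\lambda^n}\mathbf{R}_\lambda(\mathbf{T})\bigg|_{\lambda=\mu} = \sum_{n=0}^{\infty} (\lambda - \mu)^n\, \mathbf{R}_\mu^{n+1}(\mathbf{T}), \tag{16.10} \]

which is the second property of the resolvent.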
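Both properties of the resolvent can be verified numerically for a matrix. The following is a minimal sketch, assuming NumPy; the 3x3 matrix `T`, the sample points, and the helper `resolvent` are hypothetical, and the points for the series are chosen far from the spectrum so that the expansion in Equation (16.10) converges quickly.

```python
import numpy as np

def resolvent(T, lam):
    """R_lambda(T) = (T - lambda 1)^(-1) for a square matrix T."""
    return np.linalg.inv(T - lam * np.eye(T.shape[0]))

T = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, -1.0]])

# First property, Equation (16.9), at two points of the resolvent set.
lam, mu = 0.5 + 1.0j, -0.3 + 2.0j
lhs = resolvent(T, lam) - resolvent(T, mu)
rhs = (lam - mu) * resolvent(T, lam) @ resolvent(T, mu)

# Second property, Equation (16.10): expansion of R_lam about mu0.
mu0, lam0 = 10.0j, 0.5 + 10.0j
R_mu0 = resolvent(T, mu0)
series = sum((lam0 - mu0) ** n * np.linalg.matrix_power(R_mu0, n + 1)
             for n in range(40))

print(np.abs(lhs - rhs).max())
print(np.abs(series - resolvent(T, lam0)).max())
```

The series converges here because $|\lambda - \mu|\,\|\mathbf{R}_\mu(\mathbf{T})\|$ is well below 1 for the chosen points; closer to the spectrum more terms would be needed, or the series may fail to converge.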

We now look into the spectral decomposition from an analytical viewpoint. 
For convenience, we concentrate on the finite-dimensional case and let A be an 
arbitrary (not necessarily hermitian) NxN matrix. Let 入 be a complex number that 
is larger (in absolute value) than any of the eigenvalues of A. Since all operators 
on finite-dimensional vector spaces are compact, Lemma 16.7.4 assures us that 
\k\ > ||Tj|, and it is then possible to expand Ra. (T) in a convergent power series as 
follows: 


_)-(A-^ir 1 = £ (^) n . (16.11) 

n=0 

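This is the Laurent expansion of $\mathbf{R}_\lambda(\mathbf{A})$. We can immediately read off the residue
of $\mathbf{R}_\lambda(\mathbf{A})$ (the coefficient of $1/\lambda$):

\[ \operatorname{Res}[\mathbf{R}_\lambda(\mathbf{A})] = -\mathbf{1} \quad\Rightarrow\quad -\frac{1}{2\pi i}\oint_\Gamma \mathbf{R}_\lambda(\mathbf{A})\,d\lambda = \mathbf{1}, \]

where $\Gamma$ is a circle with its center at the origin and a radius large enough to
encompass all the eigenvalues of $\mathbf{A}$ [see Figure 16.3(a)]. A similar argument shows
that

\[ -\frac{1}{2\pi i}\oint_\Gamma \lambda\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda = \mathbf{A}, \]

and in general,

\[ -\frac{1}{2\pi i}\oint_\Gamma \lambda^n\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda = \mathbf{A}^n \quad\text{for } n = 0, 1, \ldots \]

Using this and assuming that we can expand the function $f(\mathbf{A})$ in a power series,
we get

\[ -\frac{1}{2\pi i}\oint_\Gamma f(\lambda)\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda = f(\mathbf{A}). \tag{16.12} \]

Writing this equation in the form

\[ \frac{1}{2\pi i}\oint_\Gamma \frac{f(\lambda)}{\lambda\mathbf{1} - \mathbf{A}}\,d\lambda = f(\mathbf{A}) \]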







Figure 16.3 (a) The large circle encompassing all eigenvalues, (b) the deformed contour 

consisting of small circles orbiting the eigenvalues. 


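makes it recognizable as the generalization of the Cauchy integral formula to
operator-valued functions. To use any of the above integral formulas, we must
know the analytic behavior of $\mathbf{R}_\lambda(\mathbf{A})$. From the formula of the inverse of a matrix
given in Chapter 3, we have

\[ [\mathbf{R}_\lambda(\mathbf{A})]_{jk} = \left[(\mathbf{A} - \lambda\mathbf{1})^{-1}\right]_{jk} = \frac{C_{jk}(\lambda)}{\det(\mathbf{A} - \lambda\mathbf{1})} = \frac{C_{jk}(\lambda)}{p(\lambda)}, \]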


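where $C_{jk}(\lambda)$ is the cofactor of the $kj$th element of the matrix $\mathbf{A} - \lambda\mathbf{1}$ and $p(\lambda)$
is the characteristic polynomial of $\mathbf{A}$. Clearly, $C_{jk}(\lambda)$ is also a polynomial. Thus,
$[\mathbf{R}_\lambda(\mathbf{A})]_{jk}$ is a rational function of $\lambda$. It follows that $\mathbf{R}_\lambda(\mathbf{A})$ has only poles as
singularities (see Example 10.2.2). The poles are simply the zeros of the denominator,
i.e., the eigenvalues of $\mathbf{A}$. We can deform the contour $\Gamma$ in such a way that it consists
of small circles $\gamma_j$ that encircle the isolated eigenvalues $\lambda_j$ [see Figure 16.3(b)].
Then, with $f(\mathbf{A}) = \mathbf{1}$, Equation (16.12) yields

\[ \mathbf{1} = -\frac{1}{2\pi i}\oint_\Gamma \mathbf{R}_\lambda(\mathbf{A})\,d\lambda = \sum_{j=1}^{r}\mathbf{P}_j, \qquad \mathbf{P}_j \equiv -\frac{1}{2\pi i}\oint_{\gamma_j}\mathbf{R}_\lambda(\mathbf{A})\,d\lambda. \tag{16.13} \]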


It can be shown (see Example 16.8.2 below) that $\{\mathbf{P}_j\}$ is a set of orthogonal
projection operators. Thus, Equation (16.13) is a resolution of identity, as specified in
the spectral decomposition theorem in Chapter 4.
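
The contour-integral definition of $\mathbf{P}_j$ in Equation (16.13) can be tried out numerically. The sketch below is an illustration under our own assumptions: the hermitian $2 \times 2$ matrix, the circle radius, and the function names are chosen by us, and the contour integral is approximated by the trapezoidal rule on a circle. It is meant only to make the resolution of identity and the projection property tangible, not to serve as a general algorithm.

```python
import cmath

def resolvent(A, lam):
    """R_lam(A) = (A - lam*1)^(-1) for a 2x2 matrix A."""
    a, b = A[0][0] - lam, A[0][1]
    c, d = A[1][0], A[1][1] - lam
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def contour_projection(A, center, radius, n_points=64):
    """Approximate P = -(1/(2*pi*i)) * contour integral of R_lam(A) on a circle.

    With lam = center + radius*exp(i*theta) we have
    d(lam) = i*radius*exp(i*theta) d(theta), so the trapezoidal rule gives
    P ~ -(1/n) * sum_k R(lam_k) * radius*exp(i*theta_k).
    """
    total = [[0j, 0j], [0j, 0j]]
    for k in range(n_points):
        phase = cmath.exp(2j * cmath.pi * k / n_points)
        R = resolvent(A, center + radius * phase)
        for i in range(2):
            for j in range(2):
                total[i][j] += R[i][j] * radius * phase
    return [[-total[i][j] / n_points for j in range(2)] for i in range(2)]

# Illustrative hermitian matrix (our choice); its eigenvalues are 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]
P1 = contour_projection(A, 1.0, 0.5)   # small circle around the eigenvalue 1
P3 = contour_projection(A, 3.0, 0.5)   # small circle around the eigenvalue 3

P1_squared = mat_mul(P1, P1)           # should reproduce P1
P1_P3 = mat_mul(P1, P3)                # should vanish
identity = [[P1[i][j] + P3[i][j] for j in range(2)] for i in range(2)]
rebuilt_A = [[1.0 * P1[i][j] + 3.0 * P3[i][j] for j in range(2)] for i in range(2)]
```

The circles of radius 0.5 are small enough that each encloses exactly one eigenvalue, which is the only requirement the deformation of the contour places on them.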





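16.8.2. Example. We want to show that the $\mathbf{P}_j$ are projection operators. First let $i = j$.
Then$^{12}$

\[ \mathbf{P}_j^2 = \left(-\frac{1}{2\pi i}\right)^2 \oint_{\gamma_j}\mathbf{R}_\lambda(\mathbf{A})\,d\lambda \oint_{\gamma_j}\mathbf{R}_\mu(\mathbf{A})\,d\mu. \]

Note that $\lambda$ need not be equal to $\mu$. In fact, we are free to choose $|\lambda - \lambda_j| > |\mu - \lambda_j|$, i.e.,
let the circle corresponding to $\lambda$ integration be outside that of $\mu$ integration.$^{13}$ We can then
rewrite the above double integral as

\[
\begin{aligned}
\mathbf{P}_j^2 &= \left(-\frac{1}{2\pi i}\right)^2 \oint_{\gamma_j^{(\lambda)}}\oint_{\gamma_j^{(\mu)}} \mathbf{R}_\lambda(\mathbf{A})\,\mathbf{R}_\mu(\mathbf{A})\,d\lambda\,d\mu \\
&= \left(-\frac{1}{2\pi i}\right)^2 \oint_{\gamma_j^{(\lambda)}}\oint_{\gamma_j^{(\mu)}} \frac{\mathbf{R}_\lambda(\mathbf{A}) - \mathbf{R}_\mu(\mathbf{A})}{\lambda - \mu}\,d\lambda\,d\mu \\
&= \left(-\frac{1}{2\pi i}\right)^2 \left\{ \oint_{\gamma_j^{(\lambda)}} \mathbf{R}_\lambda(\mathbf{A})\,d\lambda \oint_{\gamma_j^{(\mu)}} \frac{d\mu}{\lambda - \mu} - \oint_{\gamma_j^{(\mu)}} \mathbf{R}_\mu(\mathbf{A})\,d\mu \oint_{\gamma_j^{(\lambda)}} \frac{d\lambda}{\lambda - \mu} \right\},
\end{aligned}
\]

where we used Equation (16.9) to go to the second line. Now note that

\[ \oint_{\gamma_j^{(\mu)}} \frac{d\mu}{\lambda - \mu} = 0 \qquad\text{and}\qquad \oint_{\gamma_j^{(\lambda)}} \frac{d\lambda}{\lambda - \mu} = 2\pi i, \]

because $\lambda$ lies outside $\gamma_j^{(\mu)}$ and $\mu$ lies inside $\gamma_j^{(\lambda)}$. Hence,

\[ \mathbf{P}_j^2 = \left(-\frac{1}{2\pi i}\right)^2 \left\{ 0 - 2\pi i \oint_{\gamma_j^{(\mu)}} \mathbf{R}_\mu(\mathbf{A})\,d\mu \right\} = -\frac{1}{2\pi i}\oint_{\gamma_j^{(\mu)}} \mathbf{R}_\mu(\mathbf{A})\,d\mu = \mathbf{P}_j. \]

The remaining part, namely $\mathbf{P}_j\mathbf{P}_k = 0$ for $k \neq j$, can be done similarly (see Problem
16.10). ■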


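Now we let $f(\mathbf{A}) = \mathbf{A}$ in Equation (16.12), deform the contour as above, and
write

\[
\begin{aligned}
\mathbf{A} &= -\frac{1}{2\pi i}\sum_{j=1}^{r}\oint_{\gamma_j} \lambda\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda
= \sum_{j=1}^{r}\left[ \lambda_j\mathbf{P}_j - \frac{1}{2\pi i}\oint_{\gamma_j} (\lambda - \lambda_j)\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda \right] \\
&\equiv \sum_{j=1}^{r} (\lambda_j\mathbf{P}_j + \mathbf{D}_j), \qquad \mathbf{D}_j \equiv -\frac{1}{2\pi i}\oint_{\gamma_j} (\lambda - \lambda_j)\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda.
\end{aligned}
\tag{16.14}
\]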


12 We have not discussed multiple integrals of complex functions. A rigorous study of such integrals involves the theory of
functions of several complex variables — a subject we have to avoid due to lack of space. However, in the simple case at hand, 
the theory of real multiple integrals is an honest guide. 

13 This is possible because the poles are isolated. 




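It can be shown (see Problem 16.11) that

\[ \mathbf{D}_j^n = -\frac{1}{2\pi i}\oint_{\gamma_j} (\lambda - \lambda_j)^n\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda. \]

In particular, since $\mathbf{R}_\lambda(\mathbf{A})$ has only poles as singularities, there exists a positive
integer $m$ such that $\mathbf{D}_j^m = 0$. We have not yet made any assumptions about $\mathbf{A}$. If
we assume that $\mathbf{A}$ is hermitian, for example, then $\mathbf{R}_\lambda(\mathbf{A})$ will have simple poles
(see Problem 16.12). It follows that $(\lambda - \lambda_j)\mathbf{R}_\lambda(\mathbf{A})$ will be analytic at $\lambda_j$ for all
$j = 1, 2, \ldots, r$, and $\mathbf{D}_j = 0$ in Equation (16.14). We thus have

\[ \mathbf{A} = \sum_{j=1}^{r} \lambda_j\mathbf{P}_j, \]

which is the spectral decomposition discussed in Chapter 4. Problem 16.13 shows
that the $\mathbf{P}_j$ are hermitian.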

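16.8.3. Example. The most general $2 \times 2$ hermitian matrix is of the form

\[ \mathbf{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{12}^* & a_{22} \end{pmatrix}, \]

where $a_{11}$ and $a_{22}$ are real numbers. Thus,

\[ \det(\mathbf{A} - \lambda\mathbf{1}) = \lambda^2 - (a_{11} + a_{22})\lambda + a_{11}a_{22} - |a_{12}|^2, \]

which has roots

\[
\begin{aligned}
\lambda_1 &= \tfrac{1}{2}\left[ a_{11} + a_{22} - \sqrt{(a_{11} - a_{22})^2 + 4|a_{12}|^2} \right], \\
\lambda_2 &= \tfrac{1}{2}\left[ a_{11} + a_{22} + \sqrt{(a_{11} - a_{22})^2 + 4|a_{12}|^2} \right].
\end{aligned}
\]

The inverse of $\mathbf{A} - \lambda\mathbf{1}$ can immediately be written:

\[
\mathbf{R}_\lambda(\mathbf{A}) = (\mathbf{A} - \lambda\mathbf{1})^{-1}
= \frac{1}{\det(\mathbf{A} - \lambda\mathbf{1})}\begin{pmatrix} a_{22} - \lambda & -a_{12} \\ -a_{12}^* & a_{11} - \lambda \end{pmatrix}
= \frac{1}{(\lambda - \lambda_1)(\lambda - \lambda_2)}\begin{pmatrix} a_{22} - \lambda & -a_{12} \\ -a_{12}^* & a_{11} - \lambda \end{pmatrix}.
\]

We want to verify that $\mathbf{R}_\lambda(\mathbf{A})$ has only simple poles. Two cases arise:

1. If $\lambda_1 \neq \lambda_2$, then it is clear that $\mathbf{R}_\lambda(\mathbf{A})$ has simple poles.

2. If $\lambda_1 = \lambda_2$, it appears that $\mathbf{R}_\lambda(\mathbf{A})$ may have a pole of order 2. However, note that if
$\lambda_1 = \lambda_2$, then the square roots in the above equations must vanish. This happens iff
$a_{11} = a_{22} \equiv a$ and $a_{12} = 0$. It then follows that $\lambda_1 = \lambda_2 \equiv a$, and

\[ \mathbf{R}_\lambda(\mathbf{A}) = \frac{1}{(\lambda - a)^2}\begin{pmatrix} a - \lambda & 0 \\ 0 & a - \lambda \end{pmatrix}. \]

This clearly shows that $\mathbf{R}_\lambda(\mathbf{A})$ has only simple poles in this case. ■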




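Jordan canonical form

If $\mathbf{A}$ is not hermitian, $\mathbf{D}_j \neq 0$; however, $\mathbf{D}_j$ is nevertheless nilpotent. That
is, $\mathbf{D}_j^m = 0$ for some positive integer $m$. This property and Equation (16.14) can
be used to show that $\mathbf{A}$ can be cast into a Jordan canonical form via a similarity
transformation. That is, there exists an $N \times N$ matrix $\mathbf{S}$ such that

\[
\mathbf{S}\mathbf{A}\mathbf{S}^{-1} = \mathbf{J} =
\begin{pmatrix}
\mathbf{J}_1 & 0 & \cdots & 0 \\
0 & \mathbf{J}_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathbf{J}_m
\end{pmatrix},
\]

where $\mathbf{J}_k$ is a matrix of the form

\[
\mathbf{J}_k =
\begin{pmatrix}
\lambda & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & \lambda & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & \lambda & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & \lambda & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & \lambda
\end{pmatrix},
\]

in which $\lambda$ is one of the eigenvalues of $\mathbf{A}$. Different $\mathbf{J}_k$ may contain the same
eigenvalues of $\mathbf{A}$. For a discussion of the Jordan canonical form of a matrix, see
[Birk 77], [Denn 67], or [Halm 58].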
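
A single $2 \times 2$ Jordan block gives perhaps the smallest illustration of a nonzero but nilpotent $\mathbf{D}_j$. The following sketch is our own illustration (the matrix, the radius, and the function names are assumptions): it evaluates the contour integrals for $\mathbf{P}$ and $\mathbf{D}$ with the trapezoidal rule on a circle around the only eigenvalue, $\lambda = 2$, and checks that $\mathbf{D}^2 = 0$ and $\mathbf{A} = \lambda\mathbf{P} + \mathbf{D}$.

```python
import cmath

def resolvent(A, lam):
    """R_lam(A) = (A - lam*1)^(-1) for a 2x2 matrix A."""
    a, b = A[0][0] - lam, A[0][1]
    c, d = A[1][0], A[1][1] - lam
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def contour_moment(A, center, radius, power, n_points=64):
    """Approximate -(1/(2*pi*i)) * contour integral of (lam - center)^power R_lam(A).

    power = 0 gives the projection P, power = 1 gives the operator D of
    Equation (16.14).  The circle is lam = center + radius*exp(i*theta).
    """
    total = [[0j, 0j], [0j, 0j]]
    for k in range(n_points):
        offset = radius * cmath.exp(2j * cmath.pi * k / n_points)  # lam - center
        R = resolvent(A, center + offset)
        weight = offset ** power * offset   # (lam - center)^power * d(lam)/(i d(theta))
        for i in range(2):
            for j in range(2):
                total[i][j] += R[i][j] * weight
    return [[-total[i][j] / n_points for j in range(2)] for i in range(2)]

# A non-hermitian matrix that is already a Jordan block with eigenvalue 2.
A = [[2.0, 1.0], [0.0, 2.0]]
P = contour_moment(A, 2.0, 1.0, 0)
D = contour_moment(A, 2.0, 1.0, 1)
D_squared = mat_mul(D, D)
rebuilt_A = [[2.0 * P[i][j] + D[i][j] for j in range(2)] for i in range(2)]
```

In this example $\mathbf{P}$ comes out as the unit matrix and $\mathbf{D}$ as the strictly upper-triangular part of $\mathbf{A}$, so the decomposition (16.14) simply separates the diagonal from the nilpotent piece.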


16.9 Problems 


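16.1. Suppose that $\mathbf{S}$ is a bounded operator, $\mathbf{T}$ an invertible operator, and that

\[ \|\mathbf{T} - \mathbf{S}\| < \frac{1}{\|\mathbf{T}^{-1}\|}. \]

Show that $\mathbf{S}$ is invertible. Hint: Show that $\mathbf{T}^{-1}\mathbf{S}$ is invertible. Thus, an operator
that is “sufficiently close” to an invertible operator is invertible.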

16.2. Let $\mathcal{V}$ and $\mathcal{W}$ be finite-dimensional vector spaces. Show that $\mathbf{T} \in \mathcal{L}(\mathcal{V}, \mathcal{W})$
is necessarily bounded.

16.3. Let $\mathcal{H}$ be a Hilbert space, and $\mathbf{T} \in \mathcal{L}(\mathcal{H})$ an isometry, i.e., a linear operator
that does not change the norm of any vector. Show that $\|\mathbf{T}\| = 1$.

16.4. Show that (a) the unit operator is not compact, and that (b) the inverse of
a compact operator cannot be bounded. Hint: For (b) use the results of Example
16.5.3.

16.5. Prove Corollary 16.6.7. Hint: Let |x> e and write it as |^:) = |n) + |r) 

with \n) € and |r> G 3^). Apply (K — to |r), and invoke part (3) of 

Theorem 16.6.6 to show that \r) € 3sf^. Conclude that \r) — 0, 






and q > p. To establish the reverse inequality, apply (K — kA) p to both sides of 
the direct sum of part (1) of Theorem 16.6.6, and notice that the LHS is the 
second term of the RHS is zero, and the first term is Now conclude that 

P>^1- 

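16.6. Let $|u\rangle \in \mathcal{H}$ and let $\mathcal{M}$ be a subspace of $\mathcal{H}$. Show that the subset $E = |u\rangle - \mathcal{M}$
is convex. Show that $E$ is not necessarily a subspace of $\mathcal{H}$.

16.7. Show that for any hermitian operator $\mathbf{H}$, we have

\[
\begin{aligned}
4\langle \mathbf{H}x|y\rangle = {} & \langle \mathbf{H}(x+y)|x+y\rangle - \langle \mathbf{H}(x-y)|x-y\rangle \\
& + i\left[ \langle \mathbf{H}(x+iy)|x+iy\rangle - \langle \mathbf{H}(x-iy)|x-iy\rangle \right].
\end{aligned}
\]

Now let $|x\rangle = \lambda|z\rangle$ and $|y\rangle = |\mathbf{H}z\rangle/\lambda$, where $\lambda = (\|\mathbf{H}z\|/\|z\|)^{1/2}$, and show that

\[ \|\mathbf{H}z\|^2 = \langle \mathbf{H}x|y\rangle \le M\|z\|\,\|\mathbf{H}z\|, \]

where $M = \max\{|\langle \mathbf{H}z|z\rangle|/\|z\|^2\}$. Now conclude that $\|\mathbf{H}\| \le M$.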

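16.8. Show that the two kernels $K_1(x,t) = e^{-|x-t|}$ and $K_2(x,t) = \sin xt$, where
the first one acts on $\mathcal{L}^2(-\infty, \infty)$ and the second one on $\mathcal{L}^2(0, \infty)$, have the two
eigenfunctions

\[ e^{i\alpha t}, \quad \alpha \in \mathbb{R}, \qquad\text{and}\qquad \sqrt{\frac{\pi}{2}}\,e^{-at} + \frac{t}{a^2 + t^2}, \quad a > 0, \]

respectively, corresponding to the two eigenvalues

\[ \lambda = \frac{2}{1 + \alpha^2}, \quad \alpha \in \mathbb{R}, \qquad\text{and}\qquad \lambda = \sqrt{\frac{\pi}{2}}. \]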


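16.9. Derive Equation (16.9). Hint: Multiply $\mathbf{R}_\lambda(\mathbf{T})$ by $\mathbf{1} = \mathbf{R}_\mu(\mathbf{T})(\mathbf{T} - \mu\mathbf{1})$ and
$\mathbf{R}_\mu(\mathbf{T})$ by $\mathbf{1} = \mathbf{R}_\lambda(\mathbf{T})(\mathbf{T} - \lambda\mathbf{1})$.

16.10. Finish Example 16.8.2 by showing that $\mathbf{P}_j\mathbf{P}_k = 0$ for $k \neq j$.

16.11. Show that

\[ \mathbf{D}_j^n = -\frac{1}{2\pi i}\oint_{\gamma_j} (\lambda - \lambda_j)^n\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda. \]

Hint: Use mathematical induction and the technique used in Example 16.8.2.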


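16.12. (a) Take the inner product of $|u\rangle = (\mathbf{A} - \lambda\mathbf{1})|v\rangle$ with $|v\rangle$ and show that
for a hermitian $\mathbf{A}$, $\operatorname{Im}\langle v|u\rangle = -(\operatorname{Im}\lambda)\|v\|^2$. Now use the Schwarz inequality to
obtain

\[ \|v\| \le \frac{\|u\|}{|\operatorname{Im}\lambda|} \quad\Rightarrow\quad \|\mathbf{R}_\lambda(\mathbf{A})|u\rangle\| \le \frac{\|u\|}{|\operatorname{Im}\lambda|}. \]

(b) Use this result to show that

\[ \|(\lambda - \lambda_j)\mathbf{R}_\lambda(\mathbf{A})|u\rangle\| \le \left( 1 + \left| \frac{\operatorname{Re}(\lambda - \lambda_j)}{\operatorname{Im}(\lambda - \lambda_j)} \right| \right)\|u\| = (1 + |\cot\theta|)\,\|u\|, \]

where $\theta$ is the angle that $\lambda - \lambda_j$ makes with the real axis and $\lambda$ is chosen to have
an imaginary part. From this result conclude that $\mathbf{R}_\lambda(\mathbf{A})$ has a simple pole when $\mathbf{A}$
is hermitian.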

16.13. (a) Show that when $\mathbf{A}$ is hermitian, $[\mathbf{R}_\lambda(\mathbf{A})]^\dagger = \mathbf{R}_{\lambda^*}(\mathbf{A})$.

(b) Write $\lambda - \lambda_j = r_j e^{i\theta}$ in the definition of $\mathbf{P}_j$ in Equation (16.13). Take the
hermitian conjugate of both sides and use (a) to show that $\mathbf{P}_j$ is hermitian. Hint:
You will have to change the variable of integration a number of times.

Additional Readings 

1. DeVito, C. Functional Analysis and Linear Operator Theory, Addison- 
Wesley, 1990. Our treatment of compact operators follows this reference’s 
discussion. 

2. Glimm, J. and Jaffe, A. Quantum Physics, 2nd ed., Springer-Verlag, 1987.
One of the most mathematical treatments of the subject, and therefore a good
introduction to operator theory (see the appendix to Part I).

3. Reed, M. and Simon, B. Fourier Analysis, Self-Adjointness, Academic Press,
1980.

4. Richtmyer, R. Principles of Advanced Mathematical Physics, Springer-
Verlag, 1978. Discusses resolvents in detail.

5. Zeidler, E. Applied Functional Analysis, Springer-Verlag, 1995. 




17 


Integral Equations 


The beginning of Chapter 16 showed that to solve a vector-operator equation one 
transforms it into an equation involving a sum over a discrete index [the matrix 
equation of Equation (16.1)], or an equation involving an integral over a continuous 
index [Equation (16.2)]. The latter is called an integral equation, which we shall 
investigate here using the machinery of Chapter 16. 

17.1 Classification 

Volterra and Fredholm equations of first and second kind

Integral equations can be divided into two major groups. Those that have a variable
limit of integration are called Volterra equations; those that have constant limits
of integration are called Fredholm equations. If the unknown function appears
only inside the integral, the integral equation is said to be of the first kind. Integral
equations having the unknown function outside the integral as well as inside are
said to be of the second kind. The four kinds of equations can be written as
follows.

\[
\begin{aligned}
\int_a^x K(x,t)u(t)\,dt &= v(x), && \text{Volterra equation of the 1st kind,} \\
\int_a^b K(x,t)u(t)\,dt &= v(x), && \text{Fredholm equation of the 1st kind,} \\
u(x) &= v(x) + \int_a^x K(x,t)u(t)\,dt, && \text{Volterra equation of the 2nd kind,} \\
u(x) &= v(x) + \int_a^b K(x,t)u(t)\,dt, && \text{Fredholm equation of the 2nd kind.}
\end{aligned}
\]

In all these equations, $K(x,t)$ is called the kernel of the integral equation.










characteristic value 
of an integral 
equation 


In the theory of integral equations of the second kind, one usually multiplies
the integral by a nonzero complex number $\lambda$. Thus, the Fredholm equation of the
second kind becomes

\[ u(x) = v(x) + \lambda\int_a^b K(x,t)u(t)\,dt, \tag{17.1} \]

and for the Volterra equation of the second kind one obtains

\[ u(x) = v(x) + \lambda\int_a^x K(x,t)u(t)\,dt. \tag{17.2} \]

A $\lambda$ that satisfies (17.2) with $v(x) = 0$ is called a characteristic value of the
integral equation. In the abstract operator language both equations are written as

\[ |u\rangle = |v\rangle + \lambda\mathbf{K}|u\rangle \quad\Rightarrow\quad (\mathbf{K} - \lambda^{-1}\mathbf{1})|u\rangle = -\lambda^{-1}|v\rangle. \tag{17.3} \]

Thus $\lambda$ is a characteristic value for (17.1) if and only if $\lambda^{-1}$ is an eigenvalue of
$\mathbf{K}$. Recall that when the interval of integration $(a, b)$ is finite, $K(x,t)$ is called a
Hilbert-Schmidt kernel. Example 16.5.10 showed that $\mathbf{K}$ is a compact operator,
and by Theorem 16.6.9, the eigenvalues of $\mathbf{K}$ either form a finite set or a sequence
that converges to zero.

17.1.1. Theorem. The characteristic values of a Fredholm equation of the second
kind either form a finite set or a sequence of complex numbers increasing beyond
limit in absolute value.


Our main task in this chapter is to study methods of solving integral equations
of the second kind. We treat the Volterra equation first because it is easier to solve.
Let us introduce the notation

\[ K[u](x) \equiv \int_a^x K(x,t)u(t)\,dt \qquad\text{and}\qquad K^n[u](x) = K\big[K^{n-1}[u]\big](x), \tag{17.4} \]

whereby $K[u]$ denotes a function whose value at $x$ is given by the integral on the
RHS of the first equation in (17.4). One can show with little difficulty that the
associated operator $\mathbf{K}$ is compact. Let $M = \max\{|K(x,t)| \mid a \le t \le x \le b\}$ and
note that

\[ |\lambda K[u](x)| = \left| \lambda\int_a^x K(x,t)u(t)\,dt \right| \le |\lambda|\,|M|\,\|u\|_\infty (x - a), \]

where $\|u\|_\infty \equiv \max\{|u(x)| \mid x \in (a, b)\}$.

Using mathematical induction, one can show that (see Problem 17.1)

\[ |(\lambda K)^n[u](x)| \le |\lambda|^n |M|^n \|u\|_\infty \frac{(x - a)^n}{n!}. \tag{17.5} \]





Since $b \ge x$, we can replace $x$ with $b$ and still satisfy the inequality. Then the
inequality of Equation (17.5) will hold for all $x$, and we can write the equation as
an operator norm inequality: $\|(\lambda\mathbf{K})^n\| \le |\lambda|^n |M|^n (b - a)^n/n!$. Therefore,

\[ \sum_{n=0}^{\infty} \|(\lambda\mathbf{K})^n\| \le \sum_{n=0}^{\infty} \frac{|\lambda|^n |M|^n (b - a)^n}{n!} = e^{M|\lambda|(b - a)}, \]

and the series $\sum_{n=0}^{\infty} (\lambda\mathbf{K})^n$ converges for all $\lambda$. In fact, a direct calculation shows
that the series converges to the inverse of $\mathbf{1} - \lambda\mathbf{K}$. Thus, the latter is invertible and
the spectrum of $\mathbf{K}$ has no nonzero points. We have just shown the following.


Volterra equation of 
the second kind has 
a unique solution and 
no nonzero 
characteristic value 


17.1.2. Theorem. The Volterra equation of the second kind has no nonzero
characteristic value. In particular, the operator $\mathbf{1} - \lambda\mathbf{K}$ is invertible, and the Volterra
equation of the second kind always has a unique solution given by the convergent
infinite series $u(x) = \sum_{j=0}^{\infty} \lambda^j \int_a^x K^j(x,t)v(t)\,dt$, where $K^j(x,t)$ is defined
inductively in Equation (17.4).


Vito Volterra (1860-1940) was only 11 when he became interested
in mathematics while reading Legendre’s Geometry. At the age of 13
he began to study the three body problem and made some progress.

His family were extremely poor (his father had died when Vito
was two years old) but after attending lectures at Florence he was
able to proceed to Pisa in 1878. At Pisa he studied under Betti,
graduating as a doctor of physics in 1882. His thesis on hydrodynamics
included some results of Stokes, discovered later but independently
by Volterra.

He became Professor of Mechanics at Pisa in 1883, and upon
Betti’s death, he occupied the chair of mathematical physics. After
spending some time at Turin as the chair of mechanics, he was awarded the chair of
mathematical physics at the University of Rome in 1900.

Volterra conceived the idea of a theory of functions that depend on a continuous set of 
values of another function in 1883. Hadamard was later to introduce the word “functional,”
which replaced Volterra’s original terminology. In 1890 Volterra used his functional calculus 
to show that the theory of Hamilton and Jacobi for the integration of the differential equations 
of dynamics could be extended to other problems of mathematical physics. 

His most famous work was done on integral equations. He began this study in 1884, and 
in 1896 he published several papers on what is now called the Volterra integral equation. He 
continued to study functional analysis applications to integral equations producing a large 
number of papers on composition and permutable functions. 

During the First World War Volterra joined the Air Force. He made many journeys
to France and England to promote scientific collaboration. After the war he returned to 
the University of Rome, and his interests moved to mathematical biology. He studied the 
Verhulst equation and the logistic curve. He also wrote on predator-prey equations. 

In 1922 Fascism seized Italy, and Volterra fought against it in the Italian Parliament.
However, by 1930 the Parliament was abolished, and when Volterra refused to take an oath
of allegiance to the Fascist government in 1931, he was forced to leave the University of
Rome. From the following year he lived mostly abroad, mainly in Paris, but also in Spain
and other countries.

Neumann series solution


17.1.3. Example. Differential equations can be transformed into integral equations. For
instance, consider the SOLDE

\[ \frac{d^2u}{dx^2} + p_1(x)\frac{du}{dx} + p_0(x)u = r(x), \qquad u(a) = c_1, \quad u'(a) = c_2. \]

By integrating the DE once, we obtain

\[ \frac{du}{dx} = -\int_a^x p_1(t)u'(t)\,dt - \int_a^x p_0(t)u(t)\,dt + \int_a^x r(t)\,dt + c_2. \]

Integrating the first integral by parts gives

\[ u'(x) = -p_1(x)u(x) + \underbrace{\int_a^x [p_1'(t) - p_0(t)]\,u(t)\,dt}_{\equiv f(x)} + \underbrace{\int_a^x r(t)\,dt}_{\equiv g(x)} + p_1(a)c_1 + c_2. \]

Integrating once more yields

\[
\begin{aligned}
u(x) &= -\int_a^x p_1(t)u(t)\,dt + \int_a^x f(s)\,ds + \int_a^x g(s)\,ds + (x - a)[p_1(a)c_1 + c_2] + c_1 \\
&= -\int_a^x p_1(t)u(t)\,dt + \int_a^x ds \int_a^s [p_1'(t) - p_0(t)]\,u(t)\,dt \\
&\quad + \int_a^x ds \int_a^s r(t)\,dt + (x - a)[p_1(a)c_1 + c_2] + c_1 \\
&= \int_a^x \big\{ (x - t)[p_1'(t) - p_0(t)] - p_1(t) \big\}\,u(t)\,dt \\
&\quad + \int_a^x (x - t)\,r(t)\,dt + (x - a)[p_1(a)c_1 + c_2] + c_1,
\end{aligned}
\tag{17.6}
\]

where we have used the formula

\[ \int_a^x ds \int_a^s f(t)\,dt = \int_a^x (x - t)f(t)\,dt, \]

which the reader may verify by interchanging the order of integration on the LHS. Equation
(17.6) is a Volterra equation of the second kind with kernel

\[ K(x,t) \equiv (x - t)[p_1'(t) - p_0(t)] - p_1(t) \]

and $v(x) \equiv \int_a^x (x - t)\,r(t)\,dt + (x - a)[p_1(a)c_1 + c_2] + c_1$. ■
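
The conversion of Example 17.1.3 can be tested on a case whose answer is known. The sketch below is only an illustration under our own assumptions: we pick $p_1 = 0$, $p_0 = 1$, $r = 0$, $a = 0$, $c_1 = 0$, $c_2 = 1$ (that is, $u'' + u = 0$ with $u(0) = 0$, $u'(0) = 1$, whose solution is $\sin x$), build the kernel and $v(x)$ of Equation (17.6), and solve the resulting Volterra equation by a simple trapezoidal marching scheme. The marching scheme and all function names are ours, not the text's.

```python
import math

def solve_volterra_second_kind(kernel, v, a, b, n_steps):
    """Solve u(x) = v(x) + integral_a^x kernel(x, t) u(t) dt on a uniform grid.

    The integral is replaced by the trapezoidal rule; at each grid point the
    only unknown is u(x_i), which appears with weight h/2 * kernel(x_i, x_i).
    """
    h = (b - a) / n_steps
    xs = [a + i * h for i in range(n_steps + 1)]
    us = [v(xs[0])]                      # at x = a the integral vanishes
    for i in range(1, n_steps + 1):
        x = xs[i]
        acc = 0.5 * kernel(x, xs[0]) * us[0]
        for j in range(1, i):
            acc += kernel(x, xs[j]) * us[j]
        denom = 1.0 - 0.5 * h * kernel(x, x)
        us.append((v(x) + h * acc) / denom)
    return xs, us

# Assumed test problem: u'' + u = 0, u(0) = 0, u'(0) = 1  (so u = sin x).
a, c1, c2 = 0.0, 0.0, 1.0
p0 = lambda t: 1.0
p1 = lambda t: 0.0
dp1 = lambda t: 0.0                      # derivative of p1

# Kernel and inhomogeneous term of Equation (17.6) with r(x) = 0.
kernel = lambda x, t: (x - t) * (dp1(t) - p0(t)) - p1(t)
v = lambda x: (x - a) * (p1(a) * c1 + c2) + c1

xs, us = solve_volterra_second_kind(kernel, v, a, 2.0, 200)
max_error = max(abs(u - math.sin(x)) for x, u in zip(xs, us))
```

The trapezoidal rule is second-order accurate, so halving the step size should reduce `max_error` by roughly a factor of four.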

We now outline a systematic approach to obtaining the infinite series of
Theorem 17.1.2, which also works for the Fredholm equation of the second kind as we
shall see in the next section. In the latter case, the series is guaranteed to converge
only if $|\lambda|\,\|\mathbf{K}\| < 1$. This approach has the advantage that in each successive step,
we obtain a better approximation to the solution. Writing the equation as

\[ |u\rangle = |v\rangle + \lambda\mathbf{K}|u\rangle, \tag{17.7} \]

we can interpret it as follows. The difference between $|u\rangle$ and $|v\rangle$ is $\lambda\mathbf{K}|u\rangle$. If
$\lambda\mathbf{K}$ were absent, the two vectors $|u\rangle$ and $|v\rangle$ would be equal. The effect of $\lambda\mathbf{K}$
is to change $|u\rangle$ in such a way that when the result is added to $|v\rangle$, it gives $|u\rangle$.
As our initial approximation, therefore, we take $|u\rangle$ to be equal to $|v\rangle$ and write
$|u_0\rangle = |v\rangle$, where the index reminds us of the order (in this case zeroth, because
$\lambda\mathbf{K} = 0$) of the approximation. To find a better approximation, we always substitute
the latest approximation for $|u\rangle$ in the RHS of Equation (17.7). At this stage, we
have $|u_1\rangle = |v\rangle + \lambda\mathbf{K}|u_0\rangle = |v\rangle + \lambda\mathbf{K}|v\rangle$. Still a better approximation is achieved
if we substitute this expression in (17.7):

\[ |u_2\rangle = |v\rangle + \lambda\mathbf{K}|u_1\rangle = |v\rangle + \lambda\mathbf{K}(|v\rangle + \lambda\mathbf{K}|v\rangle) = |v\rangle + \lambda\mathbf{K}|v\rangle + \lambda^2\mathbf{K}^2|v\rangle. \]

The procedure is now clear. Once $|u_n\rangle$, the $n$th approximation, is obtained, we can
get $|u_{n+1}\rangle$ by substituting in the RHS of (17.7).

Before continuing, let us write the above equations in integral form. In what
follows, we shall concentrate on the Fredholm equation. To obtain the result for
the Volterra equation, one simply replaces $b$, the upper limit of integration, with $x$.
The first approximation can be obtained by substituting $v(t)$ for $u(t)$ on the RHS
of Equation (17.1). This yields

\[ u_1(x) = v(x) + \lambda\int_a^b K(x,t)v(t)\,dt. \]

Substituting this back in Equation (17.1) gives

\[
\begin{aligned}
u_2(x) &= v(x) + \lambda\int_a^b ds\,K(x,s)u_1(s) \\
&= v(x) + \lambda\int_a^b ds\,K(x,s)v(s) + \lambda^2\int_a^b dt \left[ \int_a^b K(x,s)K(s,t)\,ds \right] v(t) \\
&= v(x) + \lambda\int_a^b dt\,K(x,t)v(t) + \lambda^2\int_a^b dt\,K^2(x,t)v(t),
\end{aligned}
\]

where $K^2(x,t) \equiv \int_a^b K(x,s)K(s,t)\,ds$. Similar expressions can be derived for
$u_3(x)$, $u_4(x)$, and so forth. The integrals expressing various “powers” of $\mathbf{K}$ can be
obtained using Dirac notation and vectors with continuous indices, as discussed
in Chapter 6. Thus, for instance,

\[
\begin{aligned}
K^3(x,t) &\equiv \langle x|\,\mathbf{K}\left( \int_a^b |s_1\rangle\langle s_1|\,ds_1 \right)\mathbf{K}\left( \int_a^b |s_2\rangle\langle s_2|\,ds_2 \right)\mathbf{K}\,|t\rangle \\
&= \int_a^b ds_1 \int_a^b ds_2\,\langle x|\mathbf{K}|s_1\rangle\langle s_1|\mathbf{K}|s_2\rangle\langle s_2|\mathbf{K}|t\rangle \\
&= \int_a^b ds_1 \int_a^b ds_2\,K(x,s_1)K(s_1,s_2)K(s_2,t).
\end{aligned}
\]

We can always use this technique to convert an equation in kets into an equation 
in functions and integrals. Therefore, we can concentrate on the abstract operator 
equation and its various approximations. 

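Continuing to the $n$th-order approximation, we easily obtain

\[ |u_n\rangle = |v\rangle + \lambda\mathbf{K}|v\rangle + \cdots + \lambda^n\mathbf{K}^n|v\rangle = \sum_{j=0}^{n} (\lambda\mathbf{K})^j|v\rangle, \tag{17.8} \]

whose integral form is

\[ u_n(x) = \sum_{j=0}^{n} \lambda^j \int_a^b K^j(x,t)v(t)\,dt. \tag{17.9} \]

Here $K^j(x,t)$ is defined inductively by

\[
\begin{aligned}
K^0(x,t) &= \langle x|\mathbf{K}^0|t\rangle = \langle x|\mathbf{1}|t\rangle = \langle x|t\rangle = \delta(x - t), \\
K^j(x,t) &= \langle x|\mathbf{K}\mathbf{K}^{j-1}|t\rangle = \langle x|\,\mathbf{K}\left( \int_a^b |s\rangle\langle s|\,ds \right)\mathbf{K}^{j-1}\,|t\rangle = \int_a^b K(x,s)K^{j-1}(s,t)\,ds.
\end{aligned}
\]

The limit of $u_n(x)$ as $n \to \infty$ gives

\[ u(x) = \sum_{j=0}^{\infty} \lambda^j \int_a^b K^j(x,t)v(t)\,dt. \tag{17.10} \]

The convergence of this series, called the Neumann series, is always guaranteed
for the Volterra equation. For the Fredholm equation, we need to impose the extra
condition $|\lambda|\,\|\mathbf{K}\| < 1$.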

17.1.4. Example. As an example, let us find the solution of $u(x) = 1 + \lambda\int_0^x u(t)\,dt$, a
Volterra equation of the second kind. Here, $v(x) = 1$ and $K(x,t) = 1$, and it is
straightforward to calculate approximations to $u(x)$:

\[
\begin{aligned}
u_0(x) &= v(x) = 1, \qquad u_1(x) = 1 + \lambda\int_0^x K(x,t)u_0(t)\,dt = 1 + \lambda x, \\
u_2(x) &= 1 + \lambda\int_0^x K(x,t)u_1(t)\,dt = 1 + \lambda\int_0^x (1 + \lambda t)\,dt = 1 + \lambda x + \frac{\lambda^2x^2}{2}.
\end{aligned}
\]

It is clear that the $n$th term will look like

\[ u_n(x) = 1 + \lambda x + \frac{\lambda^2x^2}{2} + \cdots + \frac{\lambda^nx^n}{n!} = \sum_{j=0}^{n} \frac{\lambda^jx^j}{j!}. \]

As $n \to \infty$, we obtain $u(x) = e^{\lambda x}$. By direct substitution, it is readily checked that this is
indeed a solution of the original integral equation. ■
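
The successive approximations of Example 17.1.4 are polynomials, so they can be generated exactly by machine. The sketch below is a hedged illustration of the iteration $u_{n+1}(x) = 1 + \lambda\int_0^x u_n(t)\,dt$; the representation of a polynomial by its list of coefficients and the function names are our own choices. It confirms that the coefficient of $x^k$ in $u_n$ is $\lambda^k/k!$.

```python
from fractions import Fraction
from math import factorial

def integrate_from_zero(coeffs):
    """Coefficients of integral_0^x p(t) dt for p(x) = sum_k coeffs[k] * x**k."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]

def successive_approximation(lam, order):
    """Coefficient list of u_order for u(x) = 1 + lam * integral_0^x u(t) dt."""
    u = [Fraction(1)]                      # u_0 = v = 1
    for _ in range(order):
        u = [lam * c for c in integrate_from_zero(u)]
        u[0] += 1                          # add v(x) = 1
    return u

lam = Fraction(2)                          # an arbitrary illustrative value
u4 = successive_approximation(lam, 4)
expected = [lam ** k / factorial(k) for k in range(5)]
```

Exact rational arithmetic is used here only so that the comparison with $\lambda^k/k!$ is free of rounding; floating-point coefficients would serve equally well for numerical work.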


17.2 Fredholm Integral Equations 

We can use our knowledge of compact operators gained in the previous chapter to
study Fredholm equations of the second kind. With $\lambda \neq 0$ a complex number, we
consider the characteristic equation

\[ (\mathbf{1} - \lambda\mathbf{K})|u\rangle = |v\rangle, \qquad\text{or}\qquad u(x) - \lambda K[u](x) = v(x), \tag{17.11} \]

where all functions are square-integrable on $[a, b]$, and $K(x,t)$, the Hilbert-
Schmidt kernel, is square-integrable on the rectangle $[a, b] \times [a, b]$.

Using Proposition 16.2.9, we immediately see that Equation (17.11) has a
unique solution if $|\lambda|\,\|\mathbf{K}\| < 1$, and the solution is of the form

\[ |u\rangle = (\mathbf{1} - \lambda\mathbf{K})^{-1}|v\rangle = \sum_{n=0}^{\infty} \lambda^n\mathbf{K}^n|v\rangle, \tag{17.12} \]

or $u(x) = \sum_{n=0}^{\infty} \lambda^n K^n[v](x)$, where $K^n[v](x)$ is defined as in Equation (17.4)
except that now $b$ replaces $x$ as the upper limit of integration.

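17.2.1. Example. Consider the integral equation

\[ u(x) - \int_0^1 K(x,t)u(t)\,dt = x, \qquad\text{where}\qquad K(x,t) = \begin{cases} x & \text{if } 0 \le x < t, \\ t & \text{if } t < x \le 1. \end{cases} \]

Here $\lambda = 1$; therefore, a Neumann series solution exists if $\|\mathbf{K}\| < 1$. It is convenient to
write $K$ in terms of the theta function:$^{1}$

\[ K(x,t) = x\,\theta(t - x) + t\,\theta(x - t). \tag{17.13} \]

This gives $|K(x,t)|^2 = x^2\theta(t - x) + t^2\theta(x - t)$ because $\theta^2(x - t) = \theta(x - t)$ and
$\theta(x - t)\theta(t - x) = 0$. Thus, we have

\[
\begin{aligned}
\|\mathbf{K}\|^2 &= \int_0^1 dx \int_0^1 dt\,|K(x,t)|^2 \\
&= \int_0^1 dx \int_0^1 x^2\theta(t - x)\,dt + \int_0^1 dx \int_0^1 t^2\theta(x - t)\,dt \\
&= \int_0^1 dt \int_0^t x^2\,dx + \int_0^1 dx \int_0^x t^2\,dt = \int_0^1 dt\left( \frac{t^3}{3} \right) + \int_0^1 dx\left( \frac{x^3}{3} \right) = \frac{1}{6}.
\end{aligned}
\]

Since this is less than 1, the Neumann series converges, and we have$^{2}$

\[ u(x) = \sum_{j=0}^{\infty} \int_0^1 K^j(x,t)v(t)\,dt = \sum_{j=0}^{\infty} \int_0^1 K^j(x,t)\,t\,dt \equiv \sum_{j=0}^{\infty} f_j(x). \]

The first few terms are evaluated as follows:

\[
\begin{aligned}
f_0(x) &= \int_0^1 K^0(x,t)\,t\,dt = \int_0^1 \delta(x - t)\,t\,dt = x, \\
f_1(x) &= \int_0^1 K(x,t)\,t\,dt = \int_0^1 [x\,\theta(t - x) + t\,\theta(x - t)]\,t\,dt \\
&= x\int_x^1 t\,dt + \int_0^x t^2\,dt = \frac{x}{2} - \frac{x^3}{6}.
\end{aligned}
\]

The next term is trickier than the first two because of the product of the theta functions.
We first substitute Equation (17.13) in the integral for the second-order term, and simplify:

\[
\begin{aligned}
f_2(x) &= \int_0^1 K^2(x,t)\,t\,dt = \int_0^1 t\,dt \int_0^1 K(x,s)K(s,t)\,ds \\
&= \int_0^1 t\,dt \int_0^1 [x\,\theta(s - x) + s\,\theta(x - s)][s\,\theta(t - s) + t\,\theta(s - t)]\,ds \\
&= x\int_0^1 t\,dt \int_0^1 s\,\theta(s - x)\theta(t - s)\,ds + x\int_0^1 t^2\,dt \int_0^1 \theta(s - x)\theta(s - t)\,ds \\
&\quad + \int_0^1 t\,dt \int_0^1 s^2\theta(x - s)\theta(t - s)\,ds + \int_0^1 t^2\,dt \int_0^1 s\,\theta(x - s)\theta(s - t)\,ds.
\end{aligned}
\]

It is convenient to switch the order of integration at this point. This is because of the presence
of $\theta(x - s)$ and $\theta(s - x)$, which do not involve $t$ and are best integrated last. Thus, we have

\[
\begin{aligned}
f_2(x) &= x\int_0^1 s\,\theta(s - x)\,ds \int_s^1 t\,dt + x\int_0^1 \theta(s - x)\,ds \int_0^s t^2\,dt \\
&\quad + \int_0^1 s^2\theta(x - s)\,ds \int_s^1 t\,dt + \int_0^1 s\,\theta(x - s)\,ds \int_0^s t^2\,dt \\
&= x\int_x^1 s\left( \frac{1}{2} - \frac{s^2}{2} \right) ds + x\int_x^1 \frac{s^3}{3}\,ds + \int_0^x s^2\left( \frac{1}{2} - \frac{s^2}{2} \right) ds + \int_0^x \frac{s^4}{3}\,ds \\
&= \frac{5}{24}x - \frac{1}{12}x^3 + \frac{1}{120}x^5.
\end{aligned}
\]

As a test of his/her knowledge of $\theta$-function manipulation, the reader is urged to perform
the integration in reverse order. Adding all the terms, we obtain an approximation for $u(x)$
that is valid for $0 \le x \le 1$:

\[ u(x) \approx f_0(x) + f_1(x) + f_2(x) = \frac{41}{24}x - \frac{1}{4}x^3 + \frac{1}{120}x^5. \quad ■ \]

1 Recall that the theta function is defined to be 1 if its argument is positive, and 0 if it is negative.

2 Note that in this case (Fredholm equation), we can calculate the jth term in isolation. In the Volterra case, it was more natural
to calculate the solution up to a given order.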
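
The terms $f_j(x)$ of Example 17.2.1 can be generated mechanically, because for this kernel $f_{j+1}(x) = \int_0^x t\,f_j(t)\,dt + x\int_x^1 f_j(t)\,dt$ maps polynomials to polynomials. The following sketch is an illustration only; the coefficient-list representation and the helper names are our own assumptions. It reproduces $f_1$, $f_2$, and the three-term approximation exactly. As an independent check that is not part of the example, one can verify by substitution that $\sin x/\cos 1$ satisfies the integral equation, and the partial sums are compared with it at $x = 1$.

```python
from fractions import Fraction
import math

def integrate_from_zero(p):
    """Coefficients of integral_0^x p(t) dt."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(p)]

def times_x(p):
    """Coefficients of x * p(x)."""
    return [Fraction(0)] + list(p)

def poly_add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (n - len(p))
    q = list(q) + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def apply_kernel(f):
    """K[f](x) = integral_0^x t f(t) dt + x * integral_x^1 f(t) dt."""
    F = integrate_from_zero(f)                   # F(x) = integral_0^x f
    F_at_1 = sum(F)                              # F(1)
    lower = integrate_from_zero(times_x(f))      # integral_0^x t f(t) dt
    upper = poly_add([Fraction(0), F_at_1],      # x * (F(1) - F(x))
                     [-c for c in times_x(F)])
    return poly_add(lower, upper)

def neumann_terms(n_terms):
    """Return [f_0, f_1, ..., f_{n_terms-1}] with f_0(x) = x."""
    terms = [[Fraction(0), Fraction(1)]]
    for _ in range(n_terms - 1):
        terms.append(apply_kernel(terms[-1]))
    return terms

def partial_sum(n_terms):
    total = [Fraction(0)]
    for f in neumann_terms(n_terms):
        total = poly_add(total, f)
    return total

f0, f1, f2 = neumann_terms(3)
three_term = partial_sum(3)
# Value at x = 1 of a longer partial sum, compared with sin(1)/cos(1).
value_at_1 = float(sum(partial_sum(25)))
gap = abs(value_at_1 - math.tan(1.0))
```

The three-term approximation is still some way from the limit (at $x = 1$ it gives about 1.47 against $\tan 1 \approx 1.56$), which is a useful reminder that convergence of the Neumann series says nothing about how many terms are needed.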





Fredholm alternative 


We have seen that the Volterra equation of the second kind has a unique
solution which can be written as an infinite series (see Theorem 17.1.2). The case
of the Fredholm equation of the second kind is more complicated because of the
existence of eigenvalues. The general solution of Equation (17.11) is discussed in
the following:

17.2.2. Theorem. (Fredholm Alternative) Let $\mathbf{K}$ be a Hilbert-Schmidt operator
and $\lambda$ a complex number. Then either

1. $\lambda$ is a regular value of Equation (17.11), or $\lambda^{-1}$ is a regular point of the
operator $\mathbf{K}$, in which case the equation has the unique solution $|u\rangle = (\mathbf{1} - \lambda\mathbf{K})^{-1}|v\rangle$, or

2. $\lambda$ is a characteristic value of Equation (17.11) ($\lambda^{-1}$ is an eigenvalue of the
operator $\mathbf{K}$), in which case the equation has a solution if and only if $|v\rangle$
is in the orthogonal complement of the (finite-dimensional) null space of
$\mathbf{1} - \lambda^*\mathbf{K}^\dagger$.

Proof. The first part is trivial if we recall that by definition, regular points of $\mathbf{K}$ are
those complex numbers $\mu$ which make the operator $\mathbf{K} - \mu\mathbf{1}$ invertible.

For part (2), we first show that the null space of $\mathbf{1} - \lambda^*\mathbf{K}^\dagger$ is finite-dimensional.
We note that $\mathbf{1} - \lambda\mathbf{K}$ is invertible if and only if its adjoint $\mathbf{1} - \lambda^*\mathbf{K}^\dagger$ is invertible,
and $\lambda \in \rho(\mathbf{K})$ iff $\lambda^* \in \rho(\mathbf{K}^\dagger)$. Since the spectrum of an operator is composed of
all points that are not regular, we conclude that $\lambda$ is in the spectrum of $\mathbf{K}$ if and
only if $\lambda^*$ is in the spectrum of $\mathbf{K}^\dagger$. For compact operators, all nonzero points of
the spectrum are eigenvalues. Therefore, the nonzero points of the spectrum of
$\mathbf{K}^\dagger$, a compact operator by Theorem 16.5.7, are all eigenvalues of $\mathbf{K}^\dagger$, and the null
space of $\mathbf{1} - \lambda^*\mathbf{K}^\dagger$ is finite-dimensional (Theorem 16.6.2). Next, we note that the
equation itself requires that $|v\rangle$ be in the range of the operator $\mathbf{1} - \lambda\mathbf{K}$, which, by
Theorem 16.6.2, is the orthogonal complement of the null space of $\mathbf{1} - \lambda^*\mathbf{K}^\dagger$. □


Erik Ivar Fredholm (1866-1927) was born in Stockholm, the
son of a well-to-do merchant family. He received the best
education possible and soon showed great promise in mathematics,
leaning especially toward the applied mathematics of practical
mechanics in a year of study at Stockholm’s Polytechnic
Institute. Fredholm finished his education at the University of
Uppsala, obtaining his doctorate in 1898. He also studied at
the University of Stockholm during this same period and
eventually received an appointment to the faculty there. Fredholm
remained there the rest of his professional life.

His first contribution to mathematics was contained in his
doctoral thesis, in which he studied a first-order partial differential equation in three
variables, a problem that arises in the deformation of anisotropic media. Several years later









he completed this work by finding the fundamental solution to a general elliptic partial 
differential equation with constant coefficients. 

Fredholm is perhaps best known for his studies of the integral equation that bears his 
name. Such equations occur frequently in physics. Fredholm’s genius led him to note the 
similarity between his equation and a relatively familiar matrix-vector equation, resulting 
in his identification of a quantity that plays the same role in his equation as the determinant 
plays in the matrix-vector equation. He thus obtained a method for determining the existence 
of a solution and later used an analogous expression to derive a solution to his equation 
akin to the Cramer’s rule solution to the matrix-vector equation. He further showed that the 
solution could be expressed as a power series in a complex variable. This latter result was 
considered important enough that Poincare assumed it without proof (in fact he was unable 
to prove it) in a study of related partial differential equations. 

Fredholm then considered the homogeneous form of his equation. He showed that 
under certain conditions, the vector space of solutions is finite-dimensional. David Hilbert
later extended Fredholm’s work to a complete eigenvalue theory of the Fredholm equation, 
which ultimately led to the discovery of Hilbert spaces. 


17.2.1 Hermitian Kernel 

Of special interest are integral equations in which the kernel is hermitian, which
occurs exactly when the operator is hermitian. Such a kernel has the property that$^{3}$
$\langle x|\mathbf{K}|t\rangle^* = \langle t|\mathbf{K}|x\rangle$ or $[K(x,t)]^* = K(t,x)$. For such kernels we can use the
spectral theorem for compact hermitian operators to find a series solution for the
integral equation. First we recall that

\[ \mathbf{K} = \sum_{j=1}^{N} \lambda_j^{-1}\mathbf{P}_j = \sum_{j=1}^{N} \lambda_j^{-1} \sum_{k=1}^{m_j} |e_k^{(j)}\rangle\langle e_k^{(j)}|, \]

where we have used $\lambda_j^{-1}$ to denote the eigenvalue of the operator$^{4}$ and expanded
the projection operator in terms of orthonormal basis vectors of the corresponding
finite-dimensional eigenspace. Recall that $N$ can be infinity. Instead of the double
sum, we can sum once over all the basis vectors and write $\mathbf{K} = \sum_{n=1}^{\infty} \lambda_n^{-1}|u_n\rangle\langle u_n|$.
Here $n$ counts all the orthonormal eigenvectors of the Hilbert space, and $\lambda_n^{-1}$ is the
eigenvalue corresponding to the eigenvector $|u_n\rangle$. Therefore, $\lambda_n^{-1}$ may be repeated
in the sum. The action of $\mathbf{K}$ on a vector $|u\rangle$ is given by

\[ \mathbf{K}|u\rangle = \sum_{n=1}^{\infty} \lambda_n^{-1}\langle u_n|u\rangle\,|u_n\rangle. \tag{17.14} \]

3 Since we are dealing mainly with real functions, hermiticity of K implies the symmetry of K, i.e., $K(x,t) = K(t,x)$.

4 $\lambda_j$ is the characteristic value of the integral equation, or the inverse of the eigenvalue of the corresponding operator.










Hilbert-Schmidt 

theorem 



If the Hilbert space is $\mathcal{L}^2[a, b]$, we may be interested in the functional form of this
equation. We obtain such a form by multiplying both sides by $\langle x|$:

\[ K[u](x) \equiv \langle x|\mathbf{K}|u\rangle = \sum_{n=1}^{\infty} \lambda_n^{-1}\langle u_n|u\rangle\langle x|u_n\rangle = \sum_{n=1}^{\infty} \lambda_n^{-1}\langle u_n|u\rangle\,u_n(x). \]

That this series converges uniformly in the interval $[a, b]$ is known as the Hilbert-
Schmidt theorem.

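17.2.3. Example. Let us solve $u(x) = x + \lambda\int_a^b K(x,t)u(t)\,dt$, where $K(x,t) \equiv xt$ is
a symmetric (hermitian) kernel, by the Neumann series method. We note that

\[
\begin{aligned}
\|\mathbf{K}\|^2 &= \int_a^b\!\!\int_a^b |K(x,t)|^2\,dx\,dt = \int_a^b\!\!\int_a^b x^2t^2\,dx\,dt \\
&= \int_a^b x^2\,dx \int_a^b t^2\,dt = \left( \int_a^b x^2\,dx \right)^2 = \frac{1}{9}(b^3 - a^3)^2,
\end{aligned}
\]

or

\[ \|\mathbf{K}\| = \int_a^b x^2\,dx = \frac{1}{3}(b^3 - a^3), \]

and the Neumann series converges if $|\lambda|(b^3 - a^3) < 3$. Assuming that this condition holds,
we have

\[ u(x) = x + \sum_{j=1}^{\infty} \lambda^j \int_a^b K^j(x,t)\,t\,dt. \]

The special form of the kernel allows us to calculate $K^j(x,t)$ directly:

\[
\begin{aligned}
K^j(x,t) &= \int_a^b\!\!\int_a^b \cdots \int_a^b K(x,s_1)K(s_1,s_2)\cdots K(s_{j-1},t)\,ds_1\,ds_2 \cdots ds_{j-1} \\
&= \int_a^b\!\!\int_a^b \cdots \int_a^b x\,s_1^2 s_2^2 \cdots s_{j-1}^2\,t\,ds_1\,ds_2 \cdots ds_{j-1} \\
&= xt\left( \int_a^b s^2\,ds \right)^{j-1} = xt\,\|\mathbf{K}\|^{j-1}.
\end{aligned}
\]

It follows that $\int_a^b K^j(x,t)\,t\,dt = x\|\mathbf{K}\|^{j-1}\,\tfrac{1}{3}(b^3 - a^3) = x\|\mathbf{K}\|^j$. Substituting this in the
expression for $u(x)$ yields

\[
\begin{aligned}
u(x) &= x + \sum_{j=1}^{\infty} \lambda^j x\|\mathbf{K}\|^j = x + x\lambda\|\mathbf{K}\| \sum_{j=1}^{\infty} \lambda^{j-1}\|\mathbf{K}\|^{j-1} \\
&= x\left( 1 + \frac{\lambda\|\mathbf{K}\|}{1 - \lambda\|\mathbf{K}\|} \right) = \frac{x}{1 - \lambda\|\mathbf{K}\|} = \frac{3x}{3 - \lambda(b^3 - a^3)}.
\end{aligned}
\]

Because of the simplicity of the kernel, we can solve the integral equation exactly. First
we write

\[ u(x) = x + \lambda\int_a^b xt\,u(t)\,dt = x + \lambda x\int_a^b t\,u(t)\,dt \equiv x(1 + \lambda A), \tag{17.15} \]

where $A = \int_a^b t\,u(t)\,dt$. Multiplying both sides by $x$ and integrating, we obtain

\[ A = \int_a^b x\,u(x)\,dx = (1 + \lambda A)\int_a^b x^2\,dx = (1 + \lambda A)\|\mathbf{K}\| \quad\Rightarrow\quad A = \frac{\|\mathbf{K}\|}{1 - \lambda\|\mathbf{K}\|}. \]

Substituting $A$ in Equation (17.15) gives

\[ u(x) = x\left( 1 + \frac{\lambda\|\mathbf{K}\|}{1 - \lambda\|\mathbf{K}\|} \right) = \frac{x}{1 - \lambda\|\mathbf{K}\|}. \]

This solution is the same as the first one we obtained. However, no series was involved here,
and therefore no assumption is necessary concerning $|\lambda|\,\|\mathbf{K}\|$. ■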
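
The two routes of Example 17.2.3 can be compared numerically. Since every iterate has the form $u(x) = c\,x$, only the coefficient $c$ needs to be tracked. The sketch below is an illustration with our own function names and with the sample interval $[0, 1]$ chosen by us: it sums the Neumann series for $c$, evaluates the closed form $3/[3 - \lambda(b^3 - a^3)]$, and checks by substitution that the closed form still satisfies the equation when the series condition $|\lambda|(b^3 - a^3) < 3$ fails.

```python
def kernel_norm(a, b):
    """||K|| = integral_a^b x^2 dx = (b^3 - a^3)/3 for the kernel K(x, t) = x*t."""
    return (b ** 3 - a ** 3) / 3.0

def neumann_coefficient(lam, a, b, n_terms):
    """Partial sum of sum_j (lam*||K||)^j, the coefficient c in u(x) = c*x."""
    ratio = lam * kernel_norm(a, b)
    total, term = 0.0, 1.0
    for _ in range(n_terms):
        total += term
        term *= ratio
    return total

def closed_form_coefficient(lam, a, b):
    """c = 3 / (3 - lam*(b^3 - a^3)), from the exact solution."""
    return 3.0 / (3.0 - lam * (b ** 3 - a ** 3))

def residual(c, lam, a, b):
    """For u = c*x the equation u = x + lam * x * integral t u(t) dt reads
    c = 1 + lam * c * ||K||; return the mismatch."""
    return c - (1.0 + lam * c * kernel_norm(a, b))

# Inside the convergence region: lam*(b^3 - a^3) = 1 < 3.
series_c = neumann_coefficient(1.0, 0.0, 1.0, 60)
exact_c = closed_form_coefficient(1.0, 0.0, 1.0)

# Outside it: lam*(b^3 - a^3) = 4 > 3, the series diverges but the
# closed form still solves the equation.
outside_c = closed_form_coefficient(4.0, 0.0, 1.0)
outside_residual = residual(outside_c, 4.0, 0.0, 1.0)
```

The only value of $\lambda$ for which the closed form breaks down is $\lambda = 3/(b^3 - a^3)$, which is the characteristic value of this kernel.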


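If one can calculate the eigenvectors $|u_n\rangle$ and the eigenvalues $\lambda_n^{-1}$, then one
can obtain a solution for the integral equation in terms of these eigenfunctions as
follows: Substitute (17.14) in the Fredholm equation [Equation (17.3)] to get

\[ |u\rangle = |v\rangle + \lambda\sum_{n=1}^{\infty} \lambda_n^{-1}\langle u_n|u\rangle\,|u_n\rangle. \tag{17.16} \]

Multiply both sides by $\langle u_m|$:

\[ \langle u_m|u\rangle = \langle u_m|v\rangle + \lambda\sum_{n=1}^{\infty} \lambda_n^{-1}\langle u_n|u\rangle \underbrace{\langle u_m|u_n\rangle}_{=\delta_{mn}} = \langle u_m|v\rangle + \lambda\lambda_m^{-1}\langle u_m|u\rangle, \tag{17.17} \]

or, if $\lambda$ is not one of the eigenvalues,

\[ \left( 1 - \frac{\lambda}{\lambda_m} \right)\langle u_m|u\rangle = \langle u_m|v\rangle \quad\Rightarrow\quad \langle u_m|u\rangle = \frac{\lambda_m\langle u_m|v\rangle}{\lambda_m - \lambda}. \]

Substituting this in Equation (17.16) gives

\[ |u\rangle = |v\rangle + \lambda\sum_{n=1}^{\infty} \frac{\langle u_n|v\rangle}{\lambda_n - \lambda}\,|u_n\rangle, \tag{17.18} \]

and in the functional form,

\[ u(x) = v(x) + \lambda\sum_{n=1}^{\infty} \frac{\langle u_n|v\rangle}{\lambda_n - \lambda}\,u_n(x), \qquad \lambda \neq \lambda_n \ \ \forall n. \tag{17.19} \]

In case $\lambda = \lambda_m$ for some $m$, the Fredholm alternative (Theorem 17.2.2) says
that we will have a solution only if $|v\rangle$ is in the orthogonal complement of the null
space of$^{5}$ $\mathbf{1} - \lambda_m\mathbf{K}$. Moreover, Equation (17.17) shows that $\langle u_m|u\rangle$, the
expansion coefficients of the basis vectors of the eigenspace $\mathcal{M}_m$, cannot be specified.
However, Equation (17.17) does determine the rest of the coefficients as before.
In this case, the solution can be written as

\[ |u\rangle = |v\rangle + \sum_{k=1}^{r} c_k\,|u_m^{(k)}\rangle + \lambda\sum_{\substack{n=1 \\ n \neq m}}^{\infty} \frac{\langle u_n|v\rangle}{\lambda_n - \lambda}\,|u_n\rangle, \tag{17.20} \]

where $r$ is the (finite) dimension of $\mathcal{M}_m$, $k$ labels the orthonormal basis $\{|u_m^{(k)}\rangle\}$ of
$\mathcal{M}_m$, and $\{c_k\}_{k=1}^{r}$ are arbitrary constants. In functional form, this equation becomes

\[ u(x) = v(x) + \sum_{k=1}^{r} c_k\,u_m^{(k)}(x) + \lambda\sum_{\substack{n=1 \\ n \neq m}}^{\infty} \frac{\langle u_n|v\rangle}{\lambda_n - \lambda}\,u_n(x). \tag{17.21} \]

5 Remember that K is hermitian; therefore, $\lambda_m$ is real.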


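17.2.4. Example. We now give an example of the application of Equation (17.19). We want to solve $u(x) = 3\int_{-1}^{1} K(x,t)u(t)\,dt + x^2$, where

$$K(x,t) = \sum_{k=0}^{\infty} \frac{u_k(x)\,u_k(t)}{2^{k/2}}, \qquad u_k(x) = \sqrt{\frac{2k+1}{2}}\,P_k(x),$$

and $P_k(x)$ is a Legendre polynomial.

We first note that $\{u_k\}$ is an orthonormal set of functions, that $K(x,t)$ is real and symmetric (therefore, hermitian), and that

$$\int_{-1}^{1} dt \int_{-1}^{1} dx\,|K(x,t)|^2 = \int_{-1}^{1} dt \int_{-1}^{1} dx \sum_{k,l=0}^{\infty} \frac{u_k(x)u_k(t)}{2^{k/2}}\,\frac{u_l(x)u_l(t)}{2^{l/2}}$$

$$= \sum_{k,l=0}^{\infty} \frac{1}{2^{k/2}\,2^{l/2}} \int_{-1}^{1} u_k(x)u_l(x)\,dx \int_{-1}^{1} u_k(t)u_l(t)\,dt = \sum_{k=0}^{\infty} \frac{1}{2^k} = 2 < \infty.$$

Thus, $K(x,t)$ is a Hilbert-Schmidt kernel. Now note that

$$\int_{-1}^{1} K(x,t)\,u_k(t)\,dt = \int_{-1}^{1} \sum_{l=0}^{\infty} \frac{u_l(x)u_l(t)}{2^{l/2}}\,u_k(t)\,dt = \sum_{l=0}^{\infty} \frac{u_l(x)}{2^{l/2}} \int_{-1}^{1} u_l(t)u_k(t)\,dt = \frac{u_k(x)}{2^{k/2}}.$$

This shows that $u_k$ is an eigenfunction of $K(x,t)$ with eigenvalue $1/2^{k/2}$. Since $\lambda = 3$ differs from every characteristic value $2^{k/2}$ (i.e., $3 \neq 2^{k/2}$ for any integer $k$), we can use Equation (17.19) to write

$$u(x) = x^2 + 3\sum_{k=0}^{\infty} \frac{\int_{-1}^{1} u_k(s)\,s^2\,ds}{2^{k/2} - 3}\,u_k(x).$$

But $\int_{-1}^{1} u_k(s)\,s^2\,ds = 0$ for $k \geq 3$. For $k \leq 2$, we use the first three Legendre polynomials to get

$$\int_{-1}^{1} u_0(s)\,s^2\,ds = \frac{\sqrt{2}}{3}, \qquad \int_{-1}^{1} u_1(s)\,s^2\,ds = 0, \qquad \int_{-1}^{1} u_2(s)\,s^2\,ds = \frac{2}{3}\sqrt{\frac{2}{5}}.$$

This gives $u(x) = \tfrac{1}{2} - 2x^2$. The reader is urged to substitute this solution in the original integral equation and verify that it works.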
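The substitution check suggested above can be sketched in exact rational arithmetic. This is an illustrative sketch under stated assumptions: it relies on the fact derived in the example that $K$ multiplies $P_k$ by $2^{-k/2}$ (so by 1 for $k = 0$ and by $1/2$ for $k = 2$), and the helper names are hypothetical. It handles only even quadratics $u(x) = a + bx^2$.

```python
from fractions import Fraction as F


def apply_K_even_quadratic(a, b):
    """Apply K to u(x) = a + b*x^2 and return (constant, x^2 coefficient) of K[u].

    Uses K[P_k] = 2^(-k/2) P_k for k = 0 and k = 2 only.
    """
    a, b = F(a), F(b)
    # Expand u = c0*P0 + c2*P2 with P0 = 1 and P2 = (3x^2 - 1)/2,
    # using x^2 = (2*P2 + 1)/3.
    c2 = F(2, 3) * b
    c0 = a + b / 3
    # K scales P0 by 1 and P2 by 1/2.
    k0 = c0
    k2 = c2 * F(1, 2)
    # Convert k0*P0 + k2*P2 back to the monomial basis.
    return (k0 - k2 / 2, F(3, 2) * k2)


def rhs(a, b):
    """Right-hand side 3*K[u] + x^2 of the integral equation, for u = a + b*x^2."""
    const, quad = apply_K_even_quadratic(a, b)
    return (3 * const, 3 * quad + 1)


if __name__ == "__main__":
    # u(x) = 1/2 - 2x^2 should be reproduced by the right-hand side.
    print(rhs(F(1, 2), F(-2)))
```

If the right-hand side returns the same pair of coefficients that was fed in, the candidate solution satisfies the equation.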


17.2.2 Degenerate Kernels 

The preceding example involves the simplest kind of degenerate, or separable, kernels. A kernel is called degenerate, or separable, if it can be written as a finite sum of products of functions of one variable:

$$K(x,t) = \sum_{j=1}^{n} \phi_j(x)\,\psi_j^*(t), \tag{17.22}$$

where $\phi_j$ and $\psi_j$ are assumed to be square-integrable. Substituting (17.22) in the Fredholm integral equation of the second kind, we obtain

$$u(x) - \lambda \sum_{j=1}^{n} \phi_j(x) \int_a^b \psi_j^*(t)\,u(t)\,dt = v(x).$$

If we define $\mu_j = \int_a^b \psi_j^*(t)\,u(t)\,dt$, the preceding equation becomes

$$u(x) - \lambda \sum_{j=1}^{n} \mu_j\,\phi_j(x) = v(x). \tag{17.23}$$

Multiply this equation by $\psi_i^*(x)$ and integrate over $x$ to get

$$\mu_i - \lambda \sum_{j=1}^{n} \mu_j A_{ij} = v_i \qquad \text{for } i = 1, 2, \ldots, n, \tag{17.24}$$

where $A_{ij} = \int_a^b \psi_i^*(t)\,\phi_j(t)\,dt$ and $v_i = \int_a^b \psi_i^*(t)\,v(t)\,dt$. With $\mu_i$, $v_i$, and $A_{ij}$ as components of column vectors $\mathbf{u}$, $\mathbf{v}$, and a matrix $\mathsf{A}$, we can write the above linear system of equations as

$$\mathbf{u} - \lambda \mathsf{A}\mathbf{u} = \mathbf{v}, \qquad \text{or} \qquad (\mathbf{1} - \lambda \mathsf{A})\mathbf{u} = \mathbf{v}. \tag{17.25}$$

We can now determine the $\mu_i$ by solving the system of linear equations given by (17.24). Once the $\mu_i$ are determined, Equation (17.23) gives $u(x)$. Thus, for a degenerate kernel the Fredholm problem reduces to a system of linear equations.



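17.2.5. Example. As a concrete example of an integral equation with degenerate kernel, we solve $u(x) - \lambda \int_0^1 (1 + xt)\,u(t)\,dt = x$ for two different values of $\lambda$. The kernel, $K(x,t) = 1 + xt$, is separable with $\phi_1(x) = 1$, $\psi_1(t) = 1$, $\phi_2(x) = x$, and $\psi_2(t) = t$. This gives the matrix

$$\mathsf{A} = \begin{pmatrix} 1 & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{3} \end{pmatrix}.$$

For convenience, we define the matrix $\mathsf{B} \equiv \mathbf{1} - \lambda\mathsf{A}$.

(a) First assume that $\lambda = 1$. In that case $\mathsf{B}$ has a nonzero determinant. Thus, $\mathsf{B}^{-1}$ exists, and can be calculated to be

$$\mathsf{B}^{-1} = \begin{pmatrix} -\tfrac{8}{3} & -2 \\ -2 & 0 \end{pmatrix}.$$

With

$$v_1 = \int_0^1 \psi_1^*(t)\,v(t)\,dt = \int_0^1 t\,dt = \tfrac{1}{2} \qquad \text{and} \qquad v_2 = \int_0^1 \psi_2^*(t)\,v(t)\,dt = \int_0^1 t^2\,dt = \tfrac{1}{3},$$

we obtain

$$\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \mathsf{B}^{-1}\mathbf{v} = \begin{pmatrix} -\tfrac{8}{3} & -2 \\ -2 & 0 \end{pmatrix} \begin{pmatrix} \tfrac{1}{2} \\ \tfrac{1}{3} \end{pmatrix} = \begin{pmatrix} -2 \\ -1 \end{pmatrix}.$$

Equation (17.23) then gives $u(x) = \mu_1\phi_1(x) + \mu_2\phi_2(x) + x = -2$.

(b) Now, for the purpose of illustrating the other alternative of Theorem 17.2.2, let us take $\lambda = 8 + 2\sqrt{13}$. Then

$$\mathsf{B} = \mathbf{1} - \lambda\mathsf{A} = -\begin{pmatrix} 7 + 2\sqrt{13} & 4 + \sqrt{13} \\ 4 + \sqrt{13} & (5 + 2\sqrt{13})/3 \end{pmatrix},$$

and $\det \mathsf{B} = 0$. This shows that $8 + 2\sqrt{13}$ is a characteristic value of the equation. We thus have a solution only if $v(x) = x$ is orthogonal to the null space of $\mathbf{1} - \lambda^*\mathsf{A}^\dagger = \mathsf{B}^\dagger$. To determine a basis for this null space, we have to find vectors $|z\rangle$ such that $\mathsf{B}^\dagger|z\rangle = 0$. Since $\lambda$ is real, and $\mathsf{B}$ is real and symmetric, $\mathsf{B}^\dagger = \mathsf{B}$, and we must solve

$$\begin{pmatrix} 7 + 2\sqrt{13} & 4 + \sqrt{13} \\ 4 + \sqrt{13} & (5 + 2\sqrt{13})/3 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = 0.$$

The solution to this equation is a multiple of $|z\rangle = \begin{pmatrix} 3 \\ -2 - \sqrt{13} \end{pmatrix}$. If the integral equation is to have a solution, the column vector $\mathbf{v}$ (whose corresponding ket we denote by $|v\rangle$) must be orthogonal to $|z\rangle$. But

$$\langle z|v\rangle = \begin{pmatrix} 3 & -2 - \sqrt{13} \end{pmatrix} \begin{pmatrix} \tfrac{1}{2} \\ \tfrac{1}{3} \end{pmatrix} = \frac{5 - 2\sqrt{13}}{6} \neq 0.$$

Therefore, the integral equation has no solution.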
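The reduction of Example 17.2.5 to a $2\times 2$ linear system can be sketched in a few lines. This is a minimal illustration, assuming the choices $\phi = (1, x)$, $\psi = (1, t)$ and $v(x) = x$ on $[0, 1]$ from the example; the function names are hypothetical, and Cramer's rule is used only because the system is two-dimensional.

```python
from fractions import Fraction as F

# A_ij = int_0^1 psi_i(t) phi_j(t) dt for phi = (1, x), psi = (1, t)
A = [[F(1), F(1, 2)], [F(1, 2), F(1, 3)]]
# v_i = int_0^1 psi_i(t) v(t) dt with v(t) = t
V = [F(1, 2), F(1, 3)]


def build_B(lam):
    """Return the matrix B = 1 - lam*A."""
    return [[(1 if i == j else 0) - lam * A[i][j] for j in range(2)] for i in range(2)]


def det2(B):
    """Determinant of a 2x2 matrix."""
    return B[0][0] * B[1][1] - B[0][1] * B[1][0]


def solve_mu(lam):
    """Solve (1 - lam*A) mu = v by Cramer's rule; fails at a characteristic value."""
    B = build_B(lam)
    d = det2(B)
    if d == 0:
        raise ZeroDivisionError("lam is a characteristic value; B is singular")
    mu1 = (V[0] * B[1][1] - B[0][1] * V[1]) / d
    mu2 = (B[0][0] * V[1] - B[1][0] * V[0]) / d
    return mu1, mu2


def u(x, lam):
    """Solution u(x) = v(x) + lam*(mu1*phi1(x) + mu2*phi2(x)) from Equation (17.23)."""
    mu1, mu2 = solve_mu(lam)
    return x + lam * (mu1 + mu2 * x)


if __name__ == "__main__":
    print(solve_mu(1))                        # coefficients for part (a)
    print(u(F(3, 7), 1))                      # value of u at an arbitrary point
    print(det2(build_B(8 + 2 * 13 ** 0.5)))   # near zero: part (b), floating point
```

For part (b) the determinant is evaluated in floating point, so it comes out as a tiny number rather than an exact zero; an exact treatment would need symbolic arithmetic.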


The reader may feel uneasy that the functions $\phi_j(x)$ and $\psi_j(t)$ appearing in a degenerate kernel are arbitrary to within a multiplicative function. After all, we can multiply $\phi_j(x)$ by a nonzero function, and divide $\psi_j^*(t)$ by the same function, and get the same kernel. Such a change clearly alters the matrices $\mathsf{A}$ and $\mathsf{B}$ and therefore seems likely to change the solution, $u(x)$. That this is not the case is demonstrated in Problem 17.2. In fact, it can be shown quite generally that the transformations described above do not change the solution.

As the alert reader may have noticed, we have been avoiding the problem 
of solving the eigenvalue (characteristic) problem for integral operators. Such 
a problem is nontrivial, and the analogue of the finite-dimensional case, where 
one works with determinants and characteristic polynomials, does not exist. An 
exception is a degenerate hermitian$^6$ kernel, i.e., a kernel of the form $K(x,t) = \sum_{i=1}^{n} h_i(x)\,h_i^*(t)$. Substituting this in the characteristic-value equation

$$u(x) = \lambda \int_a^b K(x,t)\,u(t)\,dt,$$

we obtain $u(x) = \lambda \sum_{i=1}^{n} h_i(x) \int_a^b h_i^*(t)\,u(t)\,dt$. Defining $\mu_i \equiv \int_a^b h_i^*(t)\,u(t)\,dt$ and substituting it back in the equation gives

$$u(x) = \lambda \sum_{i=1}^{n} \mu_i\,h_i(x). \tag{17.26}$$

Multiplying this equation by $\lambda^{-1} h_k^*(x)$ and integrating over $x$ yields

$$\lambda^{-1}\mu_k = \sum_{i=1}^{n} \left(\int_a^b h_k^*(x)\,h_i(x)\,dx\right)\mu_i \equiv \sum_{i=1}^{n} m_{ki}\,\mu_i.$$

This is an eigenvalue equation for the hermitian $n \times n$ matrix $\mathsf{M}$ with elements $m_{ij}$, which, by the spectral theorem for hermitian operators, can be solved. In fact, the matrix need not be hermitian; as long as it is normal, the eigenvalue problem can be solved. Once the eigenvectors and the eigenvalues are found, we can substitute
them in Equation (17.26) and obtain u(x). We expect to find a finite number of 
eigenfunctions and eigenvalues. Our analysis of compact operators included such a 
case. That analysis also showed that the entire (infinite-dimensional) Hilbert space 
could be written as the direct sum of eigenspaces that are finite-dimensional for 
nonzero eigenvalues. Therefore, we expect the eigenspace corresponding to the 
zero eigenvalue (or infinite characteristic value) to be infinite-dimensional. The 
following example illustrates these points. 

17.2.6. Example. Let us find the nonzero characteristic values and corresponding eigenfunctions of the kernel $K(x,t) = 1 + \sin(x + t)$ for $-\pi \leq x, t \leq \pi$.


6 Actually, the problem of a degenerate kernel that leads to a normal matrix, as described below, can also be solved. 




We are seeking functions $u$ and scalars $\lambda$ satisfying $u(x) = \lambda K[u](x)$, or

$$u(x) = \lambda \int_{-\pi}^{\pi} [1 + \sin(x + t)]\,u(t)\,dt.$$

Expanding $\sin(x + t)$, we obtain

$$u(x) = \lambda \int_{-\pi}^{\pi} [1 + \sin x \cos t + \cos x \sin t]\,u(t)\,dt, \tag{17.27}$$

or

$$\lambda^{-1} u(x) = \mu_1 + \mu_2 \sin x + \mu_3 \cos x, \tag{17.28}$$

where $\mu_1 = \int_{-\pi}^{\pi} u(t)\,dt$, $\mu_2 = \int_{-\pi}^{\pi} u(t)\cos t\,dt$, and $\mu_3 = \int_{-\pi}^{\pi} u(t)\sin t\,dt$. Integrate both sides of Equation (17.28) with respect to $x$ from $-\pi$ to $\pi$ to obtain $\lambda^{-1}\mu_1 = 2\pi\mu_1$. Similarly, multiplying by $\sin x$ and $\cos x$ and integrating yields

$$\lambda^{-1}\mu_2 = \pi\mu_3 \qquad \text{and} \qquad \lambda^{-1}\mu_3 = \pi\mu_2. \tag{17.29}$$

If $\mu_1 \neq 0$, we get $\lambda^{-1} = 2\pi$, which, when substituted in (17.29), yields $\mu_2 = \mu_3 = 0$. We thus have, as a first solution, $\lambda_1^{-1} = 2\pi$ and $|u_1\rangle = \alpha \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, where $\alpha$ is an arbitrary constant. Equation (17.28) now gives $\lambda_1^{-1} u_1(x) = \mu_1$, or $u_1(x) = c_1$, where $c_1$ is an arbitrary constant to be determined.

On the other hand, $\mu_1 = 0$ if $\lambda^{-1} \neq 2\pi$. Then Equation (17.29) yields $\lambda^{-1} = \pm\pi$ and $\mu_2 = \pm\mu_3$. For $\lambda^{-1} \equiv \lambda_+^{-1} = \pi$, Equation (17.28) gives

$$u(x) \equiv u_+(x) = c_+(\sin x + \cos x),$$

and for $\lambda^{-1} \equiv \lambda_-^{-1} = -\pi$, it yields $u(x) \equiv u_-(x) = c_-(\sin x - \cos x)$, where $c_\pm$ are arbitrary constants to be determined by normalization of eigenfunctions. The normalized eigenfunctions are

$$u_1 = \frac{1}{\sqrt{2\pi}}, \qquad u_\pm(x) = \frac{1}{\sqrt{2\pi}}(\sin x \pm \cos x).$$

Direct substitution in the original integral equation easily verifies that $u_1$, $u_+$, and $u_-$ are eigenfunctions of the integral equation with the eigenvalues calculated above.

Let us now consider the zero eigenvalue (or infinite characteristic value). Divide both sides of Equation (17.27) by $\lambda$ and take the limit $\lambda \to \infty$. Then the integral equation becomes

$$\int_{-\pi}^{\pi} [1 + \sin x \cos t + \cos x \sin t]\,u(t)\,dt = 0.$$

The solutions $u(t)$ to this equation would span the eigenspace corresponding to the zero eigenvalue, or infinite characteristic value. We pointed out above that this eigenspace is expected to be infinite-dimensional. This expectation is borne out once we note that all functions of the form $\sin nt$ or $\cos nt$ with $n \geq 2$ make the above integral zero; and there are infinitely many such functions.
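The direct-substitution check mentioned in the example can be carried out numerically. The sketch below is an illustration, not part of the original text: the function names and the number of quadrature nodes are assumptions. It uses the equally weighted (periodic trapezoid) rule on $[-\pi, \pi)$, which integrates low-degree trigonometric polynomials essentially to rounding error.

```python
import math


def apply_kernel(u, x, n=400):
    """Approximate K[u](x) = int_{-pi}^{pi} [1 + sin(x + t)] u(t) dt.

    Uses n equally weighted nodes on [-pi, pi) (periodic trapezoid rule).
    """
    h = 2 * math.pi / n
    total = 0.0
    for j in range(n):
        t = -math.pi + j * h
        total += (1 + math.sin(x + t)) * u(t)
    return h * total


NORM = 1 / math.sqrt(2 * math.pi)


def u_1(t):
    """Normalized constant eigenfunction (eigenvalue 2*pi)."""
    return NORM


def u_plus(t):
    """Normalized eigenfunction for eigenvalue +pi."""
    return NORM * (math.sin(t) + math.cos(t))


def u_minus(t):
    """Normalized eigenfunction for eigenvalue -pi."""
    return NORM * (math.sin(t) - math.cos(t))


if __name__ == "__main__":
    x = 0.3
    print(apply_kernel(u_1, x) / u_1(x))          # close to 2*pi
    print(apply_kernel(u_plus, x) / u_plus(x))    # close to pi
    print(apply_kernel(u_minus, x) / u_minus(x))  # close to -pi
    print(apply_kernel(lambda t: math.cos(2 * t), x))  # close to 0
```

The last line illustrates the zero eigenvalue: $\cos 2t$ is annihilated by the kernel, as is any $\sin nt$ or $\cos nt$ with $n \geq 2$.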








17.3 Problems 

17.1. Use mathematical induction to derive Equation (17.5).

17.2. Repeat part (a) of Example 17.2.5 using

$$\phi_1(x) = \tfrac{1}{2}, \qquad \psi_1(t) = 2, \qquad \phi_2(x) = x, \qquad \psi_2(t) = t$$

so that we still have $K(x,t) = \phi_1(x)\psi_1(t) + \phi_2(x)\psi_2(t)$.

17.3. Use the spectral theorem for compact hermitian operators to show that if the 
kernel of a Hilbert-Schmidt operator has a finite number of nonzero eigenvalues,
then the kernel is separable. Hint: See the discussion at the beginning of Section 
17.2.1. 


17.4. Use the method of successive approximations to solve the Volterra equation 
u(x) — X u{t)dt. Then derive a DE equivalent to the Volterra equation (make 
sure to include the initial condition), and solve it. 

17.5. Regard the Fourier transform,

$$F[f](x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ixy} f(y)\,dy,$$

as an integral operator.

(a) Show that $F^2[f](x) = f(-x)$.

(b) Deduce, therefore, that the only eigenvalues of this operator are $\lambda = \pm 1, \pm i$.

(c) Let $f(x)$ be any even function of $x$. Show that an appropriate choice of $\alpha$ can make $u = f + \alpha F[f]$ an eigenfunction of $F$. (This shows that the eigenvalues of $F$ have infinite multiplicity.)


17.6. For what values of A does the following integral equation have a solution? 
m(;c) = X I sin(x + t)u[t)dt -\-x, 

Jo 

What is that solution? Redo the problem using a Neumann series expansion. Under 
what condition is the series convergent? 

17.7. It is possible to multiply the functions $\phi_j(x)$ by $\gamma_j(x)$ and $\psi_j(t)$ by $1/\gamma_j(t)$ and still get the same degenerate kernel, $K(x,t) = \sum_{j=1}^{n} \phi_j(x)\,\psi_j^*(t)$. Show that such arbitrariness, although affecting the matrices $\mathsf{A}$ and $\mathsf{B}$, does not change the solution of the Fredholm problem

$$u(x) - \lambda \int_a^b K(x,t)\,u(t)\,dt = f(x).$$


17.8. Show, by direct substitution, that the solution found in Example 17.2.4 does satisfy its integral equation.





17.9. Solve u{x) — ^ +t)u{t) dt -\-x, 

17.10. Solve u(x) = 入义 1 xtu{t) dt -\- x using the Neumann series method. For 
what values of 入 is the series convergent? Now find the eigenvalues and eigen¬ 
functions of the kernel and solve the problem using these eigenvalues and eigen¬ 
functions. 

17.11. Solve u(x) = k f^° K(x, t)u {t)dt -\-x a , where a is any real number except 
a negative integer, and K(x, t) = 厂。 For what values of X does the integral 
equation have a solution? 

17.12. Solve the integral equations 


■7T 


(a) u(x) = e x I xtu(t) dt. 

Jo 

(c) u(x) = x 2 + I xtu(t)dt, 

Jo 


(b) u(x) =A / sin(^ — t)u(t)dt. 

Jo 




id) u{x) = x + I u{f) dt• 

Jo 


17.13. Solve the integral equation w(jc) = x + 入 /J (x + t)tu{t) dt, keeping terms 
up to 入 2 . 


17.14. Solve the integral equation m(x) = dt, assuming 

that / remains finite as x —d=oo. 

17.15. Solve the integral equation u(x) = e~^ -h k u(t) cosx^ dt, assuming 

that / remains finite asx 士 oo. 


Additional Reading 

1. DeVito, C. Functional Analysis and Linear Operator Theory, Addison-Wesley, 1990.

2. Jörgens, K. Linear Integral Operators, Pitman, 1982. Translated from its original German, this is a thorough (but formal) introduction to integral operators and equations.

3. Zeidler, E. Applied Functional Analysis, Springer-Verlag, 1995. 





18 


Sturm-Liouville Systems: Formalism 


The linear operators discussed in the last two chapters were exclusively integral 
operators. Most applications of physical interest, however, involve differential 
operators (DO). Unfortunately, differential operators are unbounded. We noted that 
complications arise when one abandons the compactness property of the operator, 
e.g., sums turn into integrals and one loses one’s grip over the eigenvalues of 
noncompact operators. The transition to unbounded operators further complicates 
matters. Fortunately, the formalism of one type of DOs that occur most frequently 
in physics can be studied in the context of compact operators. Such a study is our 
aim for this chapter. 

18.1 Unbounded Operators with Compact Resolvent

As was pointed out in Example 16.2.7, the derivative operator cannot be defined for all functions in $\mathcal{L}^2(a, b)$. This motivates the following:

18.1.1. Definition. Let $\mathcal{D}$ be a linear manifold$^1$ in the Hilbert space $\mathcal{H}$. A linear map $T : \mathcal{D} \to \mathcal{H}$ will be called a linear operator in$^2$ $\mathcal{H}$. $\mathcal{D}$ is called the domain of $T$ and often denoted by $\mathcal{D}(T)$.

1 A linear manifold of an infinite-dimensional normed vector space is a proper subset that is a vector space in its own right, not necessarily closed.

2 As opposed to on $\mathcal{H}$.

18.1.2. Example. The domain of the derivative operator $D$, as an operator on $\mathcal{L}^2(a, b)$, cannot be the entire space. On the other hand, $D$ is defined on the linear manifold $\mathcal{M}$ in $\mathcal{L}^2(a, b)$ spanned by $\{e^{i2n\pi x/L}\}$ with $L = b - a$. As we saw in Chapter 8, $\mathcal{M}$ is dense (see Definition 16.4.5 and the discussion following it) in $\mathcal{L}^2(a, b)$. This is the essence of Fourier series: that every function in $\mathcal{L}^2(a, b)$ can be expanded in (i.e., approximated by) a Fourier series. It turns out that many unbounded operators on a Hilbert space share the same property, namely that their domains are dense in the Hilbert space.

Another important property of Fourier expansion is the fact that if the function is differentiable, then one can differentiate both sides, i.e., one can differentiate a Fourier expansion term by term if such an operation makes sense for the original function. Define the sequence $\{f_m\}$ by

$$f_m(x) = \sum_{n=-m}^{m} a_n\,e^{i2n\pi x/L}, \qquad a_n = \frac{1}{\sqrt{L}} \int_a^b f(x)\,e^{-i2n\pi x/L}\,dx.$$

Then we can state the property above as follows: Suppose $\{f_m\}$ is in $\mathcal{M}$. If $\lim f_m = f$ and $\lim f_m' = g$, then $f' = g$ and $f \in \mathcal{M}$. Many unbounded operators share this property.

18.1.3. Definition. Let $\mathcal{D}$ be a linear manifold in the Hilbert space $\mathcal{H}$. Let $T : \mathcal{D} \to \mathcal{H}$ be a linear operator in $\mathcal{H}$. Suppose that for any sequence $\{|u_n\rangle\}$ in $\mathcal{D}$, both $\{|u_n\rangle\}$ and $\{T|u_n\rangle\}$ converge in $\mathcal{H}$, i.e.,

$$\lim |u_n\rangle = |u\rangle \qquad \text{and} \qquad \lim T|u_n\rangle = |v\rangle.$$

We say that $T$ is closed if $|u\rangle \in \mathcal{D}$ and $T|u\rangle = |v\rangle$.

Notice that we cannot demand that $|u\rangle$ be in $\mathcal{D}$ for a general operator. This, as we saw in the preceding example, will not be appropriate for unbounded operators.

The restriction of the domain of an unbounded operator is necessitated by the 
fact that the action of the operator on a vector in the Hilbert space in general takes 
that vector out of the space. The following theorem (see [DeVi 90, pp. 251-252] 
for a proof) shows why this is necessary: 

18.1.4. Theorem. A closed linear operator in $\mathcal{H}$ that is defined at every point of $\mathcal{H}$ (so that $\mathcal{D} = \mathcal{H}$) is bounded.




Thus, if we are interested in unbounded operators (for instance, differential 
operators), we have to restrict their domains. In particular, we have to accept the 
possibility of an operator whose adjoint has a different domain. 3 

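18.1.5. Definition. Let $T$ be a linear operator in $\mathcal{H}$. We shall say that $T$ is hermitian if $T^\dagger$ is an extension of $T$, i.e., $\mathcal{D}(T) \subset \mathcal{D}(T^\dagger)$ and $T^\dagger|u\rangle = T|u\rangle$ for all $|u\rangle \in \mathcal{D}(T)$. $T$ is called self-adjoint if $\mathcal{D}(T) = \mathcal{D}(T^\dagger)$.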

As we shall see shortly, certain types of Sturm-Liouville operators, although 
unbounded, lend themselves to a study within the context of compact operators. 

18.1.6. Definition. A hermitian linear operator $T$ in a Hilbert space $\mathcal{H}$ is said to have a compact resolvent if there is a $\mu \in \rho(T)$ for which the resolvent $R_\mu(T)$ is compact.


3 This subtle difference between hermitian and self-adjoint is stated here merely to warn the reader and will be confined to the 
present discussion. The two qualifiers will be (ab)used interchangeably in the rest of the book. 









An immediate consequence of this definition is that $R_\lambda(T)$ is compact for all $\lambda \in \rho(T)$. To see this, note that $R_\lambda(T)$ is bounded by Definition 16.3.1. Now use Equation (16.9) and write

$$R_\lambda(T) = [\mathbf{1} + (\lambda - \mu)R_\lambda(T)]\,R_\mu(T).$$

The RHS is a product of a bounded$^4$ and a compact operator, and therefore must be compact. The compactness of the resolvent characterizes its spectrum by Theorem 16.7.8. As the following theorem shows, this in turn characterizes the spectrum of the operators with compact resolvent.

18.1.7. Theorem. Let $T$ be an operator with compact resolvent $R_\lambda(T)$ where $\lambda \in \rho(T)$. Then $0 \neq \mu \in \rho(R_\lambda(T))$ if and only if $(\lambda + 1/\mu) \in \rho(T)$. Similarly, $\mu \neq 0$ is an eigenvalue of $R_\lambda(T)$ if and only if $(\lambda + 1/\mu)$ is an eigenvalue of $T$. Furthermore, the eigenvectors of $R_\lambda(T)$ corresponding to $\mu$ coincide with those of $T$ corresponding to $(\lambda + 1/\mu)$.

Proof. The proof consists of a series of two-sided implications involving definitions. We give the proof of the first part, the second part being very similar:

$\mu \in \rho(R_\lambda(T))$ iff $R_\lambda(T) - \mu\mathbf{1}$ is invertible.

$R_\lambda(T) - \mu\mathbf{1}$ is invertible iff $(T - \lambda\mathbf{1})^{-1} - \mu\mathbf{1}$ is invertible.

$(T - \lambda\mathbf{1})^{-1} - \mu\mathbf{1}$ is invertible iff $\mathbf{1} - \mu(T - \lambda\mathbf{1})$ is invertible.

$\mathbf{1} - \mu(T - \lambda\mathbf{1})$ is invertible iff $\frac{1}{\mu}\mathbf{1} - T + \lambda\mathbf{1}$ is invertible.

$\left(\frac{1}{\mu} + \lambda\right)\mathbf{1} - T$ is invertible iff $\left(\frac{1}{\mu} + \lambda\right) \in \rho(T)$.

Comparing the LHS of the first line with the RHS of the last line, we obtain the first part of the theorem. □
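The eigenvalue correspondence $\mu \leftrightarrow \lambda + 1/\mu$ can be illustrated in a finite-dimensional setting. The sketch below is only an analogy, under the assumption that a symmetric $2\times 2$ matrix stands in for $T$ (so $T$ is bounded here, unlike the operators of interest); the helper names are hypothetical.

```python
import math


def eig_sym_2x2(m):
    """Eigenvalues (ascending) of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = (a + d) / 2
    rad = math.hypot((a - d) / 2, b)
    return (mean - rad, mean + rad)


def resolvent_2x2(T, lam):
    """Return (T - lam*1)^(-1) for a 2x2 matrix T; lam must not be an eigenvalue."""
    a, b = T[0][0] - lam, T[0][1]
    c, d = T[1][0], T[1][1] - lam
    det = a * d - b * c
    if det == 0:
        raise ValueError("lam is an eigenvalue of T; the resolvent does not exist")
    return [[d / det, -b / det], [-c / det, a / det]]


def eigenvalues_from_resolvent(T, lam):
    """Recover the eigenvalues of T as lam + 1/mu from the eigenvalues mu of the resolvent."""
    mus = eig_sym_2x2(resolvent_2x2(T, lam))
    return sorted(lam + 1 / mu for mu in mus)


if __name__ == "__main__":
    T = [[2.0, 1.0], [1.0, 2.0]]
    print(eig_sym_2x2(T))
    print(eigenvalues_from_resolvent(T, 0.5))
```

Both printed lists should agree up to rounding, which is the finite-dimensional shadow of the statement that $\mu$ is an eigenvalue of $R_\lambda(T)$ exactly when $\lambda + 1/\mu$ is an eigenvalue of $T$.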

A consequence of this theorem is that the eigenspaces of an (unbounded) operator with compact resolvent are finite-dimensional, i.e., such an operator has only finitely many linearly independent eigenvectors corresponding to each of its eigenvalues. Moreover, arranging the eigenvalues $\mu_n$ of the resolvent in decreasing order (as done in Theorem 16.7.8), we conclude that the eigenvalues of $T$ can be arranged in a sequence in increasing order of their absolute values and the limit of this sequence is infinity.

18.1.8. Example. Consider the operator $T$ in $\mathcal{L}^2(0, 1)$ defined by$^5$ $Tf = -f''$ having the domain $\mathcal{D}(T) = \{f \in \mathcal{L}^2(0, 1) \mid f'' \in \mathcal{L}^2(0, 1),\ f(0) = f(1) = 0\}$. The reader may check that zero is not an eigenvalue of $T$. Therefore, we may choose $R_0(T) = T^{-1}$. We shall study a systematic way of finding inverses of some specific differential operators in the upcoming chapters on Green's functions. At this point, suffice it to say that $T^{-1}$ can be written as a Hilbert-Schmidt integral operator with kernel

$$K(x,t) = \begin{cases} x(1 - t) & \text{if } 0 \leq x \leq t \leq 1, \\ (1 - x)t & \text{if } 0 \leq t \leq x \leq 1. \end{cases}$$

Thus, if $Tf = g$, i.e., if $f'' = -g$, then $T^{-1}g = f$, or $f = K[g]$, i.e.,

$$f(x) = K[g](x) = \int_0^1 K(x,t)\,g(t)\,dt = \int_0^x (1 - x)\,t\,g(t)\,dt + \int_x^1 x(1 - t)\,g(t)\,dt.$$

It is readily verified that $K[g](0) = K[g](1) = 0$ and $f''(x) = K[g]''(x) = -g$.

We can now use Theorem 18.1.7 with $\lambda = 0$ to find all the eigenvalues of $T$: $\mu_n$ is an eigenvalue of $T$ if and only if $1/\mu_n$ is an eigenvalue of $T^{-1}$. These eigenvalues should have finite-dimensional eigenspaces, and we should be able to arrange them in increasing order of magnitude without bound. To verify this, we solve $f'' = -\mu f$, whose solutions are $\mu_n = n^2\pi^2$ and $f_n(x) = \sin n\pi x$. Note that there is only one eigenfunction corresponding to each eigenvalue. Therefore, the eigenspaces are finite- (one-) dimensional.

4 The sum of two bounded operators is bounded.

5 We shall depart from our convention here and shall not use the Dirac bar-ket notation although the use of abstract operators encourages their use. The reason is that in this example, we are dealing with functions, and it is more convenient to undress the functions from their Dirac clothing.
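The claim that $\sin n\pi x$ is an eigenfunction of $T^{-1}$ with eigenvalue $1/(n^2\pi^2)$ can be checked numerically against the kernel above. The following is a rough sketch under stated assumptions: the function names and the grid size are introduced here, and the composite trapezoid rule is used with $x$ placed on a grid node so that the kink of $K(x,t)$ at $t = x$ falls on a node.

```python
import math


def green_kernel(x, t):
    """Kernel of the inverse of T f = -f'' on (0, 1) with f(0) = f(1) = 0."""
    return x * (1 - t) if x <= t else (1 - x) * t


def apply_inverse(g, x_index, n=2000):
    """Approximate K[g](x) at the grid node x = x_index / n by the trapezoid rule."""
    h = 1.0 / n
    x = x_index * h
    total = 0.0
    for j in range(n + 1):
        t = j * h
        weight = h / 2 if j in (0, n) else h
        total += weight * green_kernel(x, t) * g(t)
    return total


def eigen_check(mode, x_index, n=2000):
    """Return (K[sin(mode*pi*t)](x), sin(mode*pi*x) / (mode*pi)^2) for comparison."""
    x = x_index / n
    numeric = apply_inverse(lambda t: math.sin(mode * math.pi * t), x_index, n)
    expected = math.sin(mode * math.pi * x) / (mode * math.pi) ** 2
    return numeric, expected


if __name__ == "__main__":
    for mode in (1, 2, 3):
        print(mode, eigen_check(mode, 600))
```

The two numbers in each printed pair should agree to several digits; the residual difference is the quadrature error, which shrinks as the grid is refined.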


The example above is a special case of a large class of DOs occurring in mathematical physics. Recall from Theorem 13.5.4 that all linear second-order differential equations can be made self-adjoint. Moreover, Example 13.4.12 showed that any SOLDE can be transformed into a form in which the first-derivative term is absent. By dividing the DE by the coefficient of the second-derivative term if necessary, the study of the most general second-order linear differential operators boils down to that of the so-called Sturm-Liouville (S-L) operators

$$L_x \equiv \frac{d^2}{dx^2} - q(x), \tag{18.1}$$

which are assumed to be self-adjoint. Differential operators are necessarily accompanied by boundary conditions that specify their domains. So, to be complete, let us assume that the DO in Equation (18.1) acts on the subset of $\mathcal{L}^2(a, b)$ consisting of functions $u$ that satisfy the following so-called separated boundary conditions:

$$\alpha_1 u(a) + \beta_1 u'(a) = 0,$$

$$\alpha_2 u(b) + \beta_2 u'(b) = 0, \tag{18.2}$$

where $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$ are real constants with the property that the matrix of coefficients has no zero rows. The collection of the DO and the boundary conditions above is called a regular Sturm-Liouville system.

We now show that the DO of a regular Sturm-Liouville system has compact resolvent. First observe that by adding $\alpha u$ (with $\alpha$ an arbitrary number different from all eigenvalues of the DO) to both sides of the eigenvalue equation $u'' - qu = \lambda u$, we can assume$^6$ that zero is not an eigenvalue of $L_x$. Next, suppose that $u_1(x)$ and $u_2(x)$ are the two linearly independent solutions of the homogeneous DE satisfying the first and the second boundary conditions of Equation (18.2), respectively. The operator whose kernel is

$$K(x,t) = \begin{cases} -u_1(x)\,u_2(t)/W(a) & \text{if } a \leq x \leq t \leq b, \\ -u_1(t)\,u_2(x)/W(a) & \text{if } a \leq t \leq x \leq b, \end{cases}$$

in which $W$ is the Wronskian of the solutions, is a Hilbert-Schmidt operator and therefore compact. We now show that $K(x,t)$ is the resolvent $R_0(L_x) = L_x^{-1} \equiv K$ of our DO. To see this, write $L_x u = v$, and

$$u(x) = K[v](x) = -\frac{u_2(x)}{W(a)} \int_a^x u_1(t)\,v(t)\,dt - \frac{u_1(x)}{W(a)} \int_x^b u_2(t)\,v(t)\,dt.$$

Differentiating this once gives

$$u'(x) = -\frac{u_2'(x)}{W(a)} \int_a^x u_1(t)\,v(t)\,dt - \frac{u_1'(x)}{W(a)} \int_x^b u_2(t)\,v(t)\,dt,$$

and a second differentiation yields

$$u''(x) = -\frac{u_2''(x)}{W(a)} \int_a^x u_1(t)\,v(t)\,dt - \frac{u_1''(x)}{W(a)} \int_x^b u_2(t)\,v(t)\,dt + v(x).$$

The last equation follows from the fact that the Wronskian $W = u_1'u_2 - u_2'u_1$ is constant for a DE of the form $u'' - qu = 0$. By substituting $u_1'' = qu_1$ and $u_2'' = qu_2$ in the last equation, we verify that $u = K[v]$ is indeed a solution of the Sturm-Liouville system $L_x u = v$.

Next, we show that the eigensolutions of the S-L system are nondegenerate, i.e., the eigenspaces are one-dimensional. Suppose $f_1$ and $f_2$ are any two eigenfunctions corresponding to the same eigenvalue. Then both must satisfy the same DE and the same boundary conditions; in particular, we must have

$$\begin{aligned} \alpha_1 f_1(a) + \beta_1 f_1'(a) &= 0 \\ \alpha_1 f_2(a) + \beta_1 f_2'(a) &= 0 \end{aligned} \quad\Longrightarrow\quad \begin{pmatrix} f_1(a) & f_1'(a) \\ f_2(a) & f_2'(a) \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \beta_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

If $\alpha_1$ and $\beta_1$ are not both zero, the Wronskian (the determinant of the matrix above) must vanish. Therefore, the two functions must be linearly dependent.

Finally, recall that a Hilbert space on which a compact operator $K$ is defined can be written as a direct sum of the latter's eigenspaces. More specifically, $\mathcal{H} = \bigoplus_{j=0}^{N} \mathcal{M}_j$, where each $\mathcal{M}_j$ is finite-dimensional for $j = 1, 2, \ldots$, and $N$ can be finite or infinite. If $N$ is finite, then $\mathcal{M}_0$, which can be considered as the eigenspace of zero eigenvalue,$^7$ will be infinite-dimensional. If $\mathcal{M}_0$ is finite-dimensional (or absent), then $N$ must be infinite, and the eigenvectors of $K$ will span the entire space, i.e., they will form a complete orthogonal system. We now show that this holds for the regular Sturm-Liouville operator.

6 Although this will change $q$ (and the original operator), no information will be lost because the eigenvectors will be the same and all eigenvalues will be changed by $\alpha$.


Jacques Charles François Sturm (1803-1855) made the first accurate determination of the velocity of sound in water in 1826, working with the Swiss engineer Daniel Colladon. He became a French citizen in 1833 and worked in Paris at the École Polytechnique where he became a professor in 1838. In 1840 he succeeded Poisson in the chair of mechanics in the Faculté des Sciences, Paris.

The problems of determining the eigenvalues and eigenfunctions of an ordinary differential equation with boundary conditions and of expanding a given function in terms of an infinite series of the eigenfunctions, which date from about 1750, became more prominent as new coordinate systems were introduced and new classes of functions arose as the eigenfunctions of ordinary differential equations. Sturm and his friend Joseph Liouville decided to tackle the general problem for any second-order linear differential equation.

Sturm had been working since 1833 on problems of partial differential equations, primarily on the flow of heat in a bar of variable density, and hence was fully aware of the eigenvalue and eigenfunction problem. The mathematical ideas he applied to this problem are closely related to his investigations of the reality and distribution of the roots of algebraic equations. His ideas on differential equations, he says, came from the study of difference equations and a passage to the limit. Liouville, informed by Sturm of the problems he was working on, took up the same subject. The results of their joint work were published in several papers which are quite detailed.


Suppose that the above Hilbert-Schmidt operator $K$ has a zero eigenvalue. Then, there must exist a nonzero function $v$ such that $K[v](x) = 0$, i.e.,

$$-\frac{u_2(x)}{W(a)} \int_a^x u_1(t)\,v(t)\,dt - \frac{u_1(x)}{W(a)} \int_x^b u_2(t)\,v(t)\,dt = 0 \tag{18.4}$$

for all $x$. Differentiate this twice to get

$$-\frac{u_2''(x)}{W(a)} \int_a^x u_1(t)\,v(t)\,dt - \frac{u_1''(x)}{W(a)} \int_x^b u_2(t)\,v(t)\,dt + v(x) = 0.$$

Now substitute $u_1'' = qu_1$ and $u_2'' = qu_2$ in this equation and use Equation (18.4) to conclude that $v = 0$. This is impossible because no eigenvector can be zero.


7 The reader recalls that when K acts on Mo, it yields zero. 










Hence, zero is not an eigenvalue of $K$, i.e., $\mathcal{M}_0 = \{0\}$. Since eigenvectors of $K = L_x^{-1}$ coincide with eigenvectors of $L_x$, and eigenvalues of $L_x$ are the reciprocals of the eigenvalues of $K$, we have the following result.

18.1.9. Theorem. A regular Sturm-Liouville system has a countable number of eigenvalues that can be arranged in an increasing sequence that has infinity as its limit. The eigenvectors of the Sturm-Liouville operator are nondegenerate and constitute a complete orthogonal set. Furthermore, the eigenfunction $u_n(x)$ corresponding to the eigenvalue $\lambda_n$ has exactly $n$ zeros in its interval of definition.

The last statement is not a result of operator theory, but can be derived using the 
theory of differential equations. We shall not present the details of its derivation. 
We need to emphasize that the boundary conditions are an integral part of S-L systems. Changing the boundary conditions so that, for example, they are no longer separated may destroy the regularity of the S-L system.


18.2 Sturm-Liouville Systems and SOLDEs 


We are now ready to combine our discussion of the preceding section with the 
knowledge gained from our study of differential equations. We saw in Chapter 12 
that the separation of PDEs normally results in expressions of the form 


$L[u] + \lambda u = 0$, or

$$p_2(x)\frac{d^2u}{dx^2} + p_1(x)\frac{du}{dx} + p_0(x)\,u + \lambda u = 0, \tag{18.5}$$

where $u$ is a function of a single variable and $\lambda$ is, a priori, an arbitrary constant. This is an eigenvalue equation for the operator $L$, which is not, in general, self-adjoint. If we use Theorem 13.5.4 and multiply (18.5) by

$$w(x) = \frac{1}{p_2(x)} \exp\left[\int^x \frac{p_1(t)}{p_2(t)}\,dt\right],$$

it becomes self-adjoint for real $\lambda$, and can be written as

$$\frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + [\lambda w(x) - q(x)]\,u = 0 \tag{18.6}$$

with $p(x) = w(x)p_2(x)$ and $q(x) = -p_0(x)w(x)$. Equation (18.6) is the standard form of the S-L equation. However, it is not in the form studied in the previous section. To turn it into that form one changes both the independent and dependent variables via the so-called Liouville substitution:

$$u(x) = v(t)\,[p(x)w(x)]^{-1/4}, \qquad t = \int_a^x \sqrt{\frac{w(s)}{p(s)}}\,ds. \tag{18.7}$$


It is then a matter of chain-rule differentiation to show that Equation (18.6) becomes

$$\frac{d^2v}{dt^2} + [\lambda - Q(t)]\,v = 0, \tag{18.8}$$

where

$$Q(t) = \frac{q(x(t))}{w(x(t))} + [p(x(t))\,w(x(t))]^{-1/4}\,\frac{d^2}{dt^2}\left[(pw)^{1/4}\right].$$

Therefore, Theorem 18.1.9 still holds.


Joseph Liouville (1809-1882) was a highly respected professor at the Collège de France, in Paris, and the founder and editor of the Journal des Mathématiques Pures et Appliquées, a famous periodical that played an important role in French mathematical life through the latter part of the nineteenth century. His own remarkable achievements as a creative mathematician have only recently received the appreciation they deserve.

He was the first to solve a boundary value problem by solving an equivalent integral equation. His ingenious theory of fractional differentiation answered the long-standing question of what reasonable meaning can be assigned to the symbol $d^n y/dx^n$ when $n$ is not a positive integer. He discovered the fundamental result in complex analysis that a bounded entire function is necessarily a constant and used it as the basis for his own theory of elliptic functions. There is also a well-known Liouville theorem in Hamiltonian mechanics, which states that volume integrals are time-invariant in phase space. In collaboration with Sturm, he also investigated the eigenvalue problem of second-order differential equations.

The theory of transcendental numbers is another branch of mathematics that originated in Liouville's work. The irrationality of $\pi$ and $e$ (the fact that they are not solutions of any linear equations) had been proved in the eighteenth century by Lambert and Euler. In 1844 Liouville showed that $e$ is not a root of any quadratic equation with integral coefficients as well. This led him to conjecture that $e$ is transcendental, which means that it does not satisfy any polynomial equation with integral coefficients.



18.2.1. Example. The Liouville substitution [Equation (18.7)] transforms the Bessel DE
$(xu')' + (k^2 x - \nu^2/x)u = 0$ into

$$\frac{d^2 v}{dt^2} + \left[k^2 - \frac{\nu^2 - 1/4}{t^2}\right] v = 0,$$

from which we can obtain an interesting result when $\nu = \pm\tfrac12$. In that case we have $\ddot v + k^2 v = 0$,
whose solutions are of the form $\cos kt$ and $\sin kt$. Noting that $u(x) = J_{1/2}(x)$, Equation
(18.7) gives

$$J_{1/2}(kt) = A\,\frac{\sin kt}{\sqrt{t}} \quad\text{or}\quad J_{1/2}(kt) = B\,\frac{\cos kt}{\sqrt{t}},$$

and since $J_{1/2}(x)$ is analytic at $x = 0$, we must have $J_{1/2}(kt) = A \sin kt/\sqrt{t}$, which is the
result obtained in Chapter 14. ■
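The closed form for $J_{1/2}$ can be checked numerically. The following is a minimal sketch, assuming the standard power series for $J_\nu$ and the normalization $A = \sqrt{2/\pi}$ (with $k = 1$), neither of which is fixed in the example itself; the function names are illustrative only.

```python
import math

def bessel_j(nu, x, terms=40):
    """Power series J_nu(x) = sum_m (-1)^m / (m! Gamma(m+nu+1)) * (x/2)^(2m+nu)."""
    total = 0.0
    for m in range(terms):
        coeff = (-1) ** m / (math.factorial(m) * math.gamma(m + nu + 1))
        total += coeff * (x / 2.0) ** (2 * m + nu)
    return total

def j_half_closed_form(x):
    """A sin(kt)/sqrt(t) with k = 1 and the assumed normalization A = sqrt(2/pi)."""
    return math.sqrt(2.0 / (math.pi * x)) * math.sin(x)

# Compare the series with the closed form at a few sample points.
for x in (0.5, 1.0, 3.0, 7.0):
    print(x, bessel_j(0.5, x), j_half_closed_form(x))
```

The two columns should agree to many digits for moderate $x$; the series loses accuracy for large arguments because of cancellation.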

The appearance of $w$ is the result of our desire to render the differential operator
self-adjoint. It also appears in another context. Recall the Lagrange identity for a
self-adjoint differential operator $L$:

$$uL[v] - vL[u] = \frac{d}{dx}\left\{p(x)\left[u(x)v'(x) - v(x)u'(x)\right]\right\}. \qquad (18.9)$$

If we specialize this identity to the S-L equation of (18.6) with $u = u_1$ corresponding
to the eigenvalue $\lambda_1$ and $v = u_2$ corresponding to the eigenvalue $\lambda_2$, we obtain
for the LHS

$$u_1 L[u_2] - u_2 L[u_1] = u_1(-\lambda_2 w u_2) + u_2(\lambda_1 w u_1) = (\lambda_1 - \lambda_2)\,w\,u_1 u_2.$$

Integrating both sides of (18.9) then yields

$$(\lambda_1 - \lambda_2)\int_a^b w\,u_1 u_2\,dx = \left\{p(x)\left[u_1(x)u_2'(x) - u_2(x)u_1'(x)\right]\right\}_a^b. \qquad (18.10)$$


A desired property of the solutions of a self-adjoint DE is their orthogonality 
when they belong to different eigenvalues. This property will be satisfied if we 
assume an inner product integral with weight function w(x), and if the RHS of 
Equation (18.10) vanishes. There are various boundary conditions that fulfill the 
latter requirement. For example, $u_1$ and $u_2$ could satisfy the boundary conditions
of Equation (18.2). Another set of appropriate boundary conditions (BC) is the
periodic BC given by

$$u(a) = u(b) \quad\text{and}\quad u'(a) = u'(b). \qquad (18.11)$$

However, as the following example shows, the latter BCs do not lead to a regular 
S-L system. 

18.2.2. Example. (a) The S-L system consisting of the S-L equation $d^2u/dt^2 + \omega^2 u = 0$
in the interval $[0, T]$ with the separated BCs $u(0) = 0$ and $u(T) = 0$ has the eigenfunctions
$u_n(t) = \sin(n\pi t/T)$ with $n = 1, 2, \ldots$ and the eigenvalues $\lambda_n = \omega_n^2 = (n\pi/T)^2$ with
$n = 1, 2, \ldots$.

(b) Let the S-L equation be the same as in part (a) but change the interval to $[-T, +T]$ and
the BCs to a periodic one such as $u(-T) = u(T)$ and $u'(-T) = u'(T)$. The eigenvalues are
the same as before, but the eigenfunctions are $1$, $\sin(n\pi t/T)$, and $\cos(n\pi t/T)$, where $n$ is a
positive integer. Note that there is a degeneracy here in the sense that there are two linearly
independent eigenfunctions having the same eigenvalue $(n\pi/T)^2$. By Theorem 18.1.9, the
S-L system is not regular.

(c) The Bessel equation for a given fixed $\nu^2$ is

$$u'' + \frac{1}{x}\,u' + \left(k^2 - \frac{\nu^2}{x^2}\right)u = 0, \qquad\text{where } a < x < b,$$

and it can be turned into an S-L system if we multiply it by

$$\mu(x) = \frac{1}{p_2(x)}\exp\left[\int^x \frac{p_1(t)}{p_2(t)}\,dt\right] = \exp\left[\int^x \frac{dt}{t}\right] = x.$$

Then we can write

$$\frac{d}{dx}\left(x\,\frac{du}{dx}\right) + \left(k^2 x - \frac{\nu^2}{x}\right)u = 0,$$

which is in the form of Equation (18.6) with $p = w = x$, $\lambda = k^2$, and $q(x) = \nu^2/x$. If
$a > 0$, we can obtain a regular S-L system by applying appropriate separated BCs. ■


A regular S-L system is too restrictive for applications where either a or b or
both may be infinite or where either a or b may be a singular point of the S-L 
equation. A singular S-L system is one for which one or more of the following 
conditions hold: 


1. The interval [a, b] stretches to infinity in either or both directions. 

2. Either p or w vanishes at one or both end points a and b. 

3. The function q(x) is not continuous in [a, b], 

4. Any one of the functions p(x), q(x), and w(x) is singular at a or b. 

Even though the conclusions concerning eigenvalues of a regular S-L system 
cannot be generalized to the singular S-L system, the orthogonality of eigenfunctions
corresponding to different eigenvalues can, as long as the eigenfunctions are
square-integrable with weight function w(x):


18.2.3. Box. The eigenfunctions of a singular S-L system are orthogonal if 
the RHS of (18.10) vanishes. 


18.2.4. Example. Bessel functions $J_\nu(x)$ are entire functions. Thus, they are
square-integrable in the interval $[0, b]$ for any finite positive $b$. For fixed $\nu$ the DE

$$r^2\frac{d^2u}{dr^2} + r\frac{du}{dr} + (k^2 r^2 - \nu^2)u = 0 \qquad (18.12)$$

transforms into the Bessel equation $x^2 u'' + x u' + (x^2 - \nu^2)u = 0$ if we make the substitution
$kr = x$. Thus, the solution of the singular S-L equation (18.12) that is analytic at $r = 0$
and corresponds to the eigenvalue $k^2$ is $u_k(r) = J_\nu(kr)$. For two different eigenvalues,
$k_1^2$ and $k_2^2$, the eigenfunctions are orthogonal if the boundary term of (18.10) corresponding to
Equation (18.12) vanishes, that is, if

$$\left\{r\left[J_\nu(k_1 r)J_\nu'(k_2 r) - J_\nu(k_2 r)J_\nu'(k_1 r)\right]\right\}_0^b$$

vanishes, which will occur if and only if $J_\nu(k_1 b)J_\nu'(k_2 b) - J_\nu(k_2 b)J_\nu'(k_1 b) = 0$. A common
choice is to take $J_\nu(k_1 b) = 0 = J_\nu(k_2 b)$, that is, to take both $k_1 b$ and $k_2 b$ as (different)
roots of the Bessel function of order $\nu$. We thus have $\int_0^b r J_\nu(k_i r)J_\nu(k_j r)\,dr = 0$ if $k_i$ and
$k_j$ are different roots of $J_\nu(kb) = 0$.

The Legendre equation

$$\frac{d}{dx}\left[(1 - x^2)\frac{du}{dx}\right] + \lambda u = 0, \qquad\text{where } -1 < x < 1,$$

is already self-adjoint. Thus, $w(x) = 1$, and $p(x) = 1 - x^2$. The eigenfunctions of this
singular S-L system [singular because $p(1) = p(-1) = 0$] are regular at the end points $x = \pm 1$
and are the Legendre polynomials $P_n(x)$ corresponding to $\lambda = n(n + 1)$. The boundary
term of (18.10) clearly vanishes at $a = -1$ and $b = +1$. Since $P_n(x)$ are square-integrable
on $[-1, +1]$, we obtain the familiar orthogonality relation: $\int_{-1}^{1} P_n(x)P_m(x)\,dx = 0$ if
$m \neq n$.

The Hermite DE is

$$u'' - 2xu' + \lambda u = 0. \qquad (18.13)$$

It is transformed into an S-L system if we multiply it by $w(x) = e^{-x^2}$. The resulting S-L
equation is

$$\frac{d}{dx}\left[e^{-x^2}\frac{du}{dx}\right] + \lambda e^{-x^2}u = 0. \qquad (18.14)$$

The boundary term corresponding to the two eigenfunctions $u_1(x)$ and $u_2(x)$ having the
respective eigenvalues $\lambda_1$ and $\lambda_2 \neq \lambda_1$ is

$$\left\{e^{-x^2}\left[u_1(x)u_2'(x) - u_2(x)u_1'(x)\right]\right\}_a^b.$$

This vanishes for arbitrary $u_1$ and $u_2$ (because they are Hermite polynomials) if $a = -\infty$
and $b = +\infty$.

The function $u$ is an eigenfunction of (18.14) corresponding to the eigenvalue $\lambda$ if
and only if it is a solution of (18.13). Solutions of this DE corresponding to $\lambda = 2n$
are the Hermite polynomials $H_n(x)$ discussed in Chapter 7. We can therefore write
$\int_{-\infty}^{\infty} e^{-x^2}H_n(x)H_m(x)\,dx = 0$ if $m \neq n$. This orthogonality relation was also derived in
Chapter 7. ■
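The Hermite orthogonality relation lends itself to a quick numerical check. The sketch below is only illustrative: it assumes the physicists' normalization of $H_n$ (generated by the recurrence $H_{k+1} = 2xH_k - 2kH_{k-1}$) and replaces the infinite interval by $[-10, 10]$, where the weight $e^{-x^2}$ is negligible at the end points.

```python
import math

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

def weighted_inner(n, m, half_width=10.0, steps=20000):
    """Trapezoid approximation of the integral of exp(-x^2) H_n H_m over [-L, L]."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-x * x) * hermite(n, x) * hermite(m, x)
    return total * dx

print(weighted_inner(1, 3))   # different eigenvalues: should be close to zero
print(weighted_inner(2, 2))   # same eigenfunction: should be close to 8*sqrt(pi)
```

The second value uses the known normalization $\int e^{-x^2}H_n^2\,dx = \sqrt{\pi}\,2^n n!$, which is quoted here as a standard fact rather than taken from this section.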


18.3 Other Properties of Sturm-Liouville Systems

The S-L problem is central to the solution of many DEs in mathematical physics.
In some cases the S-L equation has a direct bearing on the physics. For example,
the eigenvalue $\lambda$ may correspond to the orbital angular momentum of an electron in
an atom (see the treatment of spherical harmonics in Chapter 12) or to the energy
levels of a particle in a potential (see Example 14.5.2). In many cases, then, it
is worthwhile to gain some knowledge of the behavior of an S-L system in the
limit of large $\lambda$ (high angular momentum or high energy). Similarly, it is useful to
understand the behavior of the solutions for large values of their arguments. We
therefore devote this section to a discussion of the behavior of solutions of an S-L
system in the limit of large eigenvalues and large independent variable.






Prüfer substitution


18.3.1 Asymptotic Behavior for Large Eigenvalues 

We assume that the S-L operator has the form given in Equation (18.1). This can
always be done for an arbitrary second-order linear DE by multiplying it by a
proper function (to make it self-adjoint) followed by a Liouville substitution. So,
consider an S-L system of the following form:

$$u'' + [\lambda - q(x)]u \equiv u'' + Q(x)u = 0, \qquad\text{where } Q = \lambda - q, \qquad (18.15)$$

with separated BCs of (18.2). Let us assume that $Q(x) > 0$ for all $x \in [a, b]$, that
is, $\lambda > q(x)$. This is reasonable, since we are interested in very large $\lambda$.

The study of the system of (18.15) and (18.2) is simplified if we make the
Prüfer substitution:

$$u = RQ^{-1/4}\sin\phi, \qquad u' = RQ^{1/4}\cos\phi, \qquad (18.16)$$

where $R(x, \lambda)$ and $\phi(x, \lambda)$ are $\lambda$-dependent functions of $x$. This substitution
transforms the S-L equation of (18.15) into a pair of equations (see Problem 18.3):

$$\frac{d\phi}{dx} = \sqrt{\lambda - q(x)} - \frac{q'}{4[\lambda - q(x)]}\sin 2\phi,$$
$$\frac{dR}{dx} = \frac{Rq'}{4[\lambda - q(x)]}\cos 2\phi. \qquad (18.17)$$

The function $R(x, \lambda)$ is assumed to be positive because any negativity of $u$ can
be transferred to the phase $\phi(x, \lambda)$. Also, $R$ cannot be zero at any point of $[a, b]$,
because both $u$ and $u'$ would vanish at that point, and, by Lemma 13.3.3, $u(x) = 0$.
Equation (18.17) is very useful in discussing the asymptotic behavior of solutions
of S-L systems both when $\lambda \to \infty$ and when $x \to \infty$. Before we discuss such
asymptotics, we need to make a digression.
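To get a feeling for the phase equation in (18.17), one can integrate it numerically. The following is a rough sketch under stated assumptions: the sample potential $q(x) = x$ on $[0, 1]$, the value $\lambda = 400$, the initial phase, and the fourth-order Runge-Kutta stepper are all illustrative choices, not part of the text. For large $\lambda$ the phase should advance by roughly $\sqrt{\lambda}$ per unit length.

```python
import math

def prufer_phase_advance(q, dq, lam, a, b, phi_a=0.0, steps=4000):
    """RK4 integration of dphi/dx = sqrt(lam - q) - q'/(4 (lam - q)) * sin(2 phi)."""
    def rhs(x, phi):
        big_q = lam - q(x)  # Q(x) = lambda - q(x), assumed positive on [a, b]
        return math.sqrt(big_q) - dq(x) / (4.0 * big_q) * math.sin(2.0 * phi)

    h = (b - a) / steps
    x, phi = a, phi_a
    for _ in range(steps):
        k1 = rhs(x, phi)
        k2 = rhs(x + h / 2, phi + h * k1 / 2)
        k3 = rhs(x + h / 2, phi + h * k2 / 2)
        k4 = rhs(x + h, phi + h * k3)
        phi += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return phi - phi_a

lam = 400.0
advance = prufer_phase_advance(lambda x: x, lambda x: 1.0, lam, 0.0, 1.0)
print(advance, math.sqrt(lam))  # the two numbers differ by a small O(1)/sqrt(lam) amount
```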

It is often useful to have a notation for the behavior of a function $f(x, \lambda)$ for
large $\lambda$ and all values of $x$. If the function remains bounded for all values of $x$ as
$\lambda \to \infty$, we write $f(x, \lambda) = O(1)$. Intuitively, this means that as $\lambda$ gets larger and
larger, the magnitude of the function $f(x, \lambda)$ remains of order 1. In other words,
for no value of $x$ is $\lim_{\lambda\to\infty} f(x, \lambda)$ infinite. If $\lambda^n f(x, \lambda) = O(1)$, then we can
write $f(x, \lambda) = O(1)/\lambda^n$. This means that as $\lambda$ tends to infinity, $f(x, \lambda)$ goes to
zero as fast as $1/\lambda^n$ does. Sometimes this is written as $f(x, \lambda) = O(\lambda^{-n})$. Some
properties of $O(1)$ are as follows:

1. If $a$ is a finite real number, then $O(1) + a = O(1)$.

2. $O(1) + O(1) = O(1)$, and $O(1)O(1) = O(1)$.

3. For finite $a$ and $b$, $\int_a^b O(1)\,dx = O(1)$.

4. If $r$ and $s$ are real numbers with $r \le s$, then $O(1)\lambda^r + O(1)\lambda^s = O(1)\lambda^s$.

5. If $g(x)$ is any bounded function of $x$, then a Taylor series expansion yields

$$[\lambda + g(x)]^r = \lambda^r\left[1 + \frac{g(x)}{\lambda}\right]^r = \lambda^r\left\{1 + r\,\frac{g(x)}{\lambda} + \frac{r(r-1)}{2}\left(\frac{g(x)}{\lambda}\right)^2 + \frac{O(1)}{\lambda^3}\right\}$$
$$= \lambda^r + r g(x)\lambda^{r-1} + O(1)\lambda^{r-2} = \lambda^r + O(1)\lambda^{r-1} = O(1)\lambda^r.$$

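Returning to Equation (18.17) and expanding its RHSs using property 5, we
obtain

$$\frac{d\phi}{dx} = \sqrt{\lambda} + \frac{O(1)}{\sqrt{\lambda}}, \qquad \frac{dR}{dx} = \frac{O(1)}{\lambda}.$$

Taylor series expansion of $\phi(x, \lambda)$ and $R(x, \lambda)$ about $x = a$ then yields

$$\phi(x, \lambda) = \phi(a, \lambda) + (x - a)\sqrt{\lambda} + \frac{O(1)}{\sqrt{\lambda}},$$
$$R(x, \lambda) = R(a, \lambda) + \frac{O(1)}{\lambda} \qquad (18.18)$$

for $\lambda \to \infty$. These results are useful in determining the behavior of $\lambda_n$ for large
$n$. As an example, we use (18.2) and (18.16) to write

$$-\frac{\alpha_1}{\beta_1} = \frac{u'(a)}{u(a)} = \frac{R(a, \lambda)Q^{1/4}(a, \lambda)\cos[\phi(a, \lambda)]}{R(a, \lambda)Q^{-1/4}(a, \lambda)\sin[\phi(a, \lambda)]} = Q^{1/2}(a, \lambda)\cot[\phi(a, \lambda)],$$

where we have assumed that $\beta_1 \neq 0$. If $\beta_1 = 0$, we can take the ratio $\beta_1/\alpha_1$, which
is finite because at least one of the two constants must be different from zero. Let
$A \equiv -\alpha_1/\beta_1$ and write $\cot[\phi(a, \lambda)] = A/\sqrt{\lambda - q(a)}$. Similarly, $\cot[\phi(b, \lambda)] =
B/\sqrt{\lambda - q(b)}$, where $B \equiv -\alpha_2/\beta_2$. Let us concentrate on the nth eigenvalue and
write

$$\phi(a, \lambda_n) = \cot^{-1}\left[\frac{A}{\sqrt{\lambda_n - q(a)}}\right], \qquad \phi(b, \lambda_n) = \cot^{-1}\left[\frac{B}{\sqrt{\lambda_n - q(b)}}\right].$$

For large $\lambda_n$ the argument of $\cot^{-1}$ is small. Therefore, we can expand the RHS in
a Taylor series about zero:

$$\cot^{-1}\epsilon = \cot^{-1}(0) - \epsilon + \cdots = \frac{\pi}{2} - \epsilon + \cdots = \frac{\pi}{2} + \frac{O(1)}{\sqrt{\lambda_n}}$$

for $\epsilon = O(1)/\sqrt{\lambda_n}$. It follows that

$$\phi(a, \lambda_n) = \frac{\pi}{2} + \frac{O(1)}{\sqrt{\lambda_n}}, \qquad \phi(b, \lambda_n) = \frac{\pi}{2} + n\pi + \frac{O(1)}{\sqrt{\lambda_n}}. \qquad (18.19)$$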






The term $n\pi$ appears in (18.19) because, by Theorem 18.1.9, the nth eigenfunction
has $n$ zeros between $a$ and $b$. Since $u = RQ^{-1/4}\sin\phi$, this means that $\sin\phi$ must
go through $n$ zeros as $x$ goes from $a$ to $b$. Thus, at $x = b$ the phase $\phi$ must be $n\pi$
larger than at $x = a$.

Substituting $x = b$ in the first equation of (18.18), with $\lambda \to \lambda_n$, and using
(18.19), we obtain

$$\frac{\pi}{2} + n\pi + \frac{O(1)}{\sqrt{\lambda_n}} = \frac{\pi}{2} + \frac{O(1)}{\sqrt{\lambda_n}} + (b - a)\sqrt{\lambda_n} + \frac{O(1)}{\sqrt{\lambda_n}},$$

or

$$(b - a)\sqrt{\lambda_n} = n\pi + \frac{O(1)}{\sqrt{\lambda_n}}. \qquad (18.20)$$

One consequence of this result is that $\lim_{n\to\infty} n\lambda_n^{-1/2} = (b - a)/\pi$. Thus, $\sqrt{\lambda_n} =
C_n n$, where $\lim_{n\to\infty} C_n = \pi/(b - a)$, and Equation (18.20) can be rewritten as

$$\sqrt{\lambda_n} = \frac{n\pi}{b - a} + \frac{O(1)}{C_n n} = \frac{n\pi}{b - a} + \frac{O(1)}{n}. \qquad (18.21)$$

This equation describes the asymptotic behavior of eigenvalues. The following
theorem, stated without proof, describes the asymptotic behavior of eigenfunctions.
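Equation (18.21) can be illustrated with a small shooting calculation. This is a hedged sketch, not the method of the text: it assumes the sample potential $q(x) = x$ on $[0, 1]$, Neumann BCs $u'(0) = 0 = u'(1)$ (so that $\beta_1\beta_2 \neq 0$), a Runge-Kutta integrator, and a bisection search in a window around $(n\pi)^2$; all function names are hypothetical.

```python
import math

def end_slope(lam, q, a=0.0, b=1.0, steps=1000):
    """Integrate u'' = (q - lam) u with u(a) = 1, u'(a) = 0 by RK4; return u'(b)."""
    h = (b - a) / steps
    x, u, v = a, 1.0, 0.0

    def acc(x, u):
        return (q(x) - lam) * u

    for _ in range(steps):
        k1u, k1v = v, acc(x, u)
        k2u, k2v = v + h * k1v / 2, acc(x + h / 2, u + h * k1u / 2)
        k3u, k3v = v + h * k2v / 2, acc(x + h / 2, u + h * k2u / 2)
        k4u, k4v = v + h * k3v, acc(x + h, u + h * k3u)
        u += h * (k1u + 2 * k2u + 2 * k3u + k4u) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        x += h
    return v

def eigenvalue_near(n, q, window=5.0):
    """Bisect u'(b; lam) = 0 in a window around (n pi)^2 (interval [0, 1] assumed)."""
    lo, hi = (n * math.pi) ** 2 - window, (n * math.pi) ** 2 + window
    f_lo, f_hi = end_slope(lo, q), end_slope(hi, q)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change in the search window")
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        f_mid = end_slope(mid, q)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

for n in (2, 5, 10):
    lam_n = eigenvalue_near(n, lambda x: x)
    # sqrt(lam_n) - n*pi should shrink roughly like 1/n
    print(n, math.sqrt(lam_n), n * math.pi, math.sqrt(lam_n) - n * math.pi)
```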


asymptotic behavior
of solutions of large
order

18.3.1. Theorem. Let $\{u_n(x)\}_{n=0}^{\infty}$ be the normalized eigenfunctions of the regular
S-L system given by Equations (18.15) and (18.2) with $\beta_1\beta_2 \neq 0$. Then, for $n \to \infty$,

$$u_n(x) = \sqrt{\frac{2}{b - a}}\,\cos\frac{n\pi(x - a)}{b - a} + \frac{O(1)}{n}.$$

18.3.2. Example. Let us derive an asymptotic formula for the Legendre polynomials
$P_n(x)$. We first make the Liouville substitution to transform the Legendre DE $[(1 - x^2)P_n']' +
n(n + 1)P_n = 0$ into

$$\frac{d^2 v}{dt^2} + [\lambda_n - Q(t)]v = 0, \qquad\text{where } \lambda_n = n(n + 1). \qquad (18.22)$$

Here $p(x) = 1 - x^2$ and $w(x) = 1$, so $t = \int^x ds/\sqrt{1 - s^2} = \cos^{-1}x$, or $x(t) = \cos t$,
and

$$P_n(x(t)) = v(t)[1 - x^2(t)]^{-1/4} = \frac{v(t)}{\sqrt{\sin t}}. \qquad (18.23)$$

In Equation (18.22)

$$Q(t) = (1 - x^2)^{-1/4}\frac{d^2}{dt^2}\left[(1 - x^2)^{1/4}\right] = -\frac{1}{4}\left(1 + \frac{1}{\sin^2 t}\right).$$

For large $n$ we can neglect $Q(t)$, make the approximation $\lambda_n \approx (n + \tfrac12)^2$, and write
$\ddot v + (n + \tfrac12)^2 v = 0$, whose general solution is

$$v(t) = A\cos[(n + \tfrac12)t + \alpha],$$

where $A$ and $\alpha$ are arbitrary constants. Substituting this solution in (18.23) yields
$P_n(\cos t) = A\cos[(n + \tfrac12)t + \alpha]/\sqrt{\sin t}$. To determine $\alpha$ we note that $P_n(0) = 0$ if $n$
is odd. Thus, if we let $t = \pi/2$, the cosine term vanishes for odd $n$ if $\alpha = -\pi/4$. Thus, the
general asymptotic formula for Legendre polynomials is

$$P_n(\cos t) = \frac{A}{\sqrt{\sin t}}\cos\left[\left(n + \tfrac12\right)t - \frac{\pi}{4}\right]. \qquad ■$$
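The asymptotic form can be compared with exact values of $P_n$. The sketch below assumes the classical amplitude $A = \sqrt{2/(\pi n)}$ (Laplace's formula), which the example leaves undetermined, and generates $P_n$ from the standard three-term recurrence; both are outside what the example itself derives.

```python
import math

def legendre(n, x):
    """P_n(x) from the recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def legendre_asymptotic(n, t):
    """A cos[(n + 1/2) t - pi/4] / sqrt(sin t), with the assumed A = sqrt(2/(pi n))."""
    amplitude = math.sqrt(2.0 / (math.pi * n))
    return amplitude * math.cos((n + 0.5) * t - math.pi / 4) / math.sqrt(math.sin(t))

n = 50
for t in (0.5, 1.0, 2.0):
    print(t, legendre(n, math.cos(t)), legendre_asymptotic(n, t))
```

The agreement degrades near $t = 0$ and $t = \pi$, where $\sin t \to 0$ and neglecting $Q(t)$ is no longer justified.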



18.3.2 Asymptotic Behavior for Large x

Liouville and Prüfer substitutions are useful in investigating the behavior of the
solutions of S-L systems for large $x$ as well. The general procedure is to transform
the DE into the form of Equation (18.8) by the Liouville substitution; then make
the Prüfer substitution of (18.16) to obtain two DEs in the form of (18.17). Solving
Equation (18.17) when $x \to \infty$ determines the behavior of $\phi$ and $R$ and,
subsequently, of $u$, the solution. Problem 18.4 illustrates this procedure for the Bessel
functions. We simply quote the results:

$$J_\nu(x) = \sqrt{\frac{2}{\pi x}}\cos\left[x - \left(\nu + \frac12\right)\frac{\pi}{2} + \frac{\nu^2 - 1/4}{2x}\right] + \frac{O(1)}{x^{5/2}},$$
$$Y_\nu(x) = \sqrt{\frac{2}{\pi x}}\sin\left[x - \left(\nu + \frac12\right)\frac{\pi}{2} + \frac{\nu^2 - 1/4}{2x}\right] + \frac{O(1)}{x^{5/2}}.$$

These two relations easily yield the asymptotic expressions for the Hankel
functions:

$$H_\nu^{(1)}(x) = J_\nu(x) + iY_\nu(x) = \sqrt{\frac{2}{\pi x}}\exp\left\{i\left[x - \left(\nu + \frac12\right)\frac{\pi}{2} + \frac{\nu^2 - 1/4}{2x}\right]\right\} + \frac{O(1)}{x^{5/2}},$$
$$H_\nu^{(2)}(x) = J_\nu(x) - iY_\nu(x) = \sqrt{\frac{2}{\pi x}}\exp\left\{-i\left[x - \left(\nu + \frac12\right)\frac{\pi}{2} + \frac{\nu^2 - 1/4}{2x}\right]\right\} + \frac{O(1)}{x^{5/2}}.$$

If the last term in the exponent (which vanishes as $x \to \infty$) is ignored, the
asymptotic expression for $H_\nu^{(1)}(x)$ matches what was obtained in Chapter 15 using
the method of steepest descent.
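A numerical comparison for $\nu = 0$ shows how good the large-$x$ formula already is at moderate arguments. This is an illustrative sketch only: it evaluates $J_0$ from Bessel's integral $J_0(x) = \frac{1}{\pi}\int_0^\pi \cos(x\sin\theta)\,d\theta$ with a midpoint rule, a representation assumed here and not derived in this section.

```python
import math

def bessel_j0(x, steps=4000):
    """Bessel's integral J_0(x) = (1/pi) * int_0^pi cos(x sin(theta)) d(theta), midpoint rule."""
    h = math.pi / steps
    return sum(math.cos(x * math.sin((i + 0.5) * h)) for i in range(steps)) * h / math.pi

def bessel_j_asymptotic(nu, x):
    """sqrt(2/(pi x)) * cos[x - (nu + 1/2) pi/2 + (nu^2 - 1/4)/(2x)]."""
    phase = x - (nu + 0.5) * math.pi / 2 + (nu * nu - 0.25) / (2.0 * x)
    return math.sqrt(2.0 / (math.pi * x)) * math.cos(phase)

for x in (10.0, 20.0, 40.0):
    print(x, bessel_j0(x), bessel_j_asymptotic(0, x))
```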






18.4 Problems 


18.1. Show that the Liouville substitution transforms regular S-L systems into
regular S-L systems and separated and periodic BCs into separated and periodic
BCs, respectively.

18.2. Let $u_1(x)$ and $u_2(x)$ be transformed, respectively, into $v_1(t)$ and $v_2(t)$ by the
Liouville substitution. Show that the inner product on $[a, b]$ with weight function
$w(x)$ is transformed into the inner product on $[0, c]$ with unit weight, where $c =
\int_a^b \sqrt{w/p}\,dx$.

18.3. Derive Equation (18.17) from (18.15) using Prüfer substitution.

18.4. (a) Show that the Liouville substitution transforms the Bessel DE into

$$\frac{d^2 v}{dx^2} + \left[k^2 - \frac{\nu^2 - 1/4}{x^2}\right]v = 0.$$

(b) Find the equations obtained from the Prüfer substitution, and show that for
large $x$ these equations reduce to

$$\phi' = k\left(1 - \frac{a}{2k^2x^2}\right) + \frac{O(1)}{x^3}, \qquad \frac{R'}{R} = \frac{O(1)}{x^3},$$

where $a = \nu^2 - \tfrac14$.

(c) Integrate these equations from $x$ to $b > x$ and take the limit as $b \to \infty$ to get

$$\phi(x) = \phi_\infty + kx + \frac{a}{2kx} + \frac{O(1)}{x^2}, \qquad R(x) = R_\infty + \frac{O(1)}{x^2},$$

where $\phi_\infty \equiv \lim_{b\to\infty}(\phi(b) - kb)$ and $R_\infty \equiv \lim_{b\to\infty} R(b)$.

(d) Substitute these and the appropriate expression for $Q^{-1/4}$ in Equation (18.16)
and show that

$$v(x) = \frac{R_\infty}{\sqrt{k}}\cos\left(kx - kx_\infty + \frac{a}{2kx}\right) + \frac{O(1)}{x^2},$$

where $kx_\infty \equiv \pi/2 - \phi_\infty$.

(e) Choose $R_\infty = \sqrt{2/\pi}$ for all solutions of the Bessel DE, and let

$$kx_\infty = \left(\nu + \frac12\right)\frac{\pi}{2} \qquad\text{and}\qquad kx_\infty = \left(\nu + \frac32\right)\frac{\pi}{2}$$

for the Bessel functions $J_\nu(x)$ and the Neumann functions $Y_\nu(x)$, respectively, and
find the asymptotic behavior of these two functions.






Additional Reading 

1. Birkhoff, G. and Rota, G.-C. Ordinary Differential Equations, 3rd ed., Wiley,
1978. Has a good discussion of Sturm-Liouville differential equations and 
their asymptotic behavior. 

2. Boccara, N. Functional Analysis, Academic Press, 1990. Discusses the 
Sturm-Liouville operators in the same spirit as presented in this chapter. 

3. Hellwig, G. Differential Operators of Mathematical Physics, Addison- 
Wesley, 1967. An oldie, but goodie! It gives a readable account of the 
Sturm-Liouville systems. 




Sturm-Liouville Systems: Examples 


Chapter 12 showed how the solution of many PDEs can be written as the product 
of the solutions of the separated ODEs. These DEs are usually of Sturm-Liouville 
type. We saw this in the construction of spherical harmonics. In this chapter,
consisting mainly of illustrative examples, we shall consider the use of other coordinate
systems and construct solutions to DEs as infinite series expansions in terms of 
S-L eigenfunctions. 

19.1 Expansions in Terms of Eigenfunctions 

Central to the expansion of solutions in terms of S-L eigenfunctions is the question
of their completeness. This completeness was established for a regular S-L system
in Theorem 18.1.9.

We shall shortly state an analogous theorem (without proof) that establishes 
the completeness of the eigenfunctions of more general S-L systems. This theorem 
requires the following generalization of the separated and the periodic BCs: 

$$R_1 u \equiv \alpha_{11}u(a) + \alpha_{12}u'(a) + \alpha_{13}u(b) + \alpha_{14}u'(b) = 0,$$
$$R_2 u \equiv \alpha_{21}u(a) + \alpha_{22}u'(a) + \alpha_{23}u(b) + \alpha_{24}u'(b) = 0, \qquad (19.1)$$

where $\alpha_{ij}$ are numbers such that the rank of the following matrix is 2:

$$\alpha \equiv \begin{pmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} & \alpha_{14} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} & \alpha_{24} \end{pmatrix}.$$

The separated BCs correspond to the case for which $\alpha_{11} = \alpha_1$, $\alpha_{12} = \beta_1$, $\alpha_{23} = \alpha_2$,
and $\alpha_{24} = \beta_2$, with all other $\alpha_{ij}$ zero. Similarly, the periodic BC is a special case
for which $\alpha_{11} = -\alpha_{13} = \alpha_{22} = -\alpha_{24} = 1$, with all other $\alpha_{ij}$ zero. It is easy to
verify that the rank of the matrix $\alpha$ is 2 for these two special cases. Let

$$\mathcal{U} = \{u \in \mathcal{C}^2[a, b] \mid R_j u = 0, \text{ for } j = 1, 2\} \qquad (19.2)$$

be a subspace of $\mathcal{L}^2_w(a, b)$, and, to assure the vanishing of the RHS of the Lagrange
identity, assume that the following equality holds:

$$p(b)\det\begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix} = p(a)\det\begin{pmatrix} \alpha_{13} & \alpha_{14} \\ \alpha_{23} & \alpha_{24} \end{pmatrix}. \qquad (19.3)$$

We are now ready to consider the theorem (for a proof, see [Hell 67, Chapter
7]).

19.1.1. Theorem. The eigenfunctions $\{u_n(x)\}_{n=1}^{\infty}$ of an S-L system consisting of
the S-L equation $(pu')' + (\lambda w - q)u = 0$ and the BCs of (19.1) form a complete
basis of the subspace $\mathcal{U}$ of $\mathcal{L}^2_w(a, b)$ described in (19.2). The eigenvalues are real
and countably infinite and each one has a multiplicity of at most 2. They can be
ordered according to size $\lambda_1 \le \lambda_2 \le \cdots$, and their only limit point is $+\infty$.

First note that Equation (19.3) contains both separated and periodic BCs as
special cases (Problem 19.1). In the case of periodic BCs, we assume that $p(a) =
p(b)$. Thus, all the eigenfunctions discussed so far are covered by Theorem 19.1.1.
Second, the orthogonality of eigenfunctions corresponding to different eigenvalues
and the fact that there are infinitely many distinct eigenvalues assure the existence of
infinitely many eigenfunctions. Third, the eigenfunctions form a basis of $\mathcal{U}$ and not
the whole $\mathcal{L}^2_w(a, b)$. Only those functions $u \in \mathcal{L}^2_w(a, b)$ that satisfy the BC in (19.1)
are expandable in terms of $u_n(x)$. Finally, the last statement of Theorem 19.1.1 is
a repetition of part of Theorem 18.1.9 but is included because the conditions under
which Theorem 19.1.1 holds are more general than those applying to Theorem
18.1.9.

Part II discussed orthogonal functions in detail and showed how other functions
can be expanded in terms of them. However, the procedure used in Part II was ad
hoc from a logical standpoint. After all, the orthogonal polynomials were invented
by nineteenth-century mathematical physicists who, in their struggle to solve the
PDEs of physics using the separation of variables, came across various ODEs
of the second order, all of which were recognized later as S-L systems. From a
logical standpoint, therefore, this chapter should precede Part II. But the order of
the chapters was based on clarity and ease of presentation and the fact that the
machinery of differential equations was a prerequisite for such a discussion.

Theorem 19.1.1 is the important link between the algebraic and the analytic
machinery of differential equation theory. This theorem puts at our disposal concrete
mathematical functions that are calculable to any desired accuracy (on a computer,
say) and can serve as basis functions for all the expansions described in Part II.
The remainder of this chapter is devoted to solving some PDEs of mathematical
physics using the separation of variables and Theorem 19.1.1.







Figure 19.1 A rectangular conducting box of which one face is held at the potential
$f(x, y)$ and the other faces are grounded.


19.2 Separation in Cartesian Coordinates 


Problems most suitable for Cartesian coordinates have boundaries with rectangular 
symmetry such as boxes or planes. 

19.2.1. Example. RECTANGULAR CONDUCTING BOX
Consider a rectangular conducting box with sides $a$, $b$, and $c$ (see Figure 19.1). All faces are
held at zero potential except the top face, whose potential is given by a function $f(x, y)$.
Let us find the potential at all points inside the box.

The relevant PDE for this situation is Laplace's equation, $\nabla^2\Phi = 0$. Writing $\Phi(x, y, z)$
as a product of three functions, $\Phi(x, y, z) = X(x)Y(y)Z(z)$, yields three ODEs (see
Problem 19.2):

$$\frac{d^2X}{dx^2} + \lambda X = 0, \qquad \frac{d^2Y}{dy^2} + \mu Y = 0, \qquad \frac{d^2Z}{dz^2} + \nu Z = 0, \qquad (19.4)$$

where $\lambda + \mu + \nu = 0$. The vanishing of $\Phi$ at $x = 0$ and $x = a$ means that

$$\Phi(0, y, z) = X(0)Y(y)Z(z) = 0 \quad \forall\, y, z \quad\Rightarrow\quad X(0) = 0,$$
$$\Phi(a, y, z) = X(a)Y(y)Z(z) = 0 \quad \forall\, y, z \quad\Rightarrow\quad X(a) = 0.$$

We thus obtain an S-L system, $X'' + \lambda X = 0$, $X(0) = 0 = X(a)$, whose BCs are of the
separated type (as in Example 18.2.2(a)) and satisfy (19.1) with $\alpha_{11} = \alpha_{23} = 1$ and all other $\alpha_{ij}$ zero. This
S-L system has the eigenvalues and eigenfunctions

$$\lambda_n = \left(\frac{n\pi}{a}\right)^2 \quad\text{and}\quad X_n(x) = \sin\left(\frac{n\pi}{a}x\right) \qquad\text{for } n = 1, 2, \ldots.$$

Similarly, the second equation in (19.4) leads to

$$\mu_m = \left(\frac{m\pi}{b}\right)^2 \quad\text{and}\quad Y_m(y) = \sin\left(\frac{m\pi}{b}y\right) \qquad\text{for } m = 1, 2, \ldots.$$

On the other hand, the third equation in (19.4) does not lead to an S-L system because
the BC for the top of the box does not fit (19.1). This is as expected because the "eigenvalue"
$\nu$ is already determined by $\lambda$ and $\mu$. Nevertheless, we can find a solution for that equation.
The substitution

$$\gamma_{mn}^2 = \left(\frac{n\pi}{a}\right)^2 + \left(\frac{m\pi}{b}\right)^2$$

changes the $Z$ equation to $Z'' - \gamma_{mn}^2 Z = 0$, whose solution, consistent with $Z(0) = 0$, is
$Z(z) = C_{mn}\sinh(\gamma_{mn}z)$.

We note that $X(x)$ and $Y(y)$ are functions satisfying $R_1 X = 0 = R_2 X$ (and likewise for $Y$). Thus,
by Theorem 19.1.1, they can be written as a linear combination of $X_n(x)$ and $Y_m(y)$:
$X(x) = \sum_{n=1}^{\infty} A_n\sin(n\pi x/a)$ and $Y(y) = \sum_{m=1}^{\infty} B_m\sin(m\pi y/b)$. Consequently, the
most general solution can be expressed as

$$\Phi(x, y, z) = X(x)Y(y)Z(z) = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} A_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)\sinh(\gamma_{mn}z),$$

where $A_{mn} \equiv A_n B_m C_{mn}$.

To specify $\Phi$ completely, we must determine the arbitrary constants $A_{mn}$. This is done
by imposing the remaining BC, $\Phi(x, y, c) = f(x, y)$, yielding the identity

$$f(x, y) = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} A_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)\sinh(\gamma_{mn}c)
= \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} B_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right),$$

where $B_{mn} \equiv A_{mn}\sinh(\gamma_{mn}c)$. This is a two-dimensional Fourier series (see Chapter 8)
whose coefficients are given by

$$B_{mn} = \frac{4}{ab}\int_0^a dx\int_0^b dy\, f(x, y)\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right). \qquad ■$$


Pierre Simon de Laplace (1749-1827) was a French
mathematician and theoretical astronomer who was so famous
in his own time that he was known as the Newton of
France. His main interests throughout his life were celestial
mechanics, the theory of probability, and personal advancement.

At the age of 24 he was already deeply engaged in the
detailed application of Newton's law of gravitation to the
solar system as a whole, in which the planets and their satellites
are not governed by the sun alone, but interact with one
another in a bewildering variety of ways. Even Newton had
been of the opinion that divine intervention would occasionally be needed to prevent this
complex mechanism from degenerating into chaos. Laplace decided to seek reassurance
elsewhere, and succeeded in proving that the ideal solar system of mathematics is a stable
dynamical system that will endure unchanged for all time. This achievement was only
one of the long series of triumphs recorded in his monumental treatise Mécanique Céleste
(published in five volumes from 1799 to 1825), which summed up the work on gravitation
of several generations of illustrious mathematicians. Unfortunately for his later reputation,
he omitted all reference to the discoveries of his predecessors and contemporaries, and left
it to be inferred that the ideas were entirely his own. Many anecdotes are associated with
this work. One of the best known describes the occasion on which Napoleon tried to get a
rise out of Laplace by protesting that he had written a huge book on the system of the world
without once mentioning God as the author of the universe. Laplace is supposed to have
replied, "Sire, I had no need of that hypothesis." The principal legacy of the Mécanique
Céleste to later generations lay in Laplace's wholesale development of potential theory,
with its far-reaching implications for a dozen different branches of physical science ranging
from gravitation and fluid mechanics to electromagnetism and atomic physics. Even though
he lifted the idea of the potential from Lagrange without acknowledgment, he exploited it
so extensively that ever since his time the fundamental equation of potential theory has
been known as Laplace's equation. After the French Revolution, Laplace's political talents
and greed for position came to full flower. His compatriots speak ironically of his "suppleness"
and "versatility" as a politician. What this really means is that each time there was a
change of regime (and there were many), Laplace smoothly adapted himself by changing
his principles, back and forth between fervent republicanism and fawning royalism, and
each time he emerged with a better job and grander titles. He has been aptly compared with
the apocryphal Vicar of Bray in English literature, who was twice a Catholic and twice a
Protestant. The Vicar is said to have replied as follows to the charge of being a turncoat:
"Not so, neither, for if I changed my religion, I am sure I kept true to my principle, which
is to live and die the Vicar of Bray."

To balance his faults, Laplace was always generous in giving assistance and encouragement
to younger scientists. From time to time he helped forward in their careers such men
as the chemist Gay-Lussac, the traveler and naturalist Humboldt, the physicist Poisson, and,
appropriately, the young Cauchy, who was destined to become one of the chief architects
of nineteenth century mathematics.



Laplace's equation describes not only electrostatics, but also heat transfer.
When the transfer (diffusion) of heat takes place with the temperature being
independent of time, the process is known as steady-state heat transfer. The
diffusion equation, $\partial T/\partial t = a^2\nabla^2 T$, becomes Laplace's equation, $\nabla^2 T = 0$, and
the technique of the preceding example can be used. It is easy to see that the
diffusion equation allows us to perform any linear transformation on $T$, such as
$T \to \alpha T + \beta$, and still satisfy that equation. This implies that $T$ can be measured
in any scale such as Kelvin, Celsius, and Fahrenheit.

19.2.2. Example. STEADY-STATE HEAT-CONDUCTING PLATE

Let us consider a rectangular heat-conducting plate with sides of lengths $a$ and $b$. Three
of the sides are held at $T = 0$, and the fourth side has a temperature variation $T = f(x)$
(see Figure 19.2). The flat faces are insulated, so they cannot lose heat to the surroundings.
Assuming a steady-state heat transfer, let us calculate the variation of $T$ over the plate. The
problem is two-dimensional. The separation of variables leads to

$$\frac{d^2X}{dx^2} + \lambda X = 0, \qquad \frac{d^2Y}{dy^2} + \mu Y = 0, \qquad\text{where } \lambda + \mu = 0. \qquad (19.5)$$

The $X$ equation and the BCs $T(0, y) = T(a, y) = 0$ form an S-L system whose eigenvalues
and eigenfunctions are $\lambda_n = (n\pi/a)^2$ and $X_n(x) = \sin(n\pi x/a)$ for $n = 1, 2, \ldots$. Thus,
according to Theorem 19.1.1, a general $X(x)$ can be written as $X(x) = \sum_{n=1}^{\infty} A_n\sin(n\pi x/a)$.

Figure 19.2 A heat-conducting rectangular plate.

The $Y$ equation, on the other hand, does not form an S-L system due to the fact that its
"eigenvalue" is predetermined by the third equation in (19.5). Nevertheless, we can solve the
equation $Y'' - (n\pi/a)^2 Y = 0$ to obtain the general solution $Y(y) = Ae^{n\pi y/a} + Be^{-n\pi y/a}$.
Since $T(x, 0) = 0\ \forall\, x$, we must have $Y(0) = 0$. This implies that $A + B = 0$, which, in turn,
reduces the solution to $Y = A\sinh(n\pi y/a)$, with a factor of 2 absorbed into $A$. Thus, the most general solution, consistent
with the three BCs $T(0, y) = T(a, y) = T(x, 0) = 0$, is

$$T(x, y) = X(x)Y(y) = \sum_{n=1}^{\infty} B_n\sin\left(\frac{n\pi}{a}x\right)\sinh\left(\frac{n\pi}{a}y\right).$$

The fourth BC gives a Fourier series,

$$f(x) = \sum_{n=1}^{\infty}\left[B_n\sinh\left(\frac{n\pi}{a}b\right)\right]\sin\left(\frac{n\pi}{a}x\right) \equiv \sum_{n=1}^{\infty} C_n\sin\left(\frac{n\pi}{a}x\right),$$

whose coefficients can be determined from

$$C_n = B_n\sinh\left(\frac{n\pi}{a}b\right) = \frac{2}{a}\int_0^a \sin\left(\frac{n\pi}{a}x\right)f(x)\,dx.$$

In particular, if the fourth side is held at the constant temperature $T_0$, then

$$C_n = \frac{2T_0}{a}\left(\frac{a}{n\pi}\right)[1 - (-1)^n] = \begin{cases} \dfrac{4T_0}{n\pi} & \text{if } n \text{ is odd,} \\ 0 & \text{if } n \text{ is even,} \end{cases}$$

and we obtain

$$T(x, y) = \frac{4T_0}{\pi}\sum_{k=0}^{\infty}\frac{1}{2k + 1}\,\frac{\sin[(2k + 1)\pi x/a]\,\sinh[(2k + 1)\pi y/a]}{\sinh[(2k + 1)\pi b/a]}. \qquad (19.6)$$

If the temperature variation of the fourth side is of the form $f(x) = T_0\sin(\pi x/a)$, then

$$C_n = \frac{2T_0}{a}\int_0^a \sin\left(\frac{\pi x}{a}\right)\sin\left(\frac{n\pi x}{a}\right)dx = \frac{2T_0}{a}\left(\frac{a}{2}\right)\delta_{n,1} = T_0\,\delta_{n,1}$$

and $B_n = C_n/\sinh(n\pi b/a) = [T_0/\sinh(n\pi b/a)]\,\delta_{n,1}$, and we have

$$T(x, y) = T_0\,\frac{\sin(\pi x/a)\,\sinh(\pi y/a)}{\sinh(\pi b/a)}. \qquad (19.7)$$

Only one term of the series survives in this case because the variation on the fourth side
happens to be one of the harmonics of the expansion.

Note that the temperature variations given by (19.6) and (19.7) are independent of
the material of the plate because we are dealing with a steady state. The conductivity of
the material is a factor only in the process of heat transfer, leading to the steady state.
Once equilibrium has been reached, the distribution of temperature will be the same for all
materials. ■
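The series (19.6) is easy to evaluate numerically. The sketch below is a minimal illustration, assuming a unit square plate and $T_0 = 1$ as sample values; the ratio of hyperbolic sines is rewritten with decaying exponentials (an implementation choice made here to avoid overflow for large $k$). For a square plate, superposing the four rotated copies of the problem gives a uniform temperature $T_0$, so the center should be at $T_0/4$.

```python
import math

def sinh_ratio(alpha, y, b):
    """sinh(alpha*y)/sinh(alpha*b) for 0 <= y <= b, written with decaying exponentials."""
    if y == 0.0:
        return 0.0
    return (math.exp(alpha * (y - b))
            * (1.0 - math.exp(-2.0 * alpha * y))
            / (1.0 - math.exp(-2.0 * alpha * b)))

def plate_temperature(x, y, a=1.0, b=1.0, t0=1.0, terms=200):
    """Partial sum of (19.6): three edges at T = 0 and the edge y = b held at T = t0."""
    total = 0.0
    for k in range(terms):
        odd = 2 * k + 1
        alpha = odd * math.pi / a
        total += math.sin(alpha * x) * sinh_ratio(alpha, y, b) / odd
    return 4.0 * t0 / math.pi * total

print(plate_temperature(0.5, 0.5))  # center of a square plate: close to t0/4
print(plate_temperature(0.5, 0.9))  # nearer the heated edge: closer to t0
```

Convergence is fast in the interior but slow close to the edge $y = b$, where the series reduces to the Fourier series of a constant.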


The preceding two examples concerned themselves with static situations. The 
remaining examples of this section are drawn from the (time-dependent) diffusion 
equation, the Schrodinger equation, and the wave equation. 


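19.2.3. Example. Conduction of heat in a rectangular plate
Consider a rectangular heat-conducting plate with sides of length a and b, all held at T = 0.
Assume that at time t = 0 the temperature has a distribution function f(x, y). Let us find
the variation of temperature for all points (x, y) at all times t > 0.

The diffusion equation for this problem is

\[ \frac{\partial T}{\partial t} = k^2\left(\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2}\right). \]

A separation of variables, $T(x,y,t) = X(x)Y(y)g(t)$, leads to three DEs:

\[ \frac{d^2X}{dx^2} + \lambda X = 0, \qquad \frac{d^2Y}{dy^2} + \mu Y = 0, \qquad \frac{dg}{dt} + k^2(\lambda + \mu)g = 0. \]

The BCs $T(0,y,t) = T(a,y,t) = T(x,0,t) = T(x,b,t) = 0$, together with the three
ODEs, give rise to two S-L systems. The solutions to both of these are easily found:

\[ X_n(x) = \sin\left(\frac{n\pi}{a}x\right) \quad\text{and}\quad \lambda_n = \left(\frac{n\pi}{a}\right)^2 \quad\text{for } n = 1, 2, \ldots, \]
\[ Y_m(y) = \sin\left(\frac{m\pi}{b}y\right) \quad\text{and}\quad \mu_m = \left(\frac{m\pi}{b}\right)^2 \quad\text{for } m = 1, 2, \ldots. \]

These give rise to the general solutions

\[ X(x) = \sum_{n=1}^{\infty} A_n \sin\left(\frac{n\pi}{a}x\right), \qquad Y(y) = \sum_{m=1}^{\infty} B_m \sin\left(\frac{m\pi}{b}y\right). \]

With $\gamma_{mn} = k^2\pi^2(n^2/a^2 + m^2/b^2)$, the solution to the g equation can be expressed as
$g(t) = C_{mn}e^{-\gamma_{mn}t}$. Putting everything together, we obtain

\[ T(x,y,t) = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} A_{mn}\,e^{-\gamma_{mn}t}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right), \]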




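where $A_{mn} = A_n B_m C_{mn}$ is an arbitrary constant. To determine it, we impose the initial
condition $T(x,y,0) = f(x,y)$. This yields

\[ f(x,y) = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} A_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right), \]

which, by the orthogonality of the sine functions, determines the coefficients $A_{mn}$:

\[ A_{mn} = \frac{4}{ab}\int_0^a dx\int_0^b dy\, f(x,y)\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right). \]

■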
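The double sine series of this example can be tried numerically. The following is only a minimal sketch under stated assumptions: the initial profile f(x, y) = x(a − x)y(b − y), the plate dimensions, the value of k, and the function names `sine_coefficients` and `temperature` are all illustrative choices that do not come from the text, and the coefficient integral is approximated by a plain midpoint rule.

```python
import math

def sine_coefficients(f, a, b, n_max, m_max, grid=100):
    """Approximate A_mn = (4/ab) * double integral of f*sin(n pi x/a)*sin(m pi y/b)
    over the plate, using the midpoint rule on a grid x grid mesh."""
    hx, hy = a / grid, b / grid
    xs = [(i + 0.5) * hx for i in range(grid)]
    ys = [(j + 0.5) * hy for j in range(grid)]
    samples = [[f(x, y) for y in ys] for x in xs]
    coeffs = {}
    for n in range(1, n_max + 1):
        sx = [math.sin(n * math.pi * x / a) for x in xs]
        for m in range(1, m_max + 1):
            sy = [math.sin(m * math.pi * y / b) for y in ys]
            total = 0.0
            for i in range(grid):
                row = samples[i]
                total += sx[i] * sum(row[j] * sy[j] for j in range(grid))
            coeffs[(n, m)] = 4.0 / (a * b) * total * hx * hy
    return coeffs

def temperature(x, y, t, coeffs, a, b, k):
    """Truncated series T(x, y, t) with gamma_mn = k^2 pi^2 (n^2/a^2 + m^2/b^2)."""
    total = 0.0
    for (n, m), A in coeffs.items():
        gamma = (k * math.pi) ** 2 * (n ** 2 / a ** 2 + m ** 2 / b ** 2)
        total += (A * math.exp(-gamma * t)
                  * math.sin(n * math.pi * x / a)
                  * math.sin(m * math.pi * y / b))
    return total

# Illustrative (assumed) data: a 1 x 2 plate, k = 1, smooth initial profile.
a, b, k = 1.0, 2.0, 1.0
f = lambda x, y: x * (a - x) * y * (b - y)
coeffs = sine_coefficients(f, a, b, n_max=7, m_max=7)

print(temperature(a / 2, b / 2, 0.0, coeffs, a, b, k))  # close to f at the center
print(temperature(a / 2, b / 2, 0.1, coeffs, a, b, k))  # smaller: the plate cools
```

For this particular f the integral can be done by hand, giving $A_{mn} = 64a^2b^2/(\pi^6 n^3 m^3)$ for odd n and m and zero otherwise, which offers a check on the quadrature.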


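19.2.4. Example. Quantum particle in a box

The behavior of an atomic particle of mass μ confined in a rectangular box with sides a, b,
and c (an infinite three-dimensional potential well) is governed by the Schrödinger equation
for a free particle,

\[ i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2\mu}\left(\frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} + \frac{\partial^2\psi}{\partial z^2}\right), \]

and the BC that $\psi(x,y,z,t)$ vanishes at all sides of the box for all time.

A separation of variables, $\psi(x,y,z,t) = X(x)Y(y)Z(z)T(t)$, yields the ODEs

\[ \frac{d^2X}{dx^2} + \lambda X = 0, \qquad \frac{d^2Y}{dy^2} + \sigma Y = 0, \qquad \frac{d^2Z}{dz^2} + \nu Z = 0, \]
\[ \frac{dT}{dt} + i\omega T = 0, \qquad\text{where}\quad \omega \equiv \frac{\hbar}{2\mu}(\lambda + \sigma + \nu). \]

The spatial equations, together with the BCs

\[ \psi(0,y,z,t) = \psi(a,y,z,t) = 0 \;\Rightarrow\; X(0) = 0 = X(a), \]
\[ \psi(x,0,z,t) = \psi(x,b,z,t) = 0 \;\Rightarrow\; Y(0) = 0 = Y(b), \]
\[ \psi(x,y,0,t) = \psi(x,y,c,t) = 0 \;\Rightarrow\; Z(0) = 0 = Z(c), \]

lead to three S-L systems, whose solutions are easily found:

\[ X_n(x) = \sin\left(\frac{n\pi}{a}x\right), \qquad \lambda_n = \left(\frac{n\pi}{a}\right)^2, \qquad\text{for } n = 1, 2, \ldots, \]
\[ Y_m(y) = \sin\left(\frac{m\pi}{b}y\right), \qquad \sigma_m = \left(\frac{m\pi}{b}\right)^2, \qquad\text{for } m = 1, 2, \ldots, \]
\[ Z_l(z) = \sin\left(\frac{l\pi}{c}z\right), \qquad \nu_l = \left(\frac{l\pi}{c}\right)^2, \qquad\text{for } l = 1, 2, \ldots. \]

The time equation, on the other hand, has a solution of the form

\[ T(t) = Ce^{-i\omega_{lmn}t}, \qquad\text{where}\quad \omega_{lmn} = \frac{\hbar}{2\mu}\left[\left(\frac{n\pi}{a}\right)^2 + \left(\frac{m\pi}{b}\right)^2 + \left(\frac{l\pi}{c}\right)^2\right]. \]

The solution of the Schrödinger equation that is consistent with the BCs is therefore

\[ \psi(x,y,z,t) = \sum_{l,m,n=1}^{\infty} A_{lmn}\,e^{-i\omega_{lmn}t}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)\sin\left(\frac{l\pi}{c}z\right). \]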




The constants $A_{lmn}$ are determined by the initial shape, $\psi(x,y,z,0)$, of the wave function.
The energy of the particle is

\[ E = \hbar\omega_{lmn} = \frac{\hbar^2\pi^2}{2\mu}\left(\frac{n^2}{a^2} + \frac{m^2}{b^2} + \frac{l^2}{c^2}\right). \]

Each set of three positive integers (n, m, l) represents a state of the particle. For a cube,
a = b = c ≡ L, and the energy of the particle is

\[ E = \frac{\hbar^2\pi^2}{2\mu L^2}(n^2 + m^2 + l^2) = \frac{\hbar^2\pi^2}{2\mu V^{2/3}}(n^2 + m^2 + l^2), \tag{19.8} \]

where $V = L^3$ is the volume of the box. The ground state is (1, 1, 1), has energy
$E = 3\hbar^2\pi^2/2\mu V^{2/3}$, and is nondegenerate (only one state corresponds to this energy). However,
the higher-level states are degenerate. For instance, the three distinct states (1, 1, 2), (1, 2, 1),
and (2, 1, 1) all correspond to the same energy, $E = 6\hbar^2\pi^2/2\mu V^{2/3}$. The degeneracy
increases rapidly with larger values of n, m, and l.

Equation (19.8) can be written as

\[ n^2 + m^2 + l^2 = R^2, \qquad\text{where}\quad R^2 = \frac{2\mu E V^{2/3}}{\hbar^2\pi^2}. \]

This looks like the equation of a sphere in the nml-space. If R is large, the number of states
contained within the sphere of radius R (the number of states with energy less than or equal
to E) is simply the volume of the first octant¹ of the sphere. If N is the number of such
states, we have

\[ N = \frac{1}{8}\left(\frac{4\pi}{3}\right)R^3 = \frac{\pi}{6}\left(\frac{2\mu E V^{2/3}}{\hbar^2\pi^2}\right)^{3/2} = \frac{\pi}{6}\left(\frac{2\mu E}{\hbar^2\pi^2}\right)^{3/2}V. \]

Thus the density of states (the number of states per unit volume) is

\[ n = \frac{N}{V} = \frac{\pi}{6}\left(\frac{2\mu}{\hbar^2\pi^2}\right)^{3/2}E^{3/2}. \tag{19.9} \]

This is an important formula in solid-state physics, because the energy E is (with minor
modifications required by spin) the Fermi energy. If the Fermi energy is denoted by $E_f$,
Equation (19.9) gives $E_f = \alpha n^{2/3}$, where α is some constant. ■
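The degeneracy statements and the octant estimate for N can be checked by brute-force counting. The sketch below is illustrative only: the function names `level_degeneracies` and `states_below` and the choice R = 50 are assumptions, and energies are labeled by the integer $s = n^2 + m^2 + l^2$, so that $E = (\hbar^2\pi^2/2\mu L^2)\,s$.

```python
import math
from collections import Counter

def level_degeneracies(n_max):
    """Count the states (n, m, l) of the cubic box grouped by s = n^2 + m^2 + l^2.
    Counts are complete only for levels whose quantum numbers stay below n_max."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for m in range(1, n_max + 1):
            for l in range(1, n_max + 1):
                counts[n * n + m * m + l * l] += 1
    return counts

def states_below(R):
    """Number of positive-integer triples with n^2 + m^2 + l^2 <= R^2."""
    r2 = R * R
    top = int(R)
    count = 0
    for n in range(1, top + 1):
        for m in range(1, top + 1):
            for l in range(1, top + 1):
                if n * n + m * m + l * l <= r2:
                    count += 1
    return count

degeneracy = level_degeneracies(5)
print(degeneracy[3], degeneracy[6])   # ground state and first excited level

R = 50.0
exact = states_below(R)
estimate = math.pi * R ** 3 / 6       # volume of the first octant of the sphere
print(exact, estimate, exact / estimate)
```

The exact count falls somewhat below the octant volume at finite R, because lattice points on the coordinate planes (where one quantum number would be zero) are excluded; the relative difference shrinks as R grows, which is the sense in which the estimate holds "if R is large."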


In the preceding examples the time variation is given by a first derivative. Thus,
as far as time is concerned, we have a FODE. It follows that the initial specification
of the physical quantity of interest (temperature T or Schrödinger wave function
ψ) is sufficient to determine the solution uniquely.

A second kind of time-dependent PDE occurring in physics is the wave equation,
which contains time derivatives of the second order. Thus, there are two
arbitrary parameters in the general solution. To determine these, we expect two
initial conditions. For example, if the wave is standing, as in a rope clamped at both


1 This is because n, m, and l are all positive.




ends, the boundary conditions are not sufficient to determine the wave function
uniquely. One also needs to specify the initial (transverse) velocity of each point of
the rope. For traveling waves, specification of the wave shape and velocity shape
is not as important as the mode of propagation. For instance, in the theory of wave
guides, after the time variation is separated, a particular time variation, such as
$e^{+i\omega t}$, and a particular direction for the propagation of the wave, say the z-axis,
are chosen. Thus, if u denotes a component of the electric or the magnetic field,
we can write $u(x,y,z,t) = \psi(x,y)e^{i(\omega t \pm kz)}$, where k is the wave number. The
wave equation then reduces to

\[ \frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} + \left(\frac{\omega^2}{c^2} - k^2\right)\psi = 0. \]

Introducing $\gamma^2 = \omega^2/c^2 - k^2$ and the transverse gradient $\nabla_t = (\partial/\partial x, \partial/\partial y)$ and
writing the above equation in terms of the full vectors, we obtain

\[ (\nabla_t^2 + \gamma^2)\begin{Bmatrix}\mathbf{E}\\ \mathbf{B}\end{Bmatrix} = 0, \qquad\text{where}\quad \begin{Bmatrix}\mathbf{E}\\ \mathbf{B}\end{Bmatrix} = \begin{Bmatrix}\mathbf{E}(x,y)\\ \mathbf{B}(x,y)\end{Bmatrix}e^{i(\omega t \pm kz)}. \tag{19.10} \]


These are the basic equations used in the study of electromagnetic wave guides 
and resonant cavities. 

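Maxwell's equations in conjunction with Equation (19.10) give the transverse
components (components perpendicular to the propagation direction) $\mathbf{E}_t$ and $\mathbf{B}_t$
in terms of the longitudinal components $E_z$ and $B_z$ (see [Lorr 88, Chapter 33]):

\[ \gamma^2\mathbf{E}_t = \nabla_t\left(\frac{\partial E_z}{\partial z}\right) - i\frac{\omega}{c}\,\hat{\mathbf{e}}_z \times (\nabla_t B_z), \]
\[ \gamma^2\mathbf{B}_t = \nabla_t\left(\frac{\partial B_z}{\partial z}\right) + i\frac{\omega}{c}\,\hat{\mathbf{e}}_z \times (\nabla_t E_z). \tag{19.11} \]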

Three types of guided waves are usually studied. 

1. Transverse magnetic (TM) waves have $B_z = 0$ everywhere. The BC on E
demands that $E_z$ vanish at the conducting walls of the guide.

2. Transverse electric (TE) waves have $E_z = 0$ everywhere. The BC on B
requires that the normal directional derivative $\partial B_z/\partial n$ vanish at the walls.

3. Transverse electromagnetic (TEM) waves have $B_z = 0 = E_z$. For a nontrivial
solution, Equation (19.11) demands that $\gamma^2 = 0$. This form resembles a
free wave with no boundaries.






We will discuss the TM mode briefly (see any book on electromagnetic theory 
for further details). The basic equations in this mode are 


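\[ (\nabla_t^2 + \gamma^2)E_z = 0, \qquad B_z = 0, \]
\[ \gamma^2\mathbf{E}_t = \nabla_t\left(\frac{\partial E_z}{\partial z}\right), \qquad \gamma^2\mathbf{B}_t = i\frac{\omega}{c}\,\hat{\mathbf{e}}_z \times (\nabla_t E_z). \tag{19.12} \]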


19.2.5. Example. Rectangular wave guides

For a wave guide with a rectangular cross section of sides a and b in the x and the y
directions, respectively, we have

\[ \frac{\partial^2 E_z}{\partial x^2} + \frac{\partial^2 E_z}{\partial y^2} + \gamma^2 E_z = 0. \]

A separation of variables, $E_z(x,y) = X(x)Y(y)$, leads to two S-L systems,

\[ \frac{d^2X}{dx^2} + \lambda X = 0, \qquad X(0) = 0 = X(a), \]
\[ \frac{d^2Y}{dy^2} + \mu Y = 0, \qquad Y(0) = 0 = Y(b), \]

where $\gamma^2 = \lambda + \mu$. These equations have the solutions

\[ X_n(x) = \sin\left(\frac{n\pi}{a}x\right), \qquad \lambda_n = \left(\frac{n\pi}{a}\right)^2 \qquad\text{for } n = 1, 2, \ldots, \]
\[ Y_m(y) = \sin\left(\frac{m\pi}{b}y\right), \qquad \mu_m = \left(\frac{m\pi}{b}\right)^2 \qquad\text{for } m = 1, 2, \ldots. \]

The wave number is given by

\[ k_{mn} = \sqrt{\frac{\omega^2}{c^2} - \left(\frac{n\pi}{a}\right)^2 - \left(\frac{m\pi}{b}\right)^2}, \]

which has to be real if the wave is to propagate (an imaginary k leads to exponential decay
or growth along the z-axis). Thus, there is a cutoff frequency,

\[ \omega_{mn} = c\sqrt{\left(\frac{n\pi}{a}\right)^2 + \left(\frac{m\pi}{b}\right)^2} \qquad\text{for } m, n \ge 1, \]

below which the wave cannot propagate through the wave guide. It follows that for a TM
wave the lowest frequency that can propagate along a rectangular wave guide is
$\omega_{11} = \pi c\sqrt{a^2 + b^2}/(ab)$.

The most general solution for $E_z$ is therefore

\[ E_z = \sum_{m,n=1}^{\infty} A_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)e^{i(\omega t \pm k_{mn}z)}. \]

The constants $A_{mn}$ are arbitrary and can be determined from the initial shape of the wave,
but that is not commonly done. Once $E_z$ is found, the other components can be calculated
using Equation (19.12). ■
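The cutoff condition is easy to tabulate. The sketch below assumes a vacuum-filled guide with hypothetical dimensions a = 3 cm and b = 1.5 cm; the function names `tm_cutoff` and `wave_number` are illustrative and not taken from the text.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def tm_cutoff(n, m, a, b, c=C):
    """Cutoff angular frequency omega_mn = c*sqrt((n pi/a)^2 + (m pi/b)^2), n, m >= 1."""
    if n < 1 or m < 1:
        raise ValueError("TM modes require n >= 1 and m >= 1")
    return c * math.hypot(n * math.pi / a, m * math.pi / b)

def wave_number(omega, n, m, a, b, c=C):
    """Real wave number k_mn of a propagating mode, or None below cutoff."""
    k_squared = (omega / c) ** 2 - (n * math.pi / a) ** 2 - (m * math.pi / b) ** 2
    return math.sqrt(k_squared) if k_squared >= 0.0 else None

# Hypothetical guide dimensions in meters.
a, b = 0.03, 0.015
omega_11 = tm_cutoff(1, 1, a, b)
print(omega_11 / (2 * math.pi))                   # lowest TM cutoff in Hz
print(wave_number(0.5 * omega_11, 1, 1, a, b))    # None: the mode is evanescent
print(wave_number(2.0 * omega_11, 1, 1, a, b))    # real k: the mode propagates
```

Returning `None` below cutoff is a design choice of this sketch; an imaginary k could be returned instead when the decay length of the evanescent field is of interest.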







Figure 19.3 A conducting cylindrical can whose top has a potential given by V(ρ, φ),
with the rest of the surface grounded.


19.3 Separation in Cylindrical Coordinates

When the geometry of the boundaries is cylindrical, the appropriate coordinate 
system is the cylindrical one. This usually leads to Bessel functions “of some 
kind.” 

Before working specific examples of cylindrical geometry, let us consider a 
question that has more general implications. We saw in the previous section that 
separation of variables leads to ODEs in which certain constants (eigenvalues) 
appear. Different choices of signs for these constants can lead to different functional
forms of the general solution. For example, an equation such as $d^2x/dt^2 - kx = 0$
can have exponential solutions if k > 0 or trigonometric solutions if k < 0. One
cannot a priori assign a specific sign to k. Thus, the general form of the solution 
is indeterminate. However, once the boundary conditions are imposed, the unique 
solutions will emerge regardless of the initial functional form of the solutions (see 
[Hass 99] for a thorough discussion of this point). 

19.3.1. Example. Conducting cylindrical can
Consider a cylindrical conducting can of radius a and height h (see Figure 19.3). The
potential varies at the top face as V(ρ, φ), while the lateral surface and the bottom face are
grounded. Let us find the electrostatic potential at all points inside the can.

A separation of variables transforms Laplace's equation into three ODEs: 







\[ \frac{d^2R}{d\rho^2} + \frac{1}{\rho}\frac{dR}{d\rho} + \left(k^2 - \frac{m^2}{\rho^2}\right)R = 0, \qquad \frac{d^2S}{d\varphi^2} + m^2S = 0, \qquad \frac{d^2Z}{dz^2} - k^2Z = 0, \]

where in anticipation of the correct BCs, we have written the constants as $k^2$ and $-m^2$ with
m an integer. The first of these is the Bessel equation, whose general solution can be written
as $R(\rho) = AJ_m(k\rho) + BY_m(k\rho)$. The second DE, when the extra condition of periodicity
is imposed on the potential, has the general solution

\[ S(\varphi) = C\cos m\varphi + D\sin m\varphi. \]

Finally the third DE has a general solution of the form

\[ Z(z) = Ee^{kz} + Fe^{-kz}. \]

We note that none of the three ODEs lead to an S-L system of Theorem 19.1.1 because the 
BCs associated with them do not satisfy (19.1). However, we can still solve the problem by 
imposing the given BCs. 

The fact that the potential must be finite everywhere inside the can (including at ρ = 0)
forces B to vanish because the Neumann function $Y_m(k\rho)$ is not defined at ρ = 0. On the
other hand, we want Φ to vanish at ρ = a. This gives $J_m(ka) = 0$, which demands that
ka be a root of the Bessel function of order m. Denoting by $x_{mn}$ the nth zero of the Bessel
function of order m, we have $ka = x_{mn}$, or $k = x_{mn}/a$ for n = 1, 2, ....

Similarly, the vanishing of Φ at z = 0 implies that

\[ E = -F \quad\text{and}\quad Z(z) = E\sinh\left(\frac{x_{mn}z}{a}\right). \]

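We can now multiply R, S, and Z and sum over all possible values of m and n, keeping
in mind that negative values of m give terms that are linearly dependent on the corresponding
positive values. The result is the so-called Fourier-Bessel series:

\[ \Phi(\rho,\varphi,z) = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty} J_m\left(\frac{x_{mn}}{a}\rho\right)\sinh\left(\frac{x_{mn}}{a}z\right)(A_{mn}\cos m\varphi + B_{mn}\sin m\varphi), \tag{19.13} \]

where $A_{mn}$ and $B_{mn}$ are constants to be determined by the remaining BC. To find these
constants we use the orthogonality of the trigonometric and Bessel functions. For z = h,
Equation (19.13) reduces to

\[ V(\rho,\varphi) = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty} J_m\left(\frac{x_{mn}}{a}\rho\right)\sinh\left(\frac{x_{mn}}{a}h\right)(A_{mn}\cos m\varphi + B_{mn}\sin m\varphi), \]

from which we obtain, for m ≥ 1,

\[ A_{mn} = \frac{2}{\pi a^2 J_{m+1}^2(x_{mn})\sinh(x_{mn}h/a)}\int_0^{2\pi}d\varphi\int_0^a d\rho\,\rho\,V(\rho,\varphi)J_m\left(\frac{x_{mn}}{a}\rho\right)\cos m\varphi, \]
\[ B_{mn} = \frac{2}{\pi a^2 J_{m+1}^2(x_{mn})\sinh(x_{mn}h/a)}\int_0^{2\pi}d\varphi\int_0^a d\rho\,\rho\,V(\rho,\varphi)J_m\left(\frac{x_{mn}}{a}\rho\right)\sin m\varphi, \]

where we have used the following result derived in Problem 14.39:

\[ \int_0^a \rho\,J_m^2\left(\frac{x_{mn}}{a}\rho\right)d\rho = \frac{a^2}{2}J_{m+1}^2(x_{mn}). \tag{19.14} \]

For m = 0 the angular integral of $\cos^2 m\varphi$ is 2π rather than π, so the prefactor of $A_{0n}$
is $1/[\pi a^2 J_1^2(x_{0n})\sinh(x_{0n}h/a)]$.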




For the special but important case of azimuthal symmetry, for which V is independent
of φ, only the m = 0 terms survive, and orthogonality of the $J_0(x_{0n}\rho/a)$ together with (19.14) gives

\[ A_{mn} = \frac{2\delta_{m,0}}{a^2 J_1^2(x_{0n})\sinh(x_{0n}h/a)}\int_0^a \rho\,V(\rho)J_0\left(\frac{x_{0n}}{a}\rho\right)d\rho, \qquad B_{mn} = 0. \]

The reason we obtained discrete values for k was the demand that Φ vanish at
ρ = a. If we let a → ∞, then k will be a continuous variable, and instead of a sum
over k, we will obtain an integral. This is completely analogous to the transition
from a Fourier series to a Fourier transform, but we will not pursue it further.


19.3.2. Example. Circular heat-conducting plate

Consider a circular heat-conducting plate of radius a whose temperature at time t = 0 has
a distribution function f(ρ, φ). Let us find the variation of T for all points (ρ, φ) on the
plate for time t > 0 when the edge is kept at T = 0.

This is a two-dimensional problem involving the heat equation,

\[ \frac{\partial T}{\partial t} = k^2\left[\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial T}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2 T}{\partial\varphi^2}\right]. \]

A separation of variables, $T(\rho,\varphi,t) = R(\rho)S(\varphi)g(t)$, leads to the following ODEs:

\[ \frac{dg}{dt} = k^2\lambda g, \qquad \frac{d^2S}{d\varphi^2} + \mu S = 0, \qquad \frac{d^2R}{d\rho^2} + \frac{1}{\rho}\frac{dR}{d\rho} - \left(\frac{\mu}{\rho^2} + \lambda\right)R = 0. \]

To obtain exponential decay rather than growth for the temperature, we demand that
$\lambda \equiv -b^2 < 0$. To ensure periodicity (see the discussion at the beginning of this section), we must
have $\mu = m^2$, where m is an integer. To have finite T at ρ = 0, no Neumann function is to
be present. This leads to the following solutions:

\[ g(t) = Ae^{-k^2b^2t}, \qquad S(\varphi) = B\cos m\varphi + C\sin m\varphi, \qquad R(\rho) = DJ_m(b\rho). \]

If the temperature is to be zero at ρ = a, we must have $J_m(ba) = 0$, or $b = x_{mn}/a$. It
follows that the general solution can be written as

\[ T(\rho,\varphi,t) = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty} e^{-k^2(x_{mn}/a)^2t}J_m\left(\frac{x_{mn}}{a}\rho\right)(A_{mn}\cos m\varphi + B_{mn}\sin m\varphi). \]

$A_{mn}$ and $B_{mn}$ can be determined as in the preceding example. ■

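19.3.3. Example. Cylindrical wave guide

For a TM wave propagating along the z-axis in a hollow circular conductor, we have [see
Equation (19.12)]

\[ \frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial E_z}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2 E_z}{\partial\varphi^2} + \gamma^2E_z = 0. \]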





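The separation $E_z = R(\rho)S(\varphi)$ yields $S(\varphi) = A\cos m\varphi + B\sin m\varphi$ and

\[ \frac{d^2R}{d\rho^2} + \frac{1}{\rho}\frac{dR}{d\rho} + \left(\gamma^2 - \frac{m^2}{\rho^2}\right)R = 0. \]

The solution to this equation, which is regular at ρ = 0 and vanishes at ρ = a, is

\[ R(\rho) = CJ_m\left(\frac{x_{mn}}{a}\rho\right) \qquad\text{and}\qquad \gamma = \frac{x_{mn}}{a}. \]

Recalling the definition of γ, we obtain

\[ \frac{\omega^2}{c^2} - k^2 = \gamma^2 = \frac{x_{mn}^2}{a^2} \quad\Rightarrow\quad k = \sqrt{\frac{\omega^2}{c^2} - \frac{x_{mn}^2}{a^2}}. \]

This gives the cut-off frequency $\omega_{mn} = cx_{mn}/a$.

The solution for the azimuthally symmetric case (m = 0) is

\[ E_z(\rho,\varphi,t) = \sum_{n=1}^{\infty} A_nJ_0\left(\frac{x_{0n}}{a}\rho\right)e^{i(\omega t \pm k_nz)} \qquad\text{and}\qquad B_z = 0, \]

where $k_n = \sqrt{\omega^2/c^2 - x_{0n}^2/a^2}$. ■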
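The cut-off frequencies require the zeros $x_{mn}$ of $J_m$. The sketch below obtains them with the standard library only, evaluating $J_m$ for integer m from Bessel's integral $J_m(x) = \frac{1}{\pi}\int_0^{\pi}\cos(m\tau - x\sin\tau)\,d\tau$ by the trapezoidal rule and locating sign changes by bisection. This numerical route, the function names, and the 1 cm radius are assumptions made for illustration; a library routine for Bessel zeros would normally be preferable.

```python
import math

def bessel_j(m, x, steps=400):
    """J_m(x) for integer m from Bessel's integral, using the trapezoidal rule."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(m * math.pi))   # half-weighted endpoints tau = 0, pi
    for i in range(1, steps):
        tau = i * h
        total += math.cos(m * tau - x * math.sin(tau))
    return total * h / math.pi

def bessel_zeros(m, count, step=0.1):
    """First `count` positive zeros x_mn of J_m, by scanning for sign changes
    and refining each bracket with bisection."""
    zeros = []
    x_lo = step                     # start away from x = 0, which is not counted
    f_lo = bessel_j(m, x_lo)
    while len(zeros) < count:
        x_hi = x_lo + step
        f_hi = bessel_j(m, x_hi)
        if f_lo * f_hi < 0.0:
            lo, hi, f_left = x_lo, x_hi, f_lo
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                f_mid = bessel_j(m, mid)
                if f_left * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_left = mid, f_mid
            zeros.append(0.5 * (lo + hi))
        x_lo, f_lo = x_hi, f_hi
    return zeros

def cutoff_frequency(m, n, a, c=299_792_458.0):
    """Cut-off angular frequency omega_mn = c * x_mn / a of the TM mode (m, n)."""
    return c * bessel_zeros(m, n)[n - 1] / a

x0 = bessel_zeros(0, 2)
x1 = bessel_zeros(1, 1)
print(x0, x1)
print(cutoff_frequency(0, 1, a=0.01))   # hypothetical guide of radius 1 cm
```

The scan step of 0.1 is safe here because consecutive zeros of $J_m$ are separated by roughly π.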


There are many variations on the theme of Bessel functions. We have encountered
three kinds of Bessel functions, as well as modified Bessel functions. Another
variation encountered in applications leads to what are known as Kelvin functions, 
introduced in the following example. 

19.3.4. Example. Current distribution in a circular wire

Consider the flow of charges in an infinitely long wire with a circular cross section of
radius a. We are interested in calculating the variation of the current density in the wire
as a function of time and location. The relevant equation can be obtained by starting with
Maxwell's equations for negligible charge density ($\nabla\cdot\mathbf{E} = 0$), Ohm's law ($\mathbf{j} = \sigma\mathbf{E}$), the
assumption of high electrical conductivity ($|\sigma\mathbf{E}| \gg |\partial\mathbf{E}/\partial t|$), and the usual procedure of
obtaining the wave equation from Maxwell's equations. The result is

\[ \nabla^2\mathbf{j} - \frac{4\pi\sigma}{c^2}\frac{\partial\mathbf{j}}{\partial t} = 0. \]

Moreover, we make the simplifying assumptions that the wire is along the z-axis and
that there is no turbulence, so j is also along the z direction. We further assume that j is
independent of φ and z, and that its time-dependence is given by $e^{-i\omega t}$. Then we get

\[ \frac{d^2j}{d\rho^2} + \frac{1}{\rho}\frac{dj}{d\rho} + \tau^2j = 0, \tag{19.15} \]

where $\tau^2 = i4\pi\sigma\omega/c^2 \equiv i2/\delta^2$ and $\delta = c/\sqrt{2\pi\sigma\omega}$ is called the skin depth.
The Kelvin equation is usually given as

\[ \frac{d^2w}{dx^2} + \frac{1}{x}\frac{dw}{dx} - ik^2w = 0. \tag{19.16} \]




If we substitute $x = \sqrt{i}\,t/k$, it becomes $\ddot{w} + \dot{w}/t + w = 0$, which is a Bessel equation of
order zero. If the solution is to be regular at x = 0, then the only choice is
$w(t) = J_0(t) = J_0(e^{-i\pi/4}kx)$. This is the Kelvin function for Equation (19.16). It is usually written as

\[ J_0(e^{-i\pi/4}kx) = \mathrm{ber}(kx) + i\,\mathrm{bei}(kx), \]

where ber and bei stand for "Bessel real" and "Bessel imaginary," respectively. If we substitute
$z = e^{-i\pi/4}kx$ in the expansion for $J_0(z)$ and separate the real and the imaginary
parts of the expansion, we obtain

\[ \mathrm{ber}(x) = 1 - \frac{(x/2)^4}{(2!)^2} + \frac{(x/2)^8}{(4!)^2} - \cdots, \]
\[ \mathrm{bei}(x) = \frac{(x/2)^2}{(1!)^2} - \frac{(x/2)^6}{(3!)^2} + \frac{(x/2)^{10}}{(5!)^2} - \cdots. \]

Equation (19.15) is the complex conjugate of (19.16) with $k^2 = 2/\delta^2$. Thus, its solution
is

\[ j(\rho) = AJ_0(e^{i\pi/4}k\rho) = A\left[\mathrm{ber}\left(\frac{\sqrt{2}}{\delta}\rho\right) - i\,\mathrm{bei}\left(\frac{\sqrt{2}}{\delta}\rho\right)\right]. \]

We can compare the value of the current density at ρ with its value at the surface ρ = a:

\[ \left|\frac{j(\rho)}{j(a)}\right| = \left[\frac{\mathrm{ber}^2\left(\frac{\sqrt{2}}{\delta}\rho\right) + \mathrm{bei}^2\left(\frac{\sqrt{2}}{\delta}\rho\right)}{\mathrm{ber}^2\left(\frac{\sqrt{2}}{\delta}a\right) + \mathrm{bei}^2\left(\frac{\sqrt{2}}{\delta}a\right)}\right]^{1/2}. \]

For low frequencies, δ is large, which implies that ρ/δ is small; thus, $\mathrm{ber}(\sqrt{2}\rho/\delta) \approx 1$
and $\mathrm{bei}(\sqrt{2}\rho/\delta) \approx 0$, and $|j(\rho)/j(a)| \approx 1$; i.e., the current density is almost uniform. For
higher frequencies the ratio of the current densities starts at a value less than 1 at ρ = 0
and increases to 1 at ρ = a. The starting value depends on the frequency. For very large
frequencies the starting value is almost zero (see [Mari 80, pp 150-156]). ■
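The behavior of the ratio $|j(\rho)/j(a)|$ can be explored directly from the power series of ber and bei. The following is a sketch under assumptions: the function names, the truncation at 40 terms (adequate only for moderate arguments, say below about 15), and the sample values of a/δ are illustrative choices.

```python
import math

def ber_bei(x, terms=40):
    """Return (ber(x), bei(x)) by summing J_0(e^{-i pi/4} x) = sum_j i^j (x/2)^(2j) / (j!)^2.
    Real contributions come from even j, imaginary ones from odd j."""
    q = (x / 2.0) ** 2
    term = 1.0                      # (x/2)^(2j) / (j!)^2 at j = 0
    ber, bei = 0.0, 0.0
    for j in range(terms):
        phase = j % 4               # i^j cycles through 1, i, -1, -i
        if phase == 0:
            ber += term
        elif phase == 1:
            bei += term
        elif phase == 2:
            ber -= term
        else:
            bei -= term
        term *= q / ((j + 1) ** 2)
    return ber, bei

def current_ratio(rho, a, delta):
    """|j(rho)/j(a)| for a wire of radius a and skin depth delta."""
    scale = math.sqrt(2.0) / delta
    br, bi = ber_bei(scale * rho)
    ar, ai = ber_bei(scale * a)
    return math.hypot(br, bi) / math.hypot(ar, ai)

print(ber_bei(1.0))
print(current_ratio(0.0, 1.0, 100.0))   # a << delta: nearly uniform current
print(current_ratio(0.0, 1.0, 0.1))     # a >> delta: current confined to the surface
```

For arguments much larger than those used here the alternating series suffers from cancellation, and an asymptotic form of the Kelvin functions would be the better tool.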


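19.3.5. Example. Quantum particle in a cylindrical can

Let us consider a quantum particle in a cylindrical can. For an atomic particle of mass μ
confined in a cylindrical can of length L and radius a, the relevant Schrödinger equation is

\[ i\frac{\partial\psi}{\partial t} = -\frac{\hbar}{2\mu}\left[\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial\psi}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2\psi}{\partial\varphi^2} + \frac{\partial^2\psi}{\partial z^2}\right]. \]

Let us solve this equation subject to the BC that $\psi(\rho,\varphi,z,t)$ vanishes at the sides of the
can.

A separation of variables, $\psi(\rho,\varphi,z,t) = R(\rho)S(\varphi)Z(z)T(t)$, yields

\[ \frac{dT}{dt} = -i\omega T, \qquad \frac{d^2Z}{dz^2} + \lambda Z = 0, \qquad \frac{d^2S}{d\varphi^2} + m^2S = 0, \]
\[ \frac{d^2R}{d\rho^2} + \frac{1}{\rho}\frac{dR}{d\rho} + \left(\frac{2\mu\omega}{\hbar} - \lambda - \frac{m^2}{\rho^2}\right)R = 0. \tag{19.17} \]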




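The Z equation, along with the BCs Z(0) = 0 = Z(L), constitutes an S-L system with solutions

\[ Z(z) = \sin\left(\frac{k\pi}{L}z\right), \qquad \lambda = \left(\frac{k\pi}{L}\right)^2 \qquad\text{for } k = 1, 2, \ldots. \]

If we let $2\mu\omega/\hbar - (k\pi/L)^2 \equiv b^2$, then the last equation in (19.17) becomes

\[ \frac{d^2R}{d\rho^2} + \frac{1}{\rho}\frac{dR}{d\rho} + \left(b^2 - \frac{m^2}{\rho^2}\right)R = 0, \]

and the solution that is well-behaved at ρ = 0 is $J_m(b\rho)$. Since R(a) = 0, we obtain the
quantization condition $b = x_{mn}/a$ for n = 1, 2, .... Thus, the energy eigenvalues are

\[ E_{kmn} = \hbar\omega_{kmn} = \frac{\hbar^2}{2\mu}\left[\left(\frac{k\pi}{L}\right)^2 + \frac{x_{mn}^2}{a^2}\right], \]

and the general solution can be written as

\[ \psi(\rho,\varphi,z,t) = \sum_{k,n=1}^{\infty}\sum_{m=0}^{\infty} e^{-i\omega_{kmn}t}J_m\left(\frac{x_{mn}}{a}\rho\right)\sin\left(\frac{k\pi}{L}z\right)(A_{kmn}\cos m\varphi + B_{kmn}\sin m\varphi). \]

■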


19.4 Separation in Spherical Coordinates 


Recall that most PDEs encountered in physical applications can be separated, in
spherical coordinates, into

\[ \mathbf{L}^2Y(\theta,\varphi) = l(l+1)Y(\theta,\varphi), \qquad \frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} + \left[f(r) - \frac{l(l+1)}{r^2}\right]R = 0. \tag{19.18} \]

We discussed the first of these two equations in great detail in Chapter 12. In
particular, we constructed $Y_{lm}(\theta,\varphi)$ in such a way that they formed an orthonormal
sequence. However, that construction was purely algebraic and did not say anything
about the completeness of $Y_{lm}(\theta,\varphi)$. With Theorem 19.1.1 at our disposal, we
can separate the first equation of (19.18) into two ODEs by writing
$Y_{lm}(\theta,\varphi) = P_{lm}(\theta)S_m(\varphi)$. We obtain

\[ \frac{d^2S_m}{d\varphi^2} + m^2S_m = 0, \qquad \frac{d}{dx}\left[(1-x^2)\frac{dP_{lm}}{dx}\right] + \left[l(l+1) - \frac{m^2}{1-x^2}\right]P_{lm} = 0, \]

where x = cos θ. These are both S-L systems satisfying the conditions of Theorem
19.1.1. Thus, the $S_m$ are orthogonal among themselves and form a complete set
for $\mathcal{L}^2(0, 2\pi)$. Similarly, for any fixed m, the $P_{lm}(x)$ form a complete orthogonal
set for $\mathcal{L}^2(-1, +1)$ (actually for the subset of $\mathcal{L}^2(-1, +1)$ that satisfies the same





BC as the $P_{lm}$ do at x = ±1). Thus, the products $Y_{lm}(x,\varphi) = P_{lm}(x)S_m(\varphi)$ form
a complete orthogonal sequence in the (Cartesian product) set [−1, +1] × [0, 2π],
which, in terms of spherical angles, is the unit sphere, 0 ≤ θ ≤ π, 0 ≤ φ ≤ 2π.

Let us consider some specific examples of expansion in the spherical coordinate
system starting with the simplest case, Laplace's equation, for which f(r) = 0.
The radial equation is therefore

\[ \frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} - \frac{l(l+1)}{r^2}R = 0. \]

Multiplying by $r^2$, substituting $r = e^t$, and using the chain rule and the fact that
dt/dr = 1/r leads to the following SOLDE with constant coefficients:

\[ \frac{d^2R}{dt^2} + \frac{dR}{dt} - l(l+1)R = 0. \]

This has a characteristic polynomial $p(\lambda) = \lambda^2 + \lambda - l(l+1)$ with roots $\lambda_1 = l$
and $\lambda_2 = -(l+1)$. Thus, a general solution is of the form

\[ R(t) = Ae^{\lambda_1t} + Be^{\lambda_2t} = A(e^t)^l + B(e^t)^{-l-1}, \]

or, in terms of r, $R(r) = Ar^l + Br^{-l-1}$. Thus, the most general solution of
Laplace's equation is

\[ \Phi(r,\theta,\varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}(A_{lm}r^l + B_{lm}r^{-l-1})Y_{lm}(\theta,\varphi). \]

For regions containing the origin, the finiteness of Φ implies that $B_{lm} = 0$.
Denoting the potential in such regions by $\Phi_{\mathrm{in}}$, we obtain

\[ \Phi_{\mathrm{in}}(r,\theta,\varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}A_{lm}r^lY_{lm}(\theta,\varphi). \]

Similarly, for regions including r = ∞, we have

\[ \Phi_{\mathrm{out}}(r,\theta,\varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}B_{lm}r^{-l-1}Y_{lm}(\theta,\varphi). \]

To determine $A_{lm}$ and $B_{lm}$, we need to invoke appropriate BCs. In particular,
for inside a sphere of radius a on which the potential is given by V(θ, φ), we have

\[ V(\theta,\varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}A_{lm}a^lY_{lm}(\theta,\varphi). \]



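Multiplying by $Y_{kj}^*(\theta,\varphi)$ and integrating over $d\Omega = \sin\theta\,d\theta\,d\varphi$, we obtain

\[ A_{kj} = a^{-k}\iint d\Omega\,V(\theta,\varphi)Y_{kj}^*(\theta,\varphi) \quad\Rightarrow\quad A_{lm} = a^{-l}\iint d\Omega\,V(\theta,\varphi)Y_{lm}^*(\theta,\varphi). \]

Similarly, for potential outside the sphere,

\[ B_{lm} = a^{l+1}\iint d\Omega\,V(\theta,\varphi)Y_{lm}^*(\theta,\varphi). \]

In particular, if V is independent of φ, only the components for which m = 0
are nonzero, and we have

\[ A_{l0} = \frac{2\pi}{a^l}\int_0^{\pi}\sin\theta\,V(\theta)Y_{l0}^*(\theta)\,d\theta = \frac{2\pi}{a^l}\sqrt{\frac{2l+1}{4\pi}}\int_0^{\pi}\sin\theta\,V(\theta)P_l(\cos\theta)\,d\theta, \]

which yields

\[ \Phi_{\mathrm{in}}(r,\theta) = \sum_{l=0}^{\infty}A_l\left(\frac{r}{a}\right)^lP_l(\cos\theta), \]

where

\[ A_l = \frac{2l+1}{2}\int_0^{\pi}\sin\theta\,V(\theta)P_l(\cos\theta)\,d\theta. \]

Similarly,

\[ \Phi_{\mathrm{out}}(r,\theta) = \sum_{l=0}^{\infty}A_l\left(\frac{a}{r}\right)^{l+1}P_l(\cos\theta). \]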
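The azimuthally symmetric formulas can be exercised on a simple boundary potential. The sketch below assumes $V(\theta) = V_0\cos^2\theta$, which equals $V_0[\frac{1}{3}P_0 + \frac{2}{3}P_2(\cos\theta)]$, so only $A_0$ and $A_2$ should be nonzero. The function names, the midpoint quadrature, and the sample point are illustrative choices rather than anything prescribed by the text.

```python
import math

def legendre(l, x):
    """P_l(x) from the recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p_prev, p = 1.0, x
    if l == 0:
        return p_prev
    for k in range(1, l):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def legendre_coefficients(V, l_max, steps=2000):
    """A_l = (2l+1)/2 * integral_0^pi sin(theta) V(theta) P_l(cos theta) d(theta),
    approximated by the midpoint rule."""
    h = math.pi / steps
    coeffs = []
    for l in range(l_max + 1):
        total = 0.0
        for i in range(steps):
            theta = (i + 0.5) * h
            total += math.sin(theta) * V(theta) * legendre(l, math.cos(theta))
        coeffs.append(0.5 * (2 * l + 1) * total * h)
    return coeffs

def phi_in(r, theta, coeffs, a):
    """Potential inside the sphere: sum_l A_l (r/a)^l P_l(cos theta)."""
    return sum(A * (r / a) ** l * legendre(l, math.cos(theta))
               for l, A in enumerate(coeffs))

def phi_out(r, theta, coeffs, a):
    """Potential outside the sphere: sum_l A_l (a/r)^(l+1) P_l(cos theta)."""
    return sum(A * (a / r) ** (l + 1) * legendre(l, math.cos(theta))
               for l, A in enumerate(coeffs))

V0, a = 1.0, 1.0                              # assumed values
V = lambda theta: V0 * math.cos(theta) ** 2
A = legendre_coefficients(V, l_max=4)
print(A)
print(phi_in(0.5 * a, 0.7, A, a), phi_out(2.0 * a, 0.7, A, a))
```

Both series reproduce V(θ) at r = a, and the value at the center of the sphere is $A_0$, the average of V over the sphere.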


The next simplest case after Laplace's equation is that for which f(r) is a constant.
The diffusion equation, the wave equation, and the Schrödinger equation for a free
particle give rise to such a case once time is separated from the rest of the variables.

The Helmholtz equation is

\[ \nabla^2\psi + k^2\psi = 0, \tag{19.19} \]

and its radial part is

\[ \frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} + \left[k^2 - \frac{l(l+1)}{r^2}\right]R = 0. \tag{19.20} \]

(This equation was discussed in Problems 14.26 and 14.35.) The solutions are
spherical Bessel functions, generically denoted by the corresponding lower case
letter as $z_l(x)$ and given by

\[ z_l(x) = \sqrt{\frac{\pi}{2}}\,\frac{Z_{l+1/2}(x)}{\sqrt{x}}, \tag{19.21} \]




where $Z_\nu(x)$ is a solution of the Bessel equation of order ν.

A general solution of (19.20) can therefore be written as

\[ R_l(r) = Aj_l(kr) + By_l(kr). \]

If the origin is included in the region of interest, then we must set B = 0. For such
a case, the solution to the Helmholtz equation is

\[ \psi_k(r,\theta,\varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}A_{lm}\,j_l(kr)Y_{lm}(\theta,\varphi). \tag{19.22} \]

The subscript k indicates that ψ is a solution of the Helmholtz equation with $k^2$ as
its constant.

19.4.1. Example. Particle in a hard sphere

The time-independent Schrödinger equation for a particle in a sphere of radius a is
$-\frac{\hbar^2}{2\mu}\nabla^2\psi = E\psi$ with the BC $\psi(a,\theta,\varphi) = 0$. Here E is the energy of the particle
and μ is its mass. We rewrite the Schrödinger equation as $\nabla^2\psi + (2\mu E/\hbar^2)\psi = 0$. With
$k^2 = 2\mu E/\hbar^2$, we can immediately write the radial solution

\[ R_l(r) = Aj_l(kr) = Aj_l\left(\sqrt{2\mu E}\,r/\hbar\right). \]

The vanishing of ψ at a implies that $j_l(\sqrt{2\mu E}\,a/\hbar) = 0$, or

\[ \frac{\sqrt{2\mu E}\,a}{\hbar} = x_{ln} \qquad\text{for } n = 1, 2, \ldots, \]

where $x_{ln}$ is the nth zero of $j_l(x)$, which is the same as the zero of $J_{l+1/2}(x)$. Thus, the
energy is quantized as

\[ E_{ln} = \frac{\hbar^2x_{ln}^2}{2\mu a^2} \qquad\text{for } l = 0, 1, \ldots, \quad n = 1, 2, \ldots. \]

The general solution to the Schrödinger equation is

\[ \psi(r,\theta,\varphi) = \sum_{n=1}^{\infty}\sum_{l=0}^{\infty}\sum_{m=-l}^{l}A_{nlm}\,j_l\left(\frac{x_{ln}}{a}r\right)Y_{lm}(\theta,\varphi). \]

■
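The zeros $x_{ln}$, and hence the energy levels, can be found numerically. The sketch below uses the closed forms $j_0(x) = \sin x/x$ and $j_1(x) = \sin x/x^2 - \cos x/x$ together with the upward recurrence $j_{k+1}(x) = \frac{2k+1}{x}j_k(x) - j_{k-1}(x)$, which is adequate for small l over the range scanned here. The function names and the energy unit $\hbar^2/(2\mu a^2)$ are choices made for illustration.

```python
import math

def spherical_jn(l, x):
    """j_l(x) for x > 0 by upward recurrence from j_0 and j_1 (fine for small l)."""
    j_prev = math.sin(x) / x
    if l == 0:
        return j_prev
    j = math.sin(x) / x ** 2 - math.cos(x) / x
    for k in range(1, l):
        j_prev, j = j, (2 * k + 1) / x * j - j_prev
    return j

def spherical_bessel_zeros(l, count, start=0.5, step=0.05):
    """First `count` positive zeros x_ln of j_l, by sign-change scan plus bisection."""
    zeros = []
    x_lo, f_lo = start, spherical_jn(l, start)
    while len(zeros) < count:
        x_hi = x_lo + step
        f_hi = spherical_jn(l, x_hi)
        if f_lo * f_hi < 0.0:
            lo, hi, f_left = x_lo, x_hi, f_lo
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                f_mid = spherical_jn(l, mid)
                if f_left * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_left = mid, f_mid
            zeros.append(0.5 * (lo + hi))
        x_lo, f_lo = x_hi, f_hi
    return zeros

# Energies E_ln in units of hbar^2 / (2 mu a^2), where E_ln = x_ln^2.
for l in range(3):
    zeros = spherical_bessel_zeros(l, 2)
    print(l, zeros, [x * x for x in zeros])
```

For l = 0 the zeros are simply multiples of π, so $E_{0n} = \hbar^2n^2\pi^2/(2\mu a^2)$, which gives a quick check on the root finder.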


A particularly useful consequence of Equation (19.22) is the expansion of a
plane wave in terms of spherical Bessel functions. It is easily verified that if k
is a vector, with $\mathbf{k}\cdot\mathbf{k} = k^2$, then $e^{i\mathbf{k}\cdot\mathbf{r}}$ is a solution of the Helmholtz equation.
Thus, $e^{i\mathbf{k}\cdot\mathbf{r}}$ can be expanded as in Equation (19.22). Assuming that k is along the
z-axis, we get $\mathbf{k}\cdot\mathbf{r} = kr\cos\theta$, which is independent of φ. Only the terms of
Equation (19.22) for which m = 0 will survive in such a case, and we may write
$e^{ikr\cos\theta} = \sum_{l=0}^{\infty}A_lj_l(kr)P_l(\cos\theta)$. To find $A_l$, let u = cos θ, multiply both sides
by $P_n(u)$, and integrate from −1 to 1:

\[ \int_{-1}^{1}P_n(u)e^{ikru}\,du = \sum_{l=0}^{\infty}A_lj_l(kr)\int_{-1}^{1}P_n(u)P_l(u)\,du = A_nj_n(kr)\frac{2}{2n+1}. \]




Thus

\[ A_nj_n(kr) = \frac{2n+1}{2}\int_{-1}^{1}P_n(u)e^{ikru}\,du = \frac{2n+1}{2}\sum_{m=0}^{\infty}\frac{(ikr)^m}{m!}\int_{-1}^{1}P_n(u)u^m\,du. \tag{19.23} \]

This equality holds for all values of kr. In particular, both sides should give the
same result in the limit of small kr. From the definition of $j_n(kr)$ and the expansion
of $J_n(kr)$, we obtain

\[ j_n(kr) \;\xrightarrow[kr\to0]{}\; \frac{\sqrt{\pi}}{2}\left(\frac{kr}{2}\right)^n\frac{1}{\Gamma(n+3/2)}. \]

On the other hand, the first nonvanishing term of the RHS of Equation (19.23)
occurs when m = n. Equating these terms on both sides, we get

\[ A_n\frac{\sqrt{\pi}}{2}\left(\frac{kr}{2}\right)^n\frac{2^{2n+1}n!}{(2n+1)!\sqrt{\pi}} = \frac{2n+1}{2}\,\frac{i^n(kr)^n}{n!}\,\frac{2^{n+1}(n!)^2}{(2n+1)!}, \tag{19.24} \]

where we have used

\[ \Gamma\left(n+\frac{3}{2}\right) = \frac{(2n+1)!\sqrt{\pi}}{2^{2n+1}n!} \qquad\text{and}\qquad \int_{-1}^{1}P_n(u)u^n\,du = \frac{2^{n+1}(n!)^2}{(2n+1)!}. \]

Equation (19.24) yields $A_n = i^n(2n+1)$.
With $A_n$ thus calculated, we can now write

\[ e^{ikr\cos\theta} = \sum_{l=0}^{\infty}(2l+1)i^lj_l(kr)P_l(\cos\theta). \tag{19.25} \]

For an arbitrary direction of k, $\mathbf{k}\cdot\mathbf{r} = kr\cos\gamma$, where γ is the angle between k
and r. Thus, we may write $e^{i\mathbf{k}\cdot\mathbf{r}} = \sum_{l=0}^{\infty}(2l+1)i^lj_l(kr)P_l(\cos\gamma)$, and using the
addition theorem for spherical harmonics, we finally obtain

\[ e^{i\mathbf{k}\cdot\mathbf{r}} = 4\pi\sum_{l=0}^{\infty}\sum_{m=-l}^{l}i^lj_l(kr)Y_{lm}^*(\theta',\varphi')Y_{lm}(\theta,\varphi), \tag{19.26} \]

where θ′ and φ′ are the spherical angles of k and θ and φ are those of r. Such
a decomposition of plane waves into components with definite orbital angular
momenta is extremely useful when working with scattering theory for waves and
particles.
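Equation (19.25) lends itself to a direct numerical check. The sketch below compares a truncated partial-wave sum with the exponential itself. It evaluates $j_l$ from its ascending power series, $j_l(x) = x^l\sum_k(-x^2/2)^k/[k!\,(2l+2k+1)!!]$, which is stable for the moderate argument used here. The function names, the truncation at l = 30, and the sample values of kr and θ are all assumptions.

```python
import cmath
import math

def legendre(l, x):
    """P_l(x) from the recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p_prev, p = 1.0, x
    if l == 0:
        return p_prev
    for k in range(1, l):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def spherical_jn_series(l, x, terms=60):
    """j_l(x) = x^l * sum_k (-x^2/2)^k / (k! (2l+2k+1)!!), summed term by term."""
    term = x ** l
    for odd in range(3, 2 * l + 2, 2):        # divide by (2l+1)!!
        term /= odd
    total = 0.0
    for k in range(terms):
        total += term
        term *= -0.5 * x * x / ((k + 1) * (2 * l + 2 * k + 3))
    return total

def plane_wave_partial_sum(kr, theta, l_max):
    """sum_{l=0}^{l_max} (2l+1) i^l j_l(kr) P_l(cos theta), as in Equation (19.25)."""
    u = math.cos(theta)
    return sum((2 * l + 1) * (1j) ** l * spherical_jn_series(l, kr) * legendre(l, u)
               for l in range(l_max + 1))

kr, theta = 5.0, 0.8                           # assumed sample point
approx = plane_wave_partial_sum(kr, theta, l_max=30)
exact = cmath.exp(1j * kr * math.cos(theta))
print(approx, exact, abs(approx - exact))
```

Because $j_l(kr)$ becomes very small once l exceeds kr, only a modest number of partial waves beyond l ≈ kr is needed, which is what makes the expansion practical in scattering calculations.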






Figure 19.4 A semi-infinite heat-conducting plate.


19.5 Problems 

19.1. Show that separated and periodic BCs are special cases of the equality in 
Equation (19.3). 

19.2. Derive Equation (19.4). 

19.3. A semi-infinite heat-conducting plate of width b is extended along the positive
x-axis with one corner at (0, 0) and the other at (0, b). The side of width b is
held at temperature $T_0$, and the two long sides are held at T = 0 (see Figure 19.4).
The two flat faces are insulated. Find the temperature variation of the plate, assuming
equilibrium. Repeat the problem with the temperature of the short side held at
each of the following:





if 0<y< b/2, 
if b/2<y < b. 


(c)r 0 cos (^)， 0<y <b. 


⑻ 0<y<b. 

D 

⑷ 7b sin , ^<y <b. 


19.4. Find a general solution for the electromagnetic wave propagation in a resonant
cavity, a rectangular box of sides 0 < x < a, 0 < y < b, and 0 < z < d
with perfectly conducting walls. Discuss the modes the cavity can accommodate.

19.5. The lateral faces of a cube are grounded, and its top and bottom faces are
held at potentials $f_1(x,y)$ and $f_2(x,y)$, respectively.

(a) Find a general expression for the potential inside the cube.

(b) Find the potential if the top is held at $V_0$ volts and the bottom at $-V_0$ volts.

19.6. Find the potential inside a semi-infinite cylindrical conductor, closed at the
nearby end, whose cross section is a square with sides of length a. All sides are
grounded except the square side, which is held at the constant potential $V_0$.





19.7. Find the temperature distribution of a rectangular plate (see Figure 19.2)
with sides of lengths a and b if three sides are held at T = 0 and the fourth side
has a temperature variation given by


以， 

a 


0 < x < a. 




a 


x 


a 

2 


0 < x < a. 


(b) ~^x(x - a), 0 <x <a. 

a 1 

(d) T — 0, 0 <x <a. 


19.8. Consider a thin heat-conducting bar of length b along the x-axis with one
end at x = 0 held at temperature $T_0$ and the other end at x = b held at temperature
$-T_0$. The lateral surface of the bar is thermally insulated. Find the temperature
distribution at all times if initially it is given by


(fit) 7(0,^:) - --r ■: c + r 0 ， where 0<x <b. 

b 

(b) 7(0, x) = -^-x 2 + T 0 , where 0<x <b. 

o L 

(c) T(0, x) - -^-x + To, where 0<x <b. 

b 

(d) 7(0, x) = Tocos ， where 0<x <b. 

Hint: The solution corresponding to the zero eigenvalue is essential and cannot be 
excluded. 

19.9. Determine T(x, y, t) for the rectangular plate of Example 19.2.3 if initially
the lower left quarter is held at $T_0$ and the rest of the plate is held at T = 0.

19.10. All sides of the plate of Example 19.2.3 are held at T = 0. Find the 
temperature distribution for all time if the initial temperature distribution is given 
by 


⑷ T(x,y,0)= 


To 

0 


if \a <x <^a and \b <y < \b, 
otherwise. 


⑻ T(x,y t 0) = —xy t 
ab 

(c) T(x,y,0) = —x, 
a 


where 0 <x < a and 0 < y < b. 
where 0 <x < a and 0 < y < b. 


19.11. Repeat Example 19.2.3 with the temperatures of the sides equal to $T_1$, $T_2$,
$T_3$, and $T_4$. Hint: You must include solutions corresponding to the zero eigenvalue.

19.12. A string of length a is fixed at the left end, and the right end moves with
displacement A sin ωt. Find the displacement of the string at all points and times,
together with a consistent set of initial conditions for the displacement and the velocity.





19.13. Find the equation for a vibrating rectangular membrane with sides of 
lengths a and b rigidly fastened on all sides. For a = b, show that a given mode 
frequency may have more than one solution. 

19.14. Repeat Example 19.3.1 if the can has semi-infinite length, the lateral surface
is grounded, and:

(a) the base is held at the potential V(ρ, φ).

Specialize to the case where the potential of the base is given, in Cartesian
coordinates, by

(b) V = — y. (c) V = — x. (d) V = -^xy. 

a a a z 

Hint: Use the integral identity $\int z^{\nu+1}J_\nu(z)\,dz = z^{\nu+1}J_{\nu+1}(z)$.

19.15. Find the steady-state temperature distribution T(p, (p, z) in a semi-infinite 
solid cylinder of radius a if the temperature distribution of the base is f(p, (p) and 
the lateral surface is held at T = 0. 

19.16. Find the steady-state temperature distribution of a solid cylinder with a 
height and radius of 10, assuming that the base and the lateral surface are at T =0 
and the top is at T = 100. 

19.17. The circumference of a flat circular plate of radius a, lying in the xy-plane,
is held at T = 0. Find the temperature distribution for all time if the temperature
distribution at t = 0 is given, in Cartesian coordinates, by

⑷ —y- ⑻ —x. (c) ^xy. (d) T 0 . 

19.18. Find the temperature of a circular conducting plate of radius a at all points
of its surface for all time t > 0, assuming that its edge is held at T = 0 and initially
its surface from the center to a/2 is in contact with a heat bath of temperature $T_0$.

19.19. Find the potential of a cylindrical conducting can of radius a and height h
whose top is held at a constant potential $V_0$ while the rest is grounded.

19.20. Find the modes and the corresponding fields of a cylindrical resonant cavity 
of length L and radius a. Discuss the lowest TM mode. 

19.21. Two identical long conducting half-cylindrical shells (cross sections are
half-circles) of radius a are glued together in such a way that they are insulated from
one another. One half-cylinder is held at potential $V_0$ and the other is grounded.
Find the potential at any point inside the resulting cylinder. Hint: Separate Laplace's
equation in two dimensions.





19.22. A linear charge distribution of uniform density λ extends along the z-axis
from z = −b to z = b. Show that the electrostatic potential at any point r > b is
given by

\[ \Phi(r,\theta) = 2\lambda\sum_{k=0}^{\infty}\frac{(b/r)^{2k+1}}{2k+1}P_{2k}(\cos\theta). \]

Hint: Consider a point on the z-axis at a distance r > b from the origin. Solve the
simple problem by integration and compare the result with the infinite series to
obtain the unknown coefficients.


19.23. The upper half of a heat-conducting sphere of radius $a$ has $T = 100$; the
lower half is maintained at T = —100. The whole sphere is inside an infinitely large 
mass of heat-conducting material. Find the steady-state temperature distribution 
inside and outside the sphere. 

19.24. Find the steady-state temperature distribution inside a sphere of radius $a$
when the surface temperature is given by:

(a) $T_0 \cos^2\theta$. (b) $T_0 \cos^4\theta$. (c) $T_0 |\cos\theta|$.

(d) $T_0(\cos\theta - \cos^3\theta)$. (e) $T_0 \sin^2\theta$. (f) $T_0 \sin^4\theta$.

19.25. Find the electrostatic potential both inside and outside a conducting sphere
of radius $a$ when the sphere is maintained at a potential given by

(a) $V_0(\cos\theta - 3\sin^2\theta)$. (b) $V_0(5\cos^3\theta - 3\sin^2\theta)$.

(c) $V_0 \cos\theta$ for the upper hemisphere, and $0$ for the lower hemisphere.


19.26. Find the steady-state temperature distribution inside a solid hemisphere
of radius $a$ if the curved surface is held at $T_0$ and the flat surface at $T = 0$.

Hint: Imagine completing the sphere and maintaining the lower hemisphere at a
temperature such that the overall surface temperature distribution is an odd function
about $\theta = \pi/2$.

19.27. Find the steady-state temperature distribution in a spherical shell of inner
radius $R_1$ and outer radius $R_2$ when the inner surface has a temperature $T_1$ and the
outer surface a temperature $T_2$.


Additional Reading 

1. Jackson, J. Classical Electrodynamics, 2nd ed., Wiley, 1975. The classic 
textbook on electromagnetism with many examples and problems on the 
solutions of Laplace’s equation in different coordinate systems. 



2. Mathews, J. and Walker, R. Mathematical Methods of Physics, 2nd ed., 
Benjamin, 1970. 

3. Morse, P. and Feshbach, M. Methods of Theoretical Physics, McGraw-Hill, 
1953. 





Part VI 


Green’s Functions 




20

Green’s Functions in One Dimension


Our treatment of differential equations, with the exception of SOLDEs with con¬ 
stant coefficients, did not consider inhomogeneous equations. At this point, how¬ 
ever, we can put into use one of the most elegant pieces of machinery in higher 
mathematics, Green’s functions, to solve inhomogeneous differential equations. 

This chapter addresses Green’s functions in one dimension, that is, Green’s functions of ordinary differential equations. Consider the ODE $L_x[u] = f(x)$, where $L_x$ is a linear differential operator. In the abstract Dirac notation this can be formally written as $L|u\rangle = |f\rangle$. If $L$ has an inverse $L^{-1} \equiv G$, the solution can be formally written as $|u\rangle = L^{-1}|f\rangle = G|f\rangle$. Multiplying this by $\langle x|$ and inserting $1 = \int dy\, |y\rangle w(y) \langle y|$ between $G$ and $|f\rangle$ gives

\[ u(x) = \int dy\, G(x, y)\, w(y) f(y), \tag{20.1} \]

where the integration is over the range of definition of the functions involved. Once we know $G(x, y)$, Equation (20.1) gives the solution $u(x)$ in an integral form. But how do we find $G(x, y)$?

Sandwiching both sides of $LG = 1$ between $\langle x|$ and $|y\rangle$ and using $1 = \int dx'\, |x'\rangle w(x') \langle x'|$ between $L$ and $G$ yields $\int dx'\, L(x, x')\, w(x')\, G(x', y) = \langle x|y\rangle = \delta(x - y)/w(x)$ if we use Equation (6.3). In particular, if $L$ is a local differential operator (see Section 16.1), then $L(x, x') = [\delta(x - x')/w(x)]\, L_x$, and we obtain the differential equation for Green’s function

\[ L_x G(x, y) = \frac{\delta(x - y)}{w(x)} \quad\text{or}\quad L_x G(x, y) = \delta(x - y), \tag{20.2} \]

where the second equation makes the frequently used assumption that $w(x) = 1$. $G(x, y)$ is called the Green’s function (GF) for the differential operator (DO) $L_x$.



As discussed in Chapters 16 and 18, $L_x$ might not be defined for all functions on $\mathbb{R}$. Moreover, a complete specification of $L_x$ requires some initial (or boundary) conditions. Therefore, we expect $G(x, y)$ to depend on such initial conditions as well. We note that when $L_x$ is applied to (20.1), we get

\[ L_x u(x) = \int dy\, [L_x G(x, y)]\, w(y) f(y) = \int dy\, \frac{\delta(x - y)}{w(x)}\, w(y) f(y) = f(x), \]

indicating that $u(x)$ is indeed a solution of the original ODE. Equation (20.2), involving the generalized function $\delta(x - y)$ (or distribution in the language of Chapter 6), is meaningful only in the same context. Thus, we treat $G(x, y)$ not as an ordinary function but as a distribution. Finally, (20.1) is assumed to hold for an arbitrary (well-behaved) function $f$.

20.1 Calculation of Some Green’s Functions 

This section presents some examples of calculating $G(x, y)$ for very simple DOs.
Later we will see how to obtain Green’s functions for a general second-order linear
differential operator. Although the complete specification of GFs requires boundary
conditions, we shall introduce unspecified constants in some of the examples
below, and calculate some indefinite GFs.

20.1.1. Example. Let us find the GF for the simplest DO, $L_x = d/dx$. We need to find a distribution such that its derivative is the Dirac delta function:¹ $G'(x, y) = \delta(x - y)$. In Chapter 6, we encountered such a distribution: the step function $\theta(x - y)$. Thus, $G(x, y) = \theta(x - y) + \alpha(y)$, where $\alpha(y)$ is the “constant” of integration. ■

The example above did not include a boundary (or initial) condition. Let us
see how boundary conditions affect the resulting GF.

20.1.2. Example. Let us solve $u'(x) = f(x)$ where $x \in [0, \infty)$ and $u(0) = 0$. A general solution of this DE is given by Equation (20.1) and the preceding example:

\[ u(x) = \int_0^\infty \theta(x - y) f(y)\, dy + \int_0^\infty \alpha(y) f(y)\, dy. \]

The factor $\theta(x - y)$ in the first term on the RHS chops off the integral at $x$:

\[ u(x) = \int_0^x f(y)\, dy + \int_0^\infty \alpha(y) f(y)\, dy. \]

The BC gives $0 = u(0) = 0 + \int_0^\infty \alpha(y) f(y)\, dy$. The only way that this can be satisfied for arbitrary $f(y)$ is for $\alpha(y)$ to be zero. Thus, $G(x, y) = \theta(x - y)$, and

\[ u(x) = \int_0^\infty \theta(x - y) f(y)\, dy = \int_0^x f(y)\, dy. \]

1 Here and elsewhere in this chapter, a prime over a GF indicates differentiation with respect to its first argument.





This is killing a fly with a sledgehammer! We could have obtained the result by a simple
integration. However, the roundabout way outlined here illustrates some important features
of GFs that will be discussed later. The BC introduced here is very special. What happens
if it is changed to $u(0) = a$? Problem 20.1 answers that. ■
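
The result of Example 20.1.2 can be checked numerically. The following is a minimal sketch, assuming a midpoint-rule quadrature and a finite cutoff `y_max` standing in for the infinite upper limit; the names `theta` and `solve_first_order` are illustrative and do not come from the text.

```python
import math

def theta(s):
    """Heaviside step function; its value at s == 0 does not affect the integral."""
    return 1.0 if s > 0 else 0.0

def solve_first_order(f, x, y_max=10.0, n=20000):
    """Approximate u(x) = integral over [0, y_max] of theta(x - y) f(y) dy.

    This uses G(x, y) = theta(x - y), the GF of d/dx with u(0) = 0.
    The midpoint rule and the cutoff y_max are assumptions of this sketch.
    """
    h = y_max / n
    return sum(theta(x - (k + 0.5) * h) * f((k + 0.5) * h) for k in range(n)) * h

# With f = cos, the solution of u' = f, u(0) = 0 is u = sin.
u_at_1 = solve_first_order(math.cos, 1.0)
print(u_at_1, math.sin(1.0))
```

For $0 \le x < y_{\max}$ the two printed numbers should agree to within the quadrature error, and $u(0)$ vanishes because $\theta(0 - y) = 0$ for every $y > 0$.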

20.1.3. Example. A more complicated DO is $L_x = d^2/dx^2$. Let us find its indefinite GF. To do so, we integrate $G''(x, y) = \delta(x - y)$ once with respect to $x$ to obtain $G'(x, y) = \theta(x - y) + \alpha(y)$. A second integration yields $G(x, y) = \int dx\, \theta(x - y) + x\alpha(y) + \eta(y)$, where $\alpha$ and $\eta$ are arbitrary functions and the integral is an indefinite integral to be evaluated next.

Let $\Omega(x, y)$ be the primitive of $\theta(x - y)$; that is,

\[ \frac{d\Omega}{dx} = \theta(x - y) = \begin{cases} 1 & \text{if } x > y, \\ 0 & \text{if } x < y. \end{cases} \tag{20.3} \]

The solution to this equation is

\[ \Omega(x, y) = \begin{cases} x + a(y) & \text{if } x > y, \\ b(y) & \text{if } x < y. \end{cases} \]

Note that we have not defined $\Omega(x, y)$ at $x = y$. It will become clear below that $\Omega(x, y)$ is continuous at $x = y$. It is convenient to write $\Omega(x, y)$ as

\[ \Omega(x, y) = [x + a(y)]\,\theta(x - y) + b(y)\,\theta(y - x). \tag{20.4} \]

To specify $a(y)$ and $b(y)$ further, we differentiate (20.4) and compare it with (20.3):

\[ \frac{d\Omega}{dx} = \theta(x - y) + [x + a(y)]\,\delta(x - y) - b(y)\,\delta(x - y) = \theta(x - y) + [x - b(y) + a(y)]\,\delta(x - y), \tag{20.5} \]

where we have used

\[ \frac{d}{dx}\theta(x - y) = -\frac{d}{dx}\theta(y - x) = \delta(x - y). \]

For Equation (20.5) to agree with (20.3), we must have $[x - b(y) + a(y)]\,\delta(x - y) = 0$, which, upon integration over $x$, yields $a(y) - b(y) = -y$. Substituting this in the expression for $\Omega(x, y)$ gives

\[ \Omega(x, y) = (x - y)\,\theta(x - y) + b(y)[\theta(x - y) + \theta(y - x)]. \]

But $\theta(x) + \theta(-x) = 1$; therefore, $\Omega(x, y) = (x - y)\,\theta(x - y) + b(y)$. It follows, among other things, that $\Omega(x, y)$ is continuous at $x = y$. We can now write

\[ G(x, y) = (x - y)\,\theta(x - y) + x\alpha(y) + \beta(y), \]

where $\beta(y) = \eta(y) + b(y)$. ■

The GF in the example above has two arbitrary functions, $\alpha(y)$ and $\beta(y)$, which are the result of underspecification of $L_x$. A full specification of $L_x$ requires BCs, as the following example shows.




20.1.4. Example. Let us calculate the GF of $L_x[u] = u''(x) = f(x)$ subject to the BCs $u(a) = u(b) = 0$, where $[a, b]$ is the interval on which $L_x$ is defined. Example 20.1.3 gives us the (indefinite) GF for $L_x$. Using that, we can write

\[ u(x) = \int_a^b (x - y)\,\theta(x - y) f(y)\, dy + x \int_a^b \alpha(y) f(y)\, dy + \int_a^b \beta(y) f(y)\, dy \]
\[ \phantom{u(x)} = \int_a^x (x - y) f(y)\, dy + x \int_a^b \alpha(y) f(y)\, dy + \int_a^b \beta(y) f(y)\, dy. \]

Applying the BCs yields

\[ 0 = u(a) = a \int_a^b \alpha(y) f(y)\, dy + \int_a^b \beta(y) f(y)\, dy, \]
\[ 0 = u(b) = \int_a^b (b - y) f(y)\, dy + b \int_a^b \alpha(y) f(y)\, dy + \int_a^b \beta(y) f(y)\, dy. \tag{20.6} \]

From these two relations it is possible to determine $\alpha(y)$ and $\beta(y)$: Substitute for the last integral on the RHS of the second equation of (20.6) from the first equation and get $0 = \int_a^b [b - y + b\alpha(y) - a\alpha(y)] f(y)\, dy$. Since this must hold for arbitrary $f(y)$, we conclude that

\[ b - y + (b - a)\,\alpha(y) = 0 \quad\Longrightarrow\quad \alpha(y) = -\frac{b - y}{b - a}. \]

Substituting for $\alpha(y)$ in the first equation of (20.6) and noting that the result holds for arbitrary $f$, we obtain $\beta(y) = a(b - y)/(b - a)$. Insertion of $\alpha(y)$ and $\beta(y)$ in the expression for $G(x, y)$ obtained in Example 20.1.3 gives

\[ G(x, y) = (x - y)\,\theta(x - y) + (x - a)\,\frac{y - b}{b - a}, \quad\text{where } a \le x \text{ and } y \le b. \]

It is striking that $G(a, y) = (a - y)\,\theta(a - y) = 0$ (because $a - y \le 0$), and

\[ G(b, y) = (b - y)\,\theta(b - y) + (b - a)\,\frac{y - b}{b - a} = 0 \]

because $\theta(b - y) = 1$ for all $y \le b$ [recall that $x$ and $y$ lie in the interval $(a, b)$]. These two equations reveal the important fact that as a function of $x$, $G(x, y)$ satisfies the same (homogeneous) BC as the solution of the DE. This is a general property that will be discussed later. ■
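
The closed form found in Example 20.1.4 is easy to exercise numerically. The sketch below is illustrative only: it assumes a midpoint-rule quadrature, and the names `green_dirichlet` and `solve_bvp` are hypothetical. It evaluates $u(x) = \int_a^b G(x, y) f(y)\, dy$ for the test source $f = 1$ on $[0, 1]$, whose exact solution with $u(0) = u(1) = 0$ is $u = x(x - 1)/2$.

```python
def green_dirichlet(x, y, a, b):
    """G(x, y) = (x - y) theta(x - y) + (x - a)(y - b)/(b - a) for d^2/dx^2
    with u(a) = u(b) = 0, as obtained in Example 20.1.4."""
    step = 1.0 if x > y else 0.0
    return (x - y) * step + (x - a) * (y - b) / (b - a)

def solve_bvp(f, x, a, b, n=4000):
    """Midpoint-rule estimate of u(x) = integral over [a, b] of G(x, y) f(y) dy."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        y = a + (k + 0.5) * h
        total += green_dirichlet(x, y, a, b) * f(y)
    return total * h

# Test source f = 1 on [0, 1]: the exact solution is u(x) = x (x - 1) / 2.
x = 0.5
print(solve_bvp(lambda y: 1.0, x, 0.0, 1.0), x * (x - 1.0) / 2.0)
```

Evaluating `green_dirichlet` at $x = a$ or $x = b$ returns zero for any $y$ strictly inside the interval, which is the homogeneous BC noted at the end of the example.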


In all the preceding examples, the BCs were very simple. Specifically, the value of the solution and/or its derivative at the boundary points was zero. What if the BCs are not so simple? In particular, how can we handle a case where $u(a)$ [or $u'(a)$] and $u(b)$ [or $u'(b)$] are nonzero?

Consider a general (second-order) differential operator $L_x$ and the differential equation $L_x[u] = f(x)$ subject to the BCs $u(a) = a_1$ and $u(b) = b_1$. We claim that we can reduce this system to the case where $u(a) = u(b) = 0$. Recall from Chapter 13 that the most general solution to such a DE is of the form $u = u_h + u_i$, where $u_h$, the solution to the homogeneous equation, satisfies $L_x[u_h] = 0$ and contains the arbitrary parameters inherent in solutions of differential equations. For instance, if the linearly independent solutions are $v$ and $w$, then $u_h(x) = C_1 v(x) + C_2 w(x)$, and $u_i$ is any solution of the inhomogeneous DE.

If we demand that $u_h(a) = a_1$ and $u_h(b) = b_1$, then $u_i$ satisfies the system

\[ L_x[u_i] = f(x), \qquad u_i(a) = u_i(b) = 0, \]

which is of the type discussed in the preceding examples. Since $L_x$ is a SOLDO, we can put all the machinery of Chapter 13 to work to obtain $v(x)$, $w(x)$, and therefore $u_h(x)$. The problem then reduces to a DE for which the BCs are homogeneous; that is, the value of the solution and/or its derivative is zero at the boundary points.

20.1.5. Example. Let us assume that $L_x = d^2/dx^2$. Calculation of $u_h$ is trivial:

\[ L_x[u_h] = 0 \quad\Longrightarrow\quad \frac{d^2 u_h}{dx^2} = 0 \quad\Longrightarrow\quad u_h(x) = C_1 x + C_2. \]

To evaluate $C_1$ and $C_2$, we impose the BCs $u_h(a) = a_1$ and $u_h(b) = b_1$:

\[ C_1 a + C_2 = a_1, \]
\[ C_1 b + C_2 = b_1. \]

This gives $C_1 = (b_1 - a_1)/(b - a)$ and $C_2 = (a_1 b - a b_1)/(b - a)$.

The inhomogeneous equation defines a problem identical to that of Example 20.1.4. Thus, we can immediately write $u_i(x) = \int_a^b G(x, y) f(y)\, dy$, where $G(x, y)$ is as given in that example. Thus, the general solution is

\[ u(x) = \frac{b_1 - a_1}{b - a}\, x + \frac{a_1 b - a b_1}{b - a} + \int_a^x (x - y) f(y)\, dy + \frac{x - a}{b - a} \int_a^b (y - b) f(y)\, dy. \; ■ \]
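
The split $u = u_h + u_i$ of Example 20.1.5 can be sketched in a few lines. This is a hedged illustration rather than a definitive implementation: the midpoint-rule quadrature and the function names are assumptions, while the constants $C_1$, $C_2$ and the GF are those quoted in the examples above.

```python
def green_dirichlet(x, y, a, b):
    """GF of d^2/dx^2 with homogeneous Dirichlet BCs (Example 20.1.4)."""
    step = 1.0 if x > y else 0.0
    return (x - y) * step + (x - a) * (y - b) / (b - a)

def solve_inhomogeneous_bvp(f, x, a, b, a1, b1, n=4000):
    """Solve u'' = f with u(a) = a1, u(b) = b1 as u = u_h + u_i."""
    # Homogeneous part: u_h = C1 x + C2 carries the inhomogeneous BCs.
    c1 = (b1 - a1) / (b - a)
    c2 = (a1 * b - a * b1) / (b - a)
    u_h = c1 * x + c2
    # Inhomogeneous part: u_i has homogeneous BCs and is given by the GF.
    h = (b - a) / n
    u_i = 0.0
    for k in range(n):
        y = a + (k + 0.5) * h
        u_i += green_dirichlet(x, y, a, b) * f(y)
    u_i *= h
    return u_h + u_i

# Test case: f = 1 on [0, 1] with u(0) = 2, u(1) = 3,
# whose exact solution is u(x) = x (x - 1) / 2 + x + 2.
print(solve_inhomogeneous_bvp(lambda y: 1.0, 0.5, 0.0, 1.0, 2.0, 3.0))
```

At the endpoints the integral term drops out, so the returned values reduce to $a_1$ and $b_1$, as required by the BCs.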

Example 20.1.5 shows that an inhomogeneous DE with inhomogeneous BCs 
can be separated into two DEs, one homogeneous with inhomogeneous BCs and
the other inhomogeneous with homogeneous BCs, the latter being appropriate for 
the GF. Furthermore, all the foregoing examples indicate that solutions of DEs can 
be succinctly written in terms of GFs that automatically incorporate the BCs as 
long as the BCs are homogeneous. Can a GF also give the solution to a DE with 
inhomogeneous BCs? 


20.2 Formal Considerations 

The discussion and examples of the preceding section hint at the power of Green’s 
functions. The elegance of such a function becomes apparent from the realization 
that it contains all the information about the solutions of a DE for any type of BCs, 
as we are about to show. Since GFs are inverses of DOs, let us briefly reexamine 
the inverse of an operator, which is closely tied to its spectrum. The question as 




to whether or not an operator $A$ in a finite-dimensional vector space is invertible is succinctly answered by the value of its determinant: $A$ is invertible if and only if $\det A \neq 0$. In fact, as we saw at the beginning of Chapter 16, one translates the abstract operator equation $A|u\rangle = |v\rangle$ into a matrix equation $\mathsf{A}u = v$ and reduces the question to that of the inverse of a matrix. This matrix takes on an especially simple form when $A$ is diagonal, that is, when $A_{ij} = \lambda_i \delta_{ij}$. For this special situation we have

\[ \lambda_i u_i = v_i \quad\text{for } i = 1, 2, \ldots, N \ (\text{no sum over } i). \tag{20.7} \]

This equation has a unique solution (for arbitrary $v_i$) if and only if $\lambda_i \neq 0$ for all $i$. In that case $u_i = v_i/\lambda_i$ for $i = 1, 2, \ldots, N$. In particular, if $v_i = 0$ for all $i$, that is, when Equation (20.7) is homogeneous, the unique solution is the trivial solution. On the other hand, when some of the $\lambda_i$ are zero, there may be no solution to (20.7), but the homogeneous equation has a nontrivial solution ($u_i$ need not be zero). Recalling (from Chapter 3) that an operator is invertible if and only if none of its eigenvalues is zero, we have the following:

20.2.1. Proposition. The operator $A \in \mathcal{L}(V)$ is invertible if and only if the homogeneous equation $A|u\rangle = 0$ has no nontrivial solutions.

In infinite-dimensional (Hilbert) spaces there is no determinant. How can we tell whether or not an operator in a Hilbert space is invertible? The exploitation of the connection between invertibility and eigenvalues has led to Proposition 20.2.1, which can be generalized to an operator acting on any vector space, finite or infinite. Consider the equation $A|u\rangle = 0$ in a Hilbert space $\mathcal{H}$. In general, neither the domain nor the range of $A$ is the whole of $\mathcal{H}$. If $A$ is invertible, then the only solution to the equation $A|u\rangle = 0$ is $|u\rangle = 0$. Conversely, assuming that the equation has no nontrivial solution implies that the null space of $A$ consists of only the zero vector. Thus,

\[ A|u_1\rangle = A|u_2\rangle \quad\Longrightarrow\quad A(|u_1\rangle - |u_2\rangle) = 0 \quad\Longrightarrow\quad |u_1\rangle - |u_2\rangle = 0. \]

This shows that $A$ is injective (one-to-one), i.e., $A$ is a bijective linear mapping from the domain of $A$, $\mathcal{D}(A)$, onto the range of $A$. Therefore, $A$ must have an inverse.

The foregoing discussion can be expressed as follows. If $A|u\rangle = 0$, then (by the definition of eigenvectors) $\lambda = 0$ is an eigenvalue of $A$ if and only if $|u\rangle \neq 0$. Thus, if $A|u\rangle = 0$ has no nontrivial solution, then zero cannot be an eigenvalue of $A$. This can also be stated as follows:

20.2.2. Theorem. An operator $A$ on a Hilbert space has an inverse if and only if $\lambda = 0$ is not an eigenvalue of $A$.

Green’s functions are inverses of differential operators. Therefore, it is important to have a clear understanding of the DOs. An $n$th-order linear differential operator (NOLDO) satisfies the following theorem (for a proof, see [Birk 78, Chapter 6]).

20.2.3. Theorem. Let

\[ L_x = p_n(x)\frac{d^n}{dx^n} + p_{n-1}(x)\frac{d^{n-1}}{dx^{n-1}} + \cdots + p_1(x)\frac{d}{dx} + p_0(x), \tag{20.8} \]

where $p_n(x) \neq 0$ in $[a, b]$. Let $x_0 \in [a, b]$ and let $\{\gamma_k\}_{k=1}^{n}$ be given numbers and $f(x)$ a given piecewise continuous function on $[a, b]$. Then the initial value problem (IVP)

\[ L_x[u] = f \quad\text{for } x \in [a, b], \]
\[ u(x_0) = \gamma_1, \quad u'(x_0) = \gamma_2, \quad \ldots, \quad u^{(n-1)}(x_0) = \gamma_n \tag{20.9} \]

has one and only one solution.

This is simply the existence and uniqueness theorem for a NOLDE. Equation (20.9) is referred to as the IVP with data $\{f(x); \gamma_1, \ldots, \gamma_n\}$. This theorem is used to define $L_x$. Part of that definition are the BCs that the solutions to $L_x$ must satisfy.

A particularly important BC is the homogeneous one in which $\gamma_1 = \gamma_2 = \cdots = \gamma_n = 0$. In such a case it can be shown (see Problem 20.3) that the only solution of the homogeneous DE $L_x[u] = 0$ is the trivial one, $u = 0$. Theorem 20.2.2 then tells us that $L_x$ is invertible; that is, there is a unique operator $G$ such that $LG = 1$. The “components” version of this last relation is part of the content of the next theorem.

20.2.4. Theorem. The DO $L_x$ of Equation (20.8) associated with the IVP with data $\{f(x); 0, 0, \ldots, 0\}$ is invertible; that is, there exists a function $G(x, y)$ such that

\[ L_x G(x, y) = \frac{\delta(x - y)}{w(x)}. \]

The importance of homogeneous BCs can now be appreciated. Theorem 20.2.4 
is the reason why we had to impose homogeneous BCs to obtain the GF in all the 
examples of the previous section. 
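
The relation $LG = 1$ has a finite-dimensional analogue that can be checked directly. The sketch below is an illustration, not the method of the text: it assumes $L_x = d^2/dx^2$ with $u(a) = u(b) = 0$ and $w = 1$, replaces the second derivative by the standard central difference on a uniform grid, and replaces $\delta(x - y)$ by a spike of height $1/h$ at one grid node. Solving the resulting tridiagonal system gives one column of the inverse matrix, which is compared with the closed form $G(x, y) = (x - y)\theta(x - y) + (x - a)(y - b)/(b - a)$ of Example 20.1.4. All function names are hypothetical.

```python
def discrete_green_column(n, j, a=0.0, b=1.0):
    """Solve (u[i-1] - 2 u[i] + u[i+1]) / h**2 = delta_ij / h for i = 1..n-1
    with u[0] = u[n] = 0, using the Thomas algorithm for tridiagonal systems."""
    h = (b - a) / n
    m = n - 1                      # number of interior nodes
    diag = [-2.0] * m              # main diagonal (after multiplying by h**2)
    rhs = [0.0] * m
    rhs[j - 1] = h                 # h**2 * (1/h) at node j
    # Forward elimination; the sub- and super-diagonal entries are all 1.
    for i in range(1, m):
        w = 1.0 / diag[i - 1]
        diag[i] -= w
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]

def green_exact(x, y, a=0.0, b=1.0):
    """Closed-form GF of Example 20.1.4."""
    step = 1.0 if x > y else 0.0
    return (x - y) * step + (x - a) * (y - b) / (b - a)

n, j = 8, 3
column = discrete_green_column(n, j)
nodes = [i / n for i in range(n + 1)]
max_err = max(abs(column[i] - green_exact(nodes[i], nodes[j])) for i in range(n + 1))
print(max_err)
```

For this particular operator the discrete solution is piecewise linear with a unit jump in slope at node $j$, just like the continuous GF, so the two should agree at the grid nodes up to rounding error.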

The BCs in (20.9) clearly are not the only ones that can be used. The most general linear BCs encountered in differential operator theory are

\[ R_1[u] \equiv \alpha_{11} u(a) + \cdots + \alpha_{1n} u^{(n-1)}(a) + \beta_{11} u(b) + \cdots + \beta_{1n} u^{(n-1)}(b) = \gamma_1, \]
\[ R_2[u] \equiv \alpha_{21} u(a) + \cdots + \alpha_{2n} u^{(n-1)}(a) + \beta_{21} u(b) + \cdots + \beta_{2n} u^{(n-1)}(b) = \gamma_2, \]
\[ \vdots \tag{20.10} \]
\[ R_n[u] \equiv \alpha_{n1} u(a) + \cdots + \alpha_{nn} u^{(n-1)}(a) + \beta_{n1} u(b) + \cdots + \beta_{nn} u^{(n-1)}(b) = \gamma_n. \]

The $n$ row vectors $\{(\alpha_{i1}, \ldots, \alpha_{in}, \beta_{i1}, \ldots, \beta_{in})\}_{i=1}^{n}$ are assumed to be independent (in particular, no row is identical to zero). We refer to $R_i$ as boundary functionals because for each (sufficiently smooth) function $u$, they give a number $\gamma_i$. The DO of (20.8) and the BCs of (20.10) together form a boundary value problem (BVP). The DE $L_x[u] = f$ subject to the BCs of (20.10) is a BVP with data $\{f(x); \gamma_1, \ldots, \gamma_n\}$.

We note that the $R_i$ are linear; that is,

\[ R_i[u_1 + u_2] = R_i[u_1] + R_i[u_2] \quad\text{and}\quad R_i[\alpha u] = \alpha R_i[u]. \]

Since $L_x$ is also linear, we conclude that the superposition principle applies to the system consisting of $L_x[u] = f$ and the BCs of (20.10), which is sometimes denoted by $(L; R_1, \ldots, R_n)$. If $u$ satisfies the BVP with data $\{f; \gamma_1, \ldots, \gamma_n\}$ and $v$ satisfies the BVP with data $\{g; \mu_1, \ldots, \mu_n\}$, then $\alpha u + \beta v$ satisfies the BVP with data $\{\alpha f + \beta g; \alpha\gamma_1 + \beta\mu_1, \ldots, \alpha\gamma_n + \beta\mu_n\}$. It follows that if $u$ and $v$ both satisfy the BVP with data $\{f; \gamma_1, \ldots, \gamma_n\}$, then $u - v$ satisfies the BVP with data $\{0; 0, 0, \ldots, 0\}$, which is called the completely homogeneous problem.

Unlike the IVP, the BVP with data $\{0; 0, 0, \ldots, 0\}$ may have a nontrivial solution. If the completely homogeneous problem has no nontrivial solution, then the BVP with data $\{f; \gamma_1, \ldots, \gamma_n\}$ has at most one solution (a solution exists for any set of data). On the other hand, if the completely homogeneous problem has nontrivial solutions, then the BVP with data $\{f; \gamma_1, \ldots, \gamma_n\}$ either has no solutions or has more than one solution (see [Stak 79, pp. 203-204]).

Recall that when a differential (unbounded) operator acts in a Hilbert space, such as $\mathcal{L}^2_w(a, b)$, it acts only on its domain. In the context of the present discussion, this means that not all functions in $\mathcal{L}^2_w(a, b)$ satisfy the BCs necessary for defining $L_x$. Thus, the functions for which the operator is defined (those that satisfy the BCs) form a subset of $\mathcal{L}^2_w(a, b)$, which we called the domain of $L_x$ and denoted by $\mathcal{D}(L_x)$. From a formal standpoint it is important to distinguish among maps that have different domains. For instance, the Hilbert-Schmidt integral operators, which are defined on a finite interval, are compact, while those defined on the entire real line are not.

20.2.5. Definition. Let $L_x$ be the DO of Equation (20.8). Suppose there exists a DO $L_x^\dagger$, with the property that

\[ w\{v^*(L_x[u]) - u(L_x^\dagger[v])^*\} = \frac{d}{dx} Q[u, v^*] \quad\text{for } u, v \in \mathcal{D}(L_x) \cap \mathcal{D}(L_x^\dagger), \]

where $Q[u, v^*]$, called the conjunct of the functions $u$ and $v$, depends on $u$, $v$, and their derivatives of order up to $n - 1$. The DO $L_x^\dagger$ is then called the formal adjoint of $L_x$. If $L_x^\dagger = L_x$ (without regard to the BCs imposed on their solutions), then $L_x$ is said to be formally self-adjoint. If $\mathcal{D}(L_x^\dagger) \supset \mathcal{D}(L_x)$ and $L_x^\dagger = L_x$ on $\mathcal{D}(L_x)$, then $L_x$ is said to be hermitian. If $\mathcal{D}(L_x^\dagger) = \mathcal{D}(L_x)$ and $L_x^\dagger = L_x$, then $L_x$ is said to be self-adjoint.

The relation given in the definition above involving the conjunct is a generalization of the Lagrange identity and can also be written in integral form:

\[ \int_a^b dx\, w\{v^*(L_x[u])\} - \int_a^b dx\, w\{u(L_x^\dagger[v])^*\} = Q[u, v^*]\Big|_a^b. \tag{20.11} \]

This form is sometimes called the generalized Green’s identity.


George Green (1793?-1841) was not appreciated in his lifetime.

His date of birth is unknown (however, it is known that he was 
baptized on 14 July 1793), and no portrait of him survives. He left 
school, after only one year’s attendance, to work in his father’s bak¬ 
ery. When the father opened a windmill in Nottingham, the boy used 
an upper room as a study in which he taught himself physics and 
mathematics from library books. In 1828, when he was thirty-five
years old, he published his most important work, An Essay on the
Application of Mathematical Analysis to the Theory of Electricity
and Magnetism, at his own expense. In it Green apologized for any
shortcomings in the paper due to his minimal formal education or 
the limited resources available to him, the latter being apparent in the few previous works he 
cited. The introduction explained the importance Green placed on the “potential” function. 
The body of the paper generalizes this idea to electricity and magnetism. 

In addition to the physics of electricity and magnetism, Green’s first paper also contained
the monumental mathematical contributions for which he is now famous: the relationship
between surface and volume integrals we now call Green’s theorem, and the Green’s
function, a ubiquitous solution to partial differential equations in almost every area of physics.
With little appreciation for the future impact of this work, one of Green’s contemporaries
declared the publication “a complete failure.” The “Essay,” which received little notice
because of poor circulation, was saved by Lord Kelvin, who tracked it down in a German
journal.

When his father died in 1829, some of George’s friends urged him to seek a college 
education. After four years of self-study, during which he closed the gaps in his elementary 
education, Green was admitted to Caius College of Cambridge University at the age of 40, 
from which he graduated four years later after a disappointing performance on his final 
examinations. Later, however, he was appointed Perce Fellow of Caius College. Two years 
after his appointment he died, and his famous 1828 paper was republished, this time reaching 
a much wider audience. This paper has been described as “the beginning of mathematical 
physics in England.” 

He published only ten mathematical works. In 1833 he wrote three further papers. Two 
on electricity were published by the Cambridge Philosophical Society. One on hydrody¬ 
namics was published by the Royal Society of Edinburgh (of which he was a Fellow) in 
1836. He also had two papers on hydrodynamics (in particular wave motion in canals), two 
papers on reflection and refraction of light, and two papers on reflection and refraction of 
sound published in Cambridge. 

In 1923 the Green windmill was partially restored by a local businessman as a gesture of 
tribute to Green. Einstein came to pay homage. Then a fire in 1947 destroyed the renovations. 
Thirty years later the idea of a memorial was once again mooted, and sufficient money was 
raised to purchase the mill and present it to the sympathetic Nottingham City Council. In 
1980 the George Green Memorial Appeal was launched to secure £20,000 to get the sails
turning again and the machinery working once more. Today, Green’s restored mill stands
as a mathematics museum in Nottingham.


20.2.1 Second-Order Linear DOs

Since second-order linear differential operators (SOLDOs) are sufficiently general for most physical applications, we will concentrate on them. Because homogeneous BCs are important in constructing Green’s functions, let us first consider BCs of the form

\[ R_1[u] \equiv \alpha_{11} u(a) + \alpha_{12} u'(a) + \beta_{11} u(b) + \beta_{12} u'(b) = 0, \]
\[ R_2[u] \equiv \alpha_{21} u(a) + \alpha_{22} u'(a) + \beta_{21} u(b) + \beta_{22} u'(b) = 0, \tag{20.12} \]

where it is assumed, as usual, that $(\alpha_{11}, \alpha_{12}, \beta_{11}, \beta_{12})$ and $(\alpha_{21}, \alpha_{22}, \beta_{21}, \beta_{22})$ are linearly independent.

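If we define the inner product as an integral with weight $w$, Equation (20.11) can be formally written as

\[ \langle v|L|u\rangle = \langle u|L^\dagger|v\rangle^* + Q[u, v^*]\Big|_a^b. \]

This would coincide with the usual definition of the adjoint if the surface term vanishes, that is, if

\[ Q[u, v^*]\Big|_{x=b} = Q[u, v^*]\Big|_{x=a}. \tag{20.13} \]

For this to happen, we need to impose BCs on $v$. To find these BCs, let us rewrite Equation (20.12) in a more compact form. Linear independence of the two row vectors of coefficients implies that the $2 \times 4$ matrix of coefficients has rank two. This means that the $2 \times 4$ matrix has an invertible $2 \times 2$ submatrix. By rearranging the terms in Equation (20.12) if necessary, we can assume that the second of the two $2 \times 2$ submatrices is invertible. The homogeneous BCs can then be conveniently written as

\[ R[u] \equiv \begin{pmatrix} R_1[u] \\ R_2[u] \end{pmatrix} = \begin{pmatrix} A & B \end{pmatrix} \begin{pmatrix} u_a \\ u_b \end{pmatrix} = A u_a + B u_b = 0, \tag{20.14} \]

where

\[ A \equiv \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix}, \quad B \equiv \begin{pmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{pmatrix}, \quad u_a \equiv \begin{pmatrix} u(a) \\ u'(a) \end{pmatrix}, \quad u_b \equiv \begin{pmatrix} u(b) \\ u'(b) \end{pmatrix}, \]

and $B$ is invertible.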





The most general form of the conjunct for a SOLDO is

\[ Q[u, v^*](x) \equiv q_{11}(x) u(x) v^*(x) + q_{12}(x) u(x) v'^*(x) + q_{21}(x) u'(x) v^*(x) + q_{22}(x) u'(x) v'^*(x), \]

which can be written in matrix form as

\[ Q[u, v^*](x) = u_x^t\, Q_x\, v_x^*, \quad\text{where } Q_x = \begin{pmatrix} q_{11}(x) & q_{12}(x) \\ q_{21}(x) & q_{22}(x) \end{pmatrix}, \tag{20.15} \]

and $u_x$ and $v_x^*$ have similar definitions as $u_a$ and $u_b$ above. The vanishing of the surface term becomes

\[ u_b^t\, Q_b\, v_b^* = u_a^t\, Q_a\, v_a^*. \tag{20.16} \]

We need to translate this equation into a condition on $v^*$ alone.² This is accomplished by solving for two of the four quantities $u(a)$, $u'(a)$, $u(b)$, and $u'(b)$ in terms of the other two, substituting the result in Equation (20.16), and setting the coefficients of the other two equal to zero. Let us assume, as before, that the submatrix $B$ is invertible, i.e., $u(b)$ and $u'(b)$ are expressible in terms of $u(a)$ and $u'(a)$. Then $u_b = -B^{-1} A u_a$, or $u_b^t = -u_a^t A^t (B^t)^{-1}$, and we obtain

\[ -u_a^t A^t (B^t)^{-1} Q_b v_b^* = u_a^t Q_a v_a^* \quad\Longrightarrow\quad u_a^t \left[ A^t (B^t)^{-1} Q_b v_b^* + Q_a v_a^* \right] = 0, \]

and the condition on $v^*$ becomes

\[ A^t (B^t)^{-1} Q_b v_b^* + Q_a v_a^* = 0. \tag{20.17} \]

We see that all factors of $u$ have disappeared, as they should. The expanded version of the BCs on $v^*$ are written as

\[ B_1[v^*] \equiv \sigma_{11} v^*(a) + \sigma_{12} v'^*(a) + \eta_{11} v^*(b) + \eta_{12} v'^*(b) = 0, \]
\[ B_2[v^*] \equiv \sigma_{21} v^*(a) + \sigma_{22} v'^*(a) + \eta_{21} v^*(b) + \eta_{22} v'^*(b) = 0. \tag{20.18} \]

These homogeneous BCs are said to be adjoint to those of (20.12). Because of the difference between BCs and their adjoints, the domain of a differential operator need not be the same as that of its adjoint.

20.2.6. Example. Let $L_x = d^2/dx^2$ with the homogeneous BCs

\[ R_1[u] = \alpha u(a) - u'(a) = 0 \quad\text{and}\quad R_2[u] = \beta u(b) - u'(b) = 0. \tag{20.19} \]

We want to calculate $Q[u, v^*]$ and the adjoint BCs for $v$. By repeated integration by parts [or by using Equation (13.23)], we obtain $Q[u, v^*] = u'v^* - uv'^*$. For the surface term to vanish, we must have

\[ u'(a)v^*(a) - u(a)v'^*(a) = u'(b)v^*(b) - u(b)v'^*(b). \]

2 The boundary conditions on v* should not depend on the choice of u. 



Substituting from (20.19) in this equation, we get

\[ u(a)[\alpha v^*(a) - v'^*(a)] = u(b)[\beta v^*(b) - v'^*(b)], \]

which holds for arbitrary $u$ if and only if

\[ B_1[v^*] = \alpha v^*(a) - v'^*(a) = 0 \quad\text{and}\quad B_2[v^*] = \beta v^*(b) - v'^*(b) = 0. \tag{20.20} \]

This is a special case, in which the adjoint BCs are the same as the original BCs (substitute $u$ for $v^*$ to see this).

To see that the original BCs and their adjoints need not be the same, we consider

\[ R_1[u] = u'(a) - \alpha u(b) = 0 \quad\text{and}\quad R_2[u] = \beta u(a) - u'(b) = 0, \tag{20.21} \]

from which we obtain $u(a)[\beta v^*(b) + v'^*(a)] = u(b)[\alpha v^*(a) + v'^*(b)]$. Thus,

\[ B_1[v^*] = \alpha v^*(a) + v'^*(b) = 0 \quad\text{and}\quad B_2[v^*] = \beta v^*(b) + v'^*(a) = 0, \tag{20.22} \]

which is not the same as (20.21). Boundary conditions such as those in (20.19) and (20.20), in which each equation contains the function and its derivative evaluated at the same point, are called unmixed BCs. On the other hand, (20.21) and (20.22) are mixed BCs. ■
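
The conjunct $Q[u, v^*] = u'v^* - uv'^*$ of Example 20.2.6 can be verified numerically for $L_x = d^2/dx^2$ with $w = 1$ and real functions. The following is a minimal sketch under stated assumptions: the test functions $u = \sin x$ and $v = e^x$ on $[0, 1]$ are an arbitrary choice, the integral is estimated with the midpoint rule, and the function names are illustrative.

```python
import math

def conjunct(u, du, v, dv, x):
    """Q[u, v](x) = u'(x) v(x) - u(x) v'(x) for L_x = d^2/dx^2, real functions."""
    return du(x) * v(x) - u(x) * dv(x)

def green_identity_lhs(u, d2u, v, d2v, a, b, n=20000):
    """Midpoint-rule estimate of the integral of v u'' - u v'' over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += v(x) * d2u(x) - u(x) * d2v(x)
    return total * h

# Test functions (an arbitrary choice): u = sin x and v = exp x on [0, 1].
u, du = math.sin, math.cos
d2u = lambda x: -math.sin(x)
v = dv = d2v = math.exp

lhs = green_identity_lhs(u, d2u, v, d2v, 0.0, 1.0)
rhs = conjunct(u, du, v, dv, 1.0) - conjunct(u, du, v, dv, 0.0)
print(lhs, rhs)
```

The two printed numbers should agree to within the quadrature error. Since these test functions satisfy none of the BCs in (20.19), the surface term does not vanish here; it is exactly the quantity that the adjoint BCs are designed to remove.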


20.2.2 Self-Adjoint SOLDOs

In Chapter 13, we showed that a SOLDO satisfies the generalized Green’s identity with $w(x) = 1$. In fact, since $u$ and $v$ are real, Equation (13.24) is identical to (20.11) if we set $w = 1$ and

\[ Q[u, v] = p_2 v u' - (p_2 v)' u + p_1 u v. \tag{20.23} \]


Also, we have seen that any SOLDO can be made (formally) self-adjoint. Thus, let us consider the formally self-adjoint SOLDO

\[ L_x = \frac{d}{dx}\left(p\,\frac{d}{dx}\right) + q, \]

where both $p(x)$ and $q(x)$ are real functions and the inner product is defined with weight $w = 1$. If we are interested in formally self-adjoint operators with respect to a general weight $w > 0$, we can construct them as follows. We first note that if $L_x$ is formally self-adjoint with respect to a weight of unity, then $(1/w)L_x$ is self-adjoint with respect to weight $w$. Next, we note that $L_x$ is formally self-adjoint for all functions $q$, in particular, for $wq$. Now we define

\[ L_x^{(w)} \equiv \frac{d}{dx}\left(p\,\frac{d}{dx}\right) + qw \]

and note that $L_x^{(w)}$ is formally self-adjoint with respect to a weight of unity, and therefore

\[ L_x \equiv \frac{1}{w} L_x^{(w)} = \frac{1}{w}\frac{d}{dx}\left(p\,\frac{d}{dx}\right) + q \tag{20.24} \]

is formally self-adjoint with respect to weight $w(x) > 0$.

For SOLDOs that are formally self-adjoint with respect to weight $w$, the conjunct given in (20.23) reduces to

\[ Q[u, v] = p(x)w(x)(vu' - uv'). \tag{20.25} \]

Thus, the surface term in the generalized Green’s identity vanishes if and only if

\[ p(b)w(b)[v(b)u'(b) - u(b)v'(b)] = p(a)w(a)[v(a)u'(a) - u(a)v'(a)]. \tag{20.26} \]

The DO becomes self-adjoint if $u$ and $v$ satisfy Equation (20.26) as well as the same BCs. It can easily be shown that the following four types of BCs on $u(x)$ assure the validity of Equation (20.26) and therefore define a self-adjoint operator $L_x$ given by (20.24):

1. The Dirichlet BCs: $u(a) = u(b) = 0$

2. The Neumann BCs: $u'(a) = u'(b) = 0$

3. General unmixed BCs: $\alpha u(a) - u'(a) = \beta u(b) - u'(b) = 0$

4. Periodic BCs: $u(a) = u(b)$ and $u'(a) = u'(b)$


20.3 Green’s Functions for SOLDOs 


We are now in a position to find the Green’s function for a SOLDO. First, note that a complete specification of $L_x$ requires not only knowledge of $p_0(x)$, $p_1(x)$, and $p_2(x)$, its coefficient functions, but also knowledge of the BCs imposed on the solutions. The most general BCs for a SOLDO are of the type given in Equation (20.10) with $n = 2$. Thus, to specify $L_x$ uniquely, we consider the system $(L; R_1, R_2)$ with data $(f; \gamma_1, \gamma_2)$. This system defines a unique BVP:

\[ L_x[u] = p_2(x)\frac{d^2u}{dx^2} + p_1(x)\frac{du}{dx} + p_0(x)u = f(x), \qquad R_i[u] = \gamma_i, \quad i = 1, 2. \tag{20.27} \]

A necessary condition for $L_x$ to be invertible is that the homogeneous DE $L_x[u] = 0$ have only the trivial solution $u = 0$. For $u = 0$ to be the only solution, it must be a solution. This means that it must meet all the conditions in Equation (20.27). In particular, since $R_i$ are linear functionals of $u$, we must have $R_i[0] = 0$. This can be stated as follows:

20.3.1. Lemma. A necessary condition for a second-order linear DO to be invertible is for its associated BCs to be homogeneous.³


3 The lemma applies to all linear DOs, not just second order ones. 



Thus, to study Green’s functions we must restrict ourselves to problems with 
homogeneous BCs. This at first may seem restrictive, since not all problems have 
homogeneous BCs. Can we solve the others by the Green’s function method? The 
answer is yes, as will be shown later in this chapter. 

The above discussion clearly indicates that the Green’s function of $\mathbf{L}_x$, being its
“inverse,” is defined only if we consider the system $(\mathbf{L}; \mathbf{R}_1, \mathbf{R}_2)$ with data $(f; 0, 0)$.
If the Green’s function exists, it must satisfy the DE of Theorem 20.2.4, in which
$\mathbf{L}_x$ acts on $G(x, y)$. But part of the definition of $\mathbf{L}_x$ is the set of BCs imposed on the
solutions. Thus, if the LHS of the DE is to make any sense, $G(x, y)$ must also
satisfy those same BCs. We therefore make the following definition:

20.3.2. Definition. The Green’s function of a DO $\mathbf{L}_x$ is a function $G(x, y)$ that
satisfies both the DE

$$\mathbf{L}_x G(x, y) = \frac{\delta(x - y)}{w(x)}$$

and, as a function of $x$, the homogeneous BCs $\mathbf{R}_i[G] = 0$ for $i = 1, 2$, where the
$\mathbf{R}_i$ are defined as in Equation (20.12).


It is convenient to study the Green’s function for the adjoint of $\mathbf{L}_x$ simultaneously.
Denoting this by $g(x, y)$, we have

$$\mathbf{L}_x^\dagger\, g(x, y) = \frac{\delta(x - y)}{w(x)}, \qquad \mathbf{B}_i[g] = 0 \quad \text{for } i = 1, 2, \tag{20.28}$$

where the $\mathbf{B}_i$ are the boundary functionals adjoint to $\mathbf{R}_i$ and given in Equation (20.18).
The function $g(x, y)$ is known as the adjoint Green’s function associated with
the DE of (20.27).

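We can now use (20.27) and (20.28) to find the solutions to

$$\mathbf{L}_x[u] = f(x), \qquad \mathbf{R}_i[u] = 0 \quad \text{for } i = 1, 2,$$

$$\mathbf{L}_x^\dagger[v] = h(x), \qquad \mathbf{B}_i[v] = 0 \quad \text{for } i = 1, 2. \tag{20.29}$$

With $v(x) = g(x, y)$ in Equation (20.11), whose RHS is assumed to be zero, we
get $\int_a^b w\, g^*(x, y)\,\mathbf{L}_x[u]\,dx = \int_a^b w\, u(x)\,(\mathbf{L}_x^\dagger[g])^*\,dx$. Using (20.28) on the RHS and
(20.29) on the LHS, we obtain

$$u(y) = \int_a^b g^*(x, y)\,w(x) f(x)\,dx.$$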


Similarly, with $u(x) = G(x, y)$, Equation (20.11) gives

$$v^*(y) = \int_a^b G(x, y)\,w(x) h^*(x)\,dx,$$

or, since $w(x)$ is a (positive) real function,

$$v(y) = \int_a^b G^*(x, y)\,w(x) h(x)\,dx.$$

These equations for $u(y)$ and $v(y)$ are not what we expect [see, for instance,
Equation (20.1)]. However, if we take into account certain properties of Green’s
functions that we will discuss next, these equations become plausible.





20.3.1 Properties of Green’s functions 

Let us rewrite the generalized Green’s identity [Equation (20.11)], with the RHS
equal to zero, as

$$\int_a^b dt\, w(t)\,\{v^*(t)\,(\mathbf{L}_t[u])\} = \int_a^b dt\, w(t)\,\{u(t)\,(\mathbf{L}_t^\dagger[v])^*\}. \tag{20.30}$$

This is sometimes called Green’s identity. Substituting $G(t, y)$ for $u(t)$ and $g(t, x)$
for $v(t)$ gives

$$\int_a^b dt\, w(t)\, g^*(t, x)\,\frac{\delta(t - y)}{w(t)} = \int_a^b dt\, w(t)\, G(t, y)\,\frac{\delta(t - x)}{w(t)},$$

or $g^*(y, x) = G(x, y)$. A consequence of this identity is that


20.3.3. Box. $G(x, y)$ must satisfy the adjoint BCs with respect to its second
argument.


If for the time being we assume that the Green’s function associated with a
system $(\mathbf{L}; \mathbf{R}_1, \mathbf{R}_2)$ is unique, then, since for a self-adjoint differential operator,
$\mathbf{L}_x$ and $\mathbf{L}_x^\dagger$ are identical and $u$ and $v$ both satisfy the same BCs, we must have
$G(x, y) = g(x, y)$ or, using $g^*(y, x) = G(x, y)$, we get $G(x, y) = G^*(y, x)$. In
particular, if the coefficient functions of $\mathbf{L}_x$ are all real, $G(x, y)$ will be real, and
we have $G(x, y) = G(y, x)$. This means that $G$ is a symmetric function of its two
arguments.

The last property is related to the continuity of $G(x, y)$ and its derivative at
$x = y$. For a SOLDO, we have

$$\mathbf{L}_x G(x, y) = p_2(x)\frac{\partial^2 G}{\partial x^2} + p_1(x)\frac{\partial G}{\partial x} + p_0(x)\,G = \frac{\delta(x - y)}{w(x)},$$

where $p_0$, $p_1$, and $p_2$ are assumed to be real and continuous in the interval $[a, b]$,
and $w(x)$ and $p_2(x)$ are assumed to be positive for all $x \in [a, b]$. We multiply
both sides of the DE by

$$h(x) = \frac{\mu(x)}{p_2(x)}, \qquad \text{where} \qquad \mu(x) \equiv \exp\left[\int_a^x \frac{p_1(t)}{p_2(t)}\,dt\right],$$

noting that $d\mu/dx = (p_1/p_2)\mu$. This transforms the DE into

$$\frac{\partial}{\partial x}\left[\mu(x)\frac{\partial}{\partial x}G(x, y)\right] + \frac{p_0(x)\mu(x)}{p_2(x)}\,G(x, y) = \frac{\mu(y)}{p_2(y)w(y)}\,\delta(x - y).$$

Integrating this equation gives

$$\mu(x)\frac{\partial}{\partial x}G(x, y) + \int_a^x \frac{p_0(t)\mu(t)}{p_2(t)}\,G(t, y)\,dt = \frac{\mu(y)}{p_2(y)w(y)}\,\theta(x - y) + \alpha(y) \tag{20.31}$$

because the primitive of $\delta(x - y)$ is $\theta(x - y)$. Here $\alpha(y)$ is the “constant” of
integration. First consider the case where $p_0 = 0$, for which the Green’s function
will be denoted by $G_0(x, y)$. Then Equation (20.31) becomes

$$\mu(x)\frac{\partial}{\partial x}G_0(x, y) = \frac{\mu(y)}{p_2(y)w(y)}\,\theta(x - y) + \alpha_1(y),$$

which (since $\mu$, $p_2$, and $w$ are continuous on $[a, b]$, and $\theta(x - y)$ has a discontinuity
only at $x = y$) indicates that $\partial G_0/\partial x$ is continuous everywhere on $[a, b]$ except
at $x = y$. Now divide the last equation by $\mu$ and integrate the result to get

$$G_0(x, y) = \frac{\mu(y)}{p_2(y)w(y)}\int_a^x \frac{\theta(t - y)}{\mu(t)}\,dt + \alpha_1(y)\int_a^x \frac{dt}{\mu(t)} + \alpha_2(y).$$

Every term on the RHS is continuous except possibly the integral involving the
$\theta$-function. However, that integral can be written as

$$\int_a^x \frac{\theta(t - y)}{\mu(t)}\,dt = \theta(x - y)\int_y^x \frac{dt}{\mu(t)}. \tag{20.32}$$

The $\theta$-function in front of the integral is needed to ensure that $a \le y \le x$, as
demanded by the LHS of Equation (20.32). The RHS of Equation (20.32) is continuous
at $x = y$ with limit being zero as $x \to y$.

Next, we write $G(x, y) = G_0(x, y) + H(x, y)$, and apply $\mathbf{L}_x$ to both sides.
This gives

$$\frac{\delta(x - y)}{w(x)} = \left(p_2\frac{d^2}{dx^2} + p_1\frac{d}{dx}\right)G_0 + p_0 G_0 + \mathbf{L}_x H(x, y) = \frac{\delta(x - y)}{w(x)} + p_0 G_0 + \mathbf{L}_x H(x, y),$$

or $p_2 H'' + p_1 H' + p_0 H = -p_0 G_0$. The continuity of $G_0$, $p_0$, $p_1$, and $p_2$ on
$[a, b]$ implies the continuity of $H$, because a discontinuity in $H$ would entail a
delta function discontinuity in $dH/dx$, which is impossible because there are no
delta functions in the equation for $H$. Since both $G_0$ and $H$ are continuous, $G$
must also be continuous on $[a, b]$.




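We can now calculate the jump in $\partial G/\partial x$ at $x = y$. We denote the jump as
$\Delta G'(y)$ and define it as follows:

$$\Delta G'(y) \equiv \lim_{\epsilon \to 0}\left[\frac{\partial G}{\partial x}(x, y)\Big|_{x = y + \epsilon} - \frac{\partial G}{\partial x}(x, y)\Big|_{x = y - \epsilon}\right].$$

Dividing (20.31) by $\mu(x)$ and taking the above limit for all terms, we obtain

$$\Delta G'(y) + \lim_{\epsilon \to 0}\left[\frac{1}{\mu(y + \epsilon)}\int_a^{y + \epsilon}\frac{p_0(t)\mu(t)}{p_2(t)}\,G(t, y)\,dt - \frac{1}{\mu(y - \epsilon)}\int_a^{y - \epsilon}\frac{p_0(t)\mu(t)}{p_2(t)}\,G(t, y)\,dt\right] = \frac{\mu(y)}{p_2(y)w(y)}\lim_{\epsilon \to 0}\left[\frac{\theta(+\epsilon)}{\mu(y + \epsilon)} - \frac{\theta(-\epsilon)}{\mu(y - \epsilon)}\right].$$

The second term on the LHS is zero because all functions are continuous at $y$; the
term $\alpha(y)/\mu(x)$ drops out of the difference for the same reason. Since $\theta(-\epsilon) = 0$,
the limit on the RHS is simply $1/\mu(y)$. We therefore obtain

$$\Delta G'(y) = \frac{1}{p_2(y)w(y)}. \tag{20.33}$$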


20.3.2 Construction and Uniqueness of Green’s Functions 

We are now in a position to calculate the Green’s function for a general SOLDO
and show that it is unique.

20.3.4. Theorem. Consider the system $(\mathbf{L}; \mathbf{R}_1, \mathbf{R}_2)$ with data $(f; 0, 0)$, in which
$\mathbf{L}_x$ is a SOLDO. If the homogeneous DE $\mathbf{L}_x[u] = 0$ has no nontrivial solution, then
the GF associated with the given system exists and is unique. The solution of the
system is

$$u(x) = \int_a^b dy\, w(y)\, G(x, y) f(y)$$

and is also unique.


Proof. The GF satisfies the DE $\mathbf{L}_x G(x, y) = 0$ for all $x \in [a, b]$ except $x = y$.
We thus divide $[a, b]$ into two intervals, $I_1 = [a, y)$ and $I_2 = (y, b]$, and note
that a general solution to the above homogeneous DE can be written as a linear
combination of a basis of solutions, $u_1$ and $u_2$. Thus, we can write the solution of
the DE as

$$G_l(x, y) = c_1 u_1(x) + c_2 u_2(x) \quad \text{for } x \in I_1,$$
$$G_r(x, y) = d_1 u_1(x) + d_2 u_2(x) \quad \text{for } x \in I_2,$$

and define the GF as

$$G(x, y) = \begin{cases} G_l(x, y) & \text{if } x \in I_1, \\ G_r(x, y) & \text{if } x \in I_2, \end{cases} \tag{20.34}$$

where $c_1$, $c_2$, $d_1$, and $d_2$ are, in general, functions of $y$. To determine $G(x, y)$ we
must determine four unknowns. We also have four relations: the continuity of $G$,
the jump in $\partial G/\partial x$ at $x = y$, and the two BCs $\mathbf{R}_1[G] = \mathbf{R}_2[G] = 0$. The continuity
of $G$ gives

$$c_1(y)u_1(y) + c_2(y)u_2(y) = d_1(y)u_1(y) + d_2(y)u_2(y).$$

The jump of $\partial G/\partial x$ at $x = y$ yields

$$c_1(y)u_1'(y) + c_2(y)u_2'(y) - d_1(y)u_1'(y) - d_2(y)u_2'(y) = -\frac{1}{p_2(y)w(y)}.$$

Introducing $b_1 = c_1 - d_1$ and $b_2 = c_2 - d_2$ changes the two preceding equations
to

$$b_1 u_1 + b_2 u_2 = 0, \qquad b_1 u_1' + b_2 u_2' = -\frac{1}{p_2 w}.$$

These equations have a unique solution iff

$$\det\begin{pmatrix} u_1 & u_2 \\ u_1' & u_2' \end{pmatrix} \neq 0.$$

But the determinant is simply the Wronskian of the two independent solutions and
therefore cannot be zero. Thus, $b_1(y)$ and $b_2(y)$ are determined in terms of $u_1$, $u_1'$,
$u_2$, $u_2'$, $p_2$, and $w$.

We now define

$$h(x, y) \equiv \begin{cases} b_1(y)u_1(x) + b_2(y)u_2(x) & \text{if } x \in I_1, \\ 0 & \text{if } x \in I_2, \end{cases}$$

so that $G(x, y) = h(x, y) + d_1(y)u_1(x) + d_2(y)u_2(x)$. We have reduced the number
of unknowns to two, $d_1$ and $d_2$. Imposing the BCs gives two more relations:

$$\mathbf{R}_1[G] = \mathbf{R}_1[h] + d_1\mathbf{R}_1[u_1] + d_2\mathbf{R}_1[u_2] = 0,$$
$$\mathbf{R}_2[G] = \mathbf{R}_2[h] + d_1\mathbf{R}_2[u_1] + d_2\mathbf{R}_2[u_2] = 0.$$

Can we solve these equations and determine $d_1$ and $d_2$ uniquely? We can, if

$$\det\begin{pmatrix} \mathbf{R}_1[u_1] & \mathbf{R}_1[u_2] \\ \mathbf{R}_2[u_1] & \mathbf{R}_2[u_2] \end{pmatrix} \neq 0.$$

It can be shown that this determinant is nonzero (see Problem 20.5).

Having found the unique $\{b_i, d_i\}_{i=1}^2$, we can calculate the $c_i$ uniquely, substitute
all of them in Equation (20.34), and obtain the unique $G(x, y)$. That $u(x)$ is also
unique can be shown similarly. □
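
The construction used in the proof (continuity, jump, then the two BCs) can be traced numerically. The following is only a minimal sketch under stated assumptions: the BCs are taken to be of Dirichlet type, $\mathbf{R}_1[u] = u(a)$ and $\mathbf{R}_2[u] = u(b)$, and the names `solve2` and `build_green` are hypothetical helpers introduced here for illustration, not part of the text.

```python
import math

def solve2(m11, m12, m21, m22, r1, r2):
    """Solve the 2x2 linear system [[m11, m12], [m21, m22]] z = (r1, r2) by Cramer's rule."""
    det = m11 * m22 - m12 * m21
    return (r1 * m22 - m12 * r2) / det, (m11 * r2 - r1 * m21) / det

def build_green(u1, du1, u2, du2, a, b, p2=lambda y: 1.0, w=lambda y: 1.0):
    """Return G(x, y) for Dirichlet BCs u(a) = u(b) = 0 (an assumption of this sketch).

    u1, u2 form a basis of solutions of the homogeneous DE; du1, du2 are their derivatives.
    """
    def G(x, y):
        # Step 1: continuity and jump conditions fix b_i = c_i - d_i.
        b1, b2 = solve2(u1(y), u2(y), du1(y), du2(y), 0.0, -1.0 / (p2(y) * w(y)))
        # Step 2: the BCs G(a, y) = 0 and G(b, y) = 0 fix d_1 and d_2.
        rhs_a = -(b1 * u1(a) + b2 * u2(a))
        d1, d2 = solve2(u1(a), u2(a), u1(b), u2(b), rhs_a, 0.0)
        c1, c2 = b1 + d1, b2 + d2
        if x < y:
            return c1 * u1(x) + c2 * u2(x)   # left branch G_l
        return d1 * u1(x) + d2 * u2(x)       # right branch G_r
    return G

# Illustration: d^2/dx^2 + 1 on [0, pi/2] with basis sin x, cos x.
G_demo = build_green(math.sin, math.cos, math.cos, lambda t: -math.sin(t), 0.0, math.pi / 2)
```

Both 2x2 solves correspond to the two determinant conditions in the proof; if either determinant vanished, `solve2` would divide by zero, which mirrors the failure of existence.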

20.3.5. Example. Let us calculate the GF for $\mathbf{L}_x = d^2/dx^2$ with BCs $u(a) = u(b) = 0$.
We note that $\mathbf{L}_x[u] = 0$ with the given BCs has no nontrivial solution (verify this). Thus,
the GF exists. The DE for $G(x, y)$ is $G'' = 0$ for $x \neq y$, whose solutions are

$$G(x, y) = \begin{cases} c_1 x + c_2 & \text{if } a \le x < y, \\ d_1 x + d_2 & \text{if } y < x \le b. \end{cases} \tag{20.35}$$

Continuity at $x = y$ gives $c_1 y + c_2 = d_1 y + d_2$, or $b_1 y + b_2 = 0$ with $b_i = c_i - d_i$. The
discontinuity of $dG/dx$ at $x = y$ gives

$$d_1 - c_1 = \frac{1}{p_2 w} = 1 \;\Rightarrow\; b_1 = -1,$$

assuming that $w = 1$. From the equations above we also get $b_2 = y$. $G(x, y)$ must also
satisfy the given BCs. Thus, $G(a, y) = 0 = G(b, y)$. Since $a \le y$ and $b \ge y$, we obtain
$c_1 a + c_2 = 0$ and $d_1 b + d_2 = 0$, or, after substituting $c_i = b_i + d_i$,

$$a d_1 + d_2 = a - y, \qquad b d_1 + d_2 = 0.$$

The solution to these equations is $d_1 = (y - a)/(b - a)$ and $d_2 = -b(y - a)/(b - a)$.
With $b_1$, $b_2$, $d_1$, and $d_2$ as given above, we find

$$c_1 = b_1 + d_1 = -\frac{b - y}{b - a} \qquad \text{and} \qquad c_2 = b_2 + d_2 = a\,\frac{b - y}{b - a}.$$

Writing Equation (20.35) as

$$G(x, y) = (c_1 x + c_2)\,\theta(y - x) + (d_1 x + d_2)\,\theta(x - y)$$

and using the identity $\theta(y - x) = 1 - \theta(x - y)$, we get

$$G(x, y) = c_1 x + c_2 - (b_1 x + b_2)\,\theta(x - y).$$

Using the values found for the $b$’s and $c$’s, we obtain

$$G(x, y) = (a - x)\,\frac{b - y}{b - a} + (x - y)\,\theta(x - y),$$

which is the same as the GF obtained in Example 20.1.4. ■
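
As a quick numerical sanity check of this result, one can integrate the GF against a source and compare with a known solution. The sketch below assumes the illustrative choices $[a, b] = [0, 1]$ and $f = 1$ (neither is from the text), for which $u'' = 1$, $u(0) = u(1) = 0$ has the exact solution $u(x) = x(x - 1)/2$; the function names are hypothetical.

```python
def green_dirichlet(x, y, a, b):
    """GF of d^2/dx^2 on [a, b] with u(a) = u(b) = 0 and weight w = 1."""
    step = 1.0 if x > y else 0.0          # theta(x - y)
    return (a - x) * (b - y) / (b - a) + (x - y) * step

def solve_with_green(f, x, a, b, n=2000):
    """Approximate u(x) = int_a^b G(x, y) f(y) dy with the midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        y = a + (i + 0.5) * h
        total += green_dirichlet(x, y, a, b) * f(y)
    return total * h

# Illustrative source f = 1 on [0, 1]; the exact value is u(1/2) = -1/8.
u_half = solve_with_green(lambda y: 1.0, 0.5, 0.0, 1.0)
```

The same helper also makes the symmetry $G(x, y) = G(y, x)$ easy to spot-check by swapping arguments.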

20.3.6. Example. Let us find the GF for $\mathbf{L}_x = d^2/dx^2 + 1$ with the BCs $u(0) = u(\pi/2) = 0$.
The general solution of $\mathbf{L}_x[u] = 0$ is

$$u(x) = A\sin x + B\cos x.$$

If the BCs are imposed, we get $u = 0$. Thus, $G(x, y)$ exists. The general form of $G(x, y)$ is

$$G(x, y) = \begin{cases} c_1\sin x + c_2\cos x & \text{if } 0 \le x < y, \\ d_1\sin x + d_2\cos x & \text{if } y < x \le \pi/2. \end{cases} \tag{20.36}$$

Continuity of $G$ at $x = y$ gives $b_1\sin y + b_2\cos y = 0$ with $b_i = c_i - d_i$. The
discontinuity of the derivative of $G$ at $x = y$ gives $b_1\cos y - b_2\sin y = -1$, where we
have set $w(x) = 1$. Solving these equations yields $b_1 = -\cos y$ and $b_2 = \sin y$. The BCs
give

$$G(0, y) = 0 \;\Rightarrow\; c_2 = 0 \;\Rightarrow\; d_2 = -b_2 = -\sin y,$$
$$G(\pi/2, y) = 0 \;\Rightarrow\; d_1 = 0 \;\Rightarrow\; c_1 = b_1 = -\cos y.$$

Substituting in Equation (20.36) gives

$$G(x, y) = \begin{cases} -\cos y\,\sin x & \text{if } x < y, \\ -\sin y\,\cos x & \text{if } y < x, \end{cases}$$

or, using the theta function,

$$G(x, y) = -\theta(y - x)\cos y\,\sin x - \theta(x - y)\sin y\,\cos x$$
$$= -[1 - \theta(x - y)]\cos y\,\sin x - \theta(x - y)\sin y\,\cos x$$
$$= -\cos y\,\sin x + \theta(x - y)\sin(x - y).$$

It is instructive to verify directly that $G(x, y)$ satisfies $\mathbf{L}_x[G] = \delta(x - y)$:

$$\mathbf{L}_x[G] = -\cos y\,\underbrace{\left(\frac{d^2}{dx^2} + 1\right)\sin x}_{=0} + \left(\frac{d^2}{dx^2} + 1\right)[\theta(x - y)\sin(x - y)]$$
$$= \frac{d^2}{dx^2}[\theta(x - y)\sin(x - y)] + \theta(x - y)\sin(x - y)$$
$$= \frac{d}{dx}\Big[\underbrace{\delta(x - y)\sin(x - y)}_{=0} + \theta(x - y)\cos(x - y)\Big] + \theta(x - y)\sin(x - y).$$

The first term vanishes because the sine vanishes at the only point where the delta function
is nonzero. Thus, we have

$$\mathbf{L}_x[G] = [\delta(x - y)\cos(x - y) - \theta(x - y)\sin(x - y)] + \theta(x - y)\sin(x - y) = \delta(x - y)$$

because the delta function demands that $x = y$, for which $\cos(x - y) = 1$. ■
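
The defining properties of this GF (the two BCs, the unit jump in $\partial G/\partial x$ at $x = y$, and the symmetry expected for a self-adjoint problem with real coefficients) can also be spot-checked numerically. This is a minimal sketch; the function names and the sample point `y0` are arbitrary choices made here for illustration.

```python
import math

def green_helmholtz(x, y):
    """G(x, y) = -cos(y) sin(x) + theta(x - y) sin(x - y) on [0, pi/2]."""
    step = 1.0 if x > y else 0.0
    return -math.cos(y) * math.sin(x) + step * math.sin(x - y)

def dgreen_dx(x, y):
    """Derivative of G with respect to x away from x = y."""
    step = 1.0 if x > y else 0.0
    return -math.cos(y) * math.cos(x) + step * math.cos(x - y)

y0 = 0.6      # arbitrary interior source point
eps = 1e-8    # small offset on either side of x = y0
jump = dgreen_dx(y0 + eps, y0) - dgreen_dx(y0 - eps, y0)   # should be close to 1/(p2 w) = 1
```

With $p_2 = w = 1$ the jump approaches 1, in agreement with Equation (20.33).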


The existence and uniqueness of the Green’s function $G(x, y)$, in conjunction
with its properties and its adjoint, imply the existence and uniqueness of the adjoint
Green’s function $g(x, y)$. Using this fact, we can show that the condition for the
absence of a nontrivial solution for $\mathbf{L}_x[u] = 0$ is also a necessary condition for
the existence of $G(x, y)$. That is, if $G(x, y)$ exists, then $\mathbf{L}_x[u] = 0$ implies that
$u = 0$. Suppose $G(x, y)$ exists; then $g(x, y)$ also exists. In Green’s identity let
$v = g(x, y)$. This gives an identity:

$$\int_a^b w(x)\, g^*(x, y)\,(\mathbf{L}_x[u])\,dx = \int_a^b w(x) u(x)\,(\mathbf{L}_x^\dagger[g])^*\,dx = \int_a^b w(x) u(x)\,\frac{\delta(x - y)}{w(x)}\,dx = u(y).$$

In particular, if $\mathbf{L}_x[u] = 0$, then $u(y) = 0$ for all $y$. We have proved the following
result.


20.3.7. Proposition. The DE $\mathbf{L}_x[u] = 0$ implies that $u \equiv 0$ if and only if the GF
corresponding to $\mathbf{L}_x$ and the homogeneous BCs exists.

It is sometimes stated that the Green’s function of a SOLDO with constant
coefficients depends on the difference $x - y$. This statement is motivated by the
observation that if $u(x)$ is a solution of

$$\mathbf{L}_x[u] = a_2\frac{d^2u}{dx^2} + a_1\frac{du}{dx} + a_0 u = f(x),$$

then $u(x - y)$ is the solution of $a_2 u'' + a_1 u' + a_0 u = f(x - y)$ if $a_0$, $a_1$, and $a_2$
are constant. Thus, if $G(x)$ is a solution of $\mathbf{L}_x[G] = \delta(x)$ [again assuming that
$w(x) = 1$], then it seems that the solution of $\mathbf{L}_x[G] = \delta(x - y)$ is simply $G(x - y)$.
This is clearly wrong, as Examples 20.3.5 and 20.3.6 showed. The reason is, of
course, the BCs. The fact that $G(x - y)$ satisfies the right DE does not guarantee
that it also satisfies the right BCs. The following example, however, shows that the
conjecture is true for a homogeneous initial value problem.

20.3.8. Example. The most general form for the GF is

$$G(x, y) = \begin{cases} c_1 u_1(x) + c_2 u_2(x) & \text{if } a \le x < y, \\ d_1 u_1(x) + d_2 u_2(x) & \text{if } y < x \le b. \end{cases}$$

The IVP condition $G(a, y) = 0 = G'(a, y)$ implies

$$c_1 u_1(a) + c_2 u_2(a) = 0 \qquad \text{and} \qquad c_1 u_1'(a) + c_2 u_2'(a) = 0.$$

Linear independence of $u_1$ and $u_2$ implies

$$\det\begin{pmatrix} u_1(a) & u_2(a) \\ u_1'(a) & u_2'(a) \end{pmatrix} \neq 0.$$

Hence, $c_1 = c_2 = 0$ is the only solution. This gives

$$G(x, y) = \begin{cases} 0 & \text{if } a \le x < y, \\ d_1 u_1(x) + d_2 u_2(x) & \text{if } y < x \le b. \end{cases} \tag{20.37}$$

Continuity of $G$ at $x = y$ yields $d_1 u_1(y) + d_2 u_2(y) = 0$, while the jump
condition in the derivative gives $d_1 u_1'(y) + d_2 u_2'(y) = 1$. Solving these two equations, we
get

$$d_1 = \frac{u_2(y)}{u_1'(y)u_2(y) - u_2'(y)u_1(y)}, \qquad d_2 = -\frac{u_1(y)}{u_1'(y)u_2(y) - u_2'(y)u_1(y)}.$$

Substituting this in (20.37) gives

$$G(x, y) = \frac{u_2(y)u_1(x) - u_1(y)u_2(x)}{u_1'(y)u_2(y) - u_2'(y)u_1(y)}\,\theta(x - y). \tag{20.38}$$

Equation (20.38) holds for any SOLDO with the given BCs. We now use the fact that the
SOLDO has constant coefficients. In that case, we know the exact form of $u_1$ and $u_2$. There
are two cases to consider:

1. If the characteristic polynomial of $\mathbf{L}_x$ has two distinct roots $\lambda_1$ and $\lambda_2$, then $u_1(x) = e^{\lambda_1 x}$
and $u_2(x) = e^{\lambda_2 x}$. Writing $\lambda_1 = a + b$ and $\lambda_2 = a - b$ and substituting the
exponential functions and their derivatives in Equation (20.38) yields

$$G(x, y) = \frac{e^{(a - b)y}e^{(a + b)x} - e^{(a + b)y}e^{(a - b)x}}{2b\,e^{2ay}}\,\theta(x - y) = \frac{1}{2b}\left[e^{(a + b)(x - y)} - e^{(a - b)(x - y)}\right]\theta(x - y),$$

which is a function of $x - y$ alone.

2. If $\lambda_1 = \lambda_2 = \lambda$, then $u_1(x) = e^{\lambda x}$, $u_2(x) = x e^{\lambda x}$, and substitution of these
functions in Equation (20.38) gives

$$G(x, y) = (x - y)\,e^{\lambda(x - y)}\,\theta(x - y).$$ ■
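
Equation (20.38) is simple enough to evaluate directly. The sketch below is illustrative only: the helper name `ivp_green` is hypothetical, and the operator $d^2/dx^2 + 4\,d/dx + 3$ (characteristic roots $-1$ and $-3$, so $a = -2$, $b = 1$ in the notation of case 1) is an assumed example, not one taken from the text.

```python
import math

def ivp_green(x, y, u1, du1, u2, du2):
    """Evaluate Equation (20.38): the causal GF built from a basis u1, u2 (with p2 = w = 1)."""
    if x <= y:
        return 0.0                                   # theta(x - y) = 0
    denom = du1(y) * u2(y) - du2(y) * u1(y)
    return (u2(y) * u1(x) - u1(y) * u2(x)) / denom

# Assumed example: u'' + 4u' + 3u, with basis e^{-x} and e^{-3x}.
u1 = lambda t: math.exp(-t)
du1 = lambda t: -math.exp(-t)
u2 = lambda t: math.exp(-3.0 * t)
du2 = lambda t: -3.0 * math.exp(-3.0 * t)

def closed_form(x, y):
    """Case 1 of the example with a = -2, b = 1: depends on x - y only."""
    s = x - y
    return 0.5 * (math.exp(-s) - math.exp(-3.0 * s)) if s > 0 else 0.0

value = ivp_green(1.7, 0.4, u1, du1, u2, du2)
shifted = ivp_green(1.7 + 2.0, 0.4 + 2.0, u1, du1, u2, du2)   # same x - y, shifted origin
```

Comparing `value` with `shifted` illustrates the translation invariance that holds for the IVP but fails for the boundary value problems of Examples 20.3.5 and 20.3.6.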


20.3.3 Inhomogeneous BCs 

So far we have concentrated on problems with homogeneous BCs, $\mathbf{R}_i[u] = 0$,
for $i = 1, 2$. What if the BCs are inhomogeneous? It turns out that the Green’s
function method, even though it was derived for homogeneous BCs, solves this
kind of problem as well! The secret of this success is the generalized Green’s
identity.

Suppose we are interested in solving the DE

$$\mathbf{L}_x[u] = f(x) \qquad \text{with} \qquad \mathbf{R}_i[u] = \gamma_i \quad \text{for } i = 1, 2,$$

and we have the GF for $\mathbf{L}_x$ (with homogeneous BCs, of course). We can substitute
$v = g(x, y) = G^*(y, x)$ in the generalized Green’s identity and use the DE to
obtain

$$\int_a^b w(x) G(y, x) f(x)\,dx - \int_a^b w(x) u(x)\,(\mathbf{L}_x^\dagger[g])^*\,dx = Q[u, g^*(x, y)]\Big|_{x=a}^{x=b},$$

or, using $\mathbf{L}_x^\dagger[g] = \delta(x - y)/w(x)$,

$$u(y) = \int_a^b w(x) G(y, x) f(x)\,dx - Q[u, g^*(x, y)]\Big|_{x=a}^{x=b}.$$

To evaluate the surface term, let us write the BCs in matrix form [see Equation
(20.17)]:

$$\mathsf{A}\mathbf{u}_a + \mathsf{B}\mathbf{u}_b = \boldsymbol{\gamma} \;\Rightarrow\; \mathbf{u}_b = \mathsf{B}^{-1}\boldsymbol{\gamma} - \mathsf{B}^{-1}\mathsf{A}\mathbf{u}_a,$$
$$\mathsf{A}\mathbf{G}_a + \mathsf{B}\mathbf{G}_b = 0 \;\Rightarrow\; \mathsf{A}^t(\mathsf{B}^t)^{-1}\mathsf{Q}_b\,\mathbf{g}_b^* + \mathsf{Q}_a\,\mathbf{g}_a^* = 0,$$

where $\boldsymbol{\gamma}$ is a column vector composed of $\gamma_1$ and $\gamma_2$, and we have assumed that
$G(x, y)$ and $g^*(x, y)$ satisfy, respectively, the homogeneous BCs (with $\boldsymbol{\gamma} = 0$)
and their adjoints. We have also assumed that the $2 \times 4$ matrix of coefficients has
rank 2, and without loss of generality, let $\mathsf{B}$ be the invertible $2 \times 2$ submatrix. Then,
assuming the general form of the surface term as in Equation (20.15), we obtain

$$Q[u, g^*(x, y)]\Big|_{x=a}^{x=b} = \mathbf{u}_b^t\,\mathsf{Q}_b\,\mathbf{g}_b^* - \mathbf{u}_a^t\,\mathsf{Q}_a\,\mathbf{g}_a^*$$
$$= (\mathsf{B}^{-1}\boldsymbol{\gamma} - \mathsf{B}^{-1}\mathsf{A}\mathbf{u}_a)^t\,\mathsf{Q}_b\,\mathbf{g}_b^* - \mathbf{u}_a^t\,\mathsf{Q}_a\,\mathbf{g}_a^*$$
$$= -\mathbf{u}_a^t\,\underbrace{\left[\mathsf{A}^t(\mathsf{B}^t)^{-1}\mathsf{Q}_b\,\mathbf{g}_b^* + \mathsf{Q}_a\,\mathbf{g}_a^*\right]}_{=0 \text{ because } g^* \text{ satisfies the homogeneous adjoint BC}} + \boldsymbol{\gamma}^t(\mathsf{B}^t)^{-1}\mathsf{Q}_b\,\mathbf{g}_b^*$$
$$= \boldsymbol{\gamma}^t(\mathsf{B}^t)^{-1}\mathsf{Q}_b\,\mathbf{g}_b^*, \tag{20.39}$$

where

$$\mathbf{g}_b^* \equiv \begin{pmatrix} g^*(b, y) \\ \dfrac{\partial}{\partial x}g^*(x, y)\Big|_{x=b} \end{pmatrix} = \begin{pmatrix} G(y, b) \\ \dfrac{\partial}{\partial x}G(y, x)\Big|_{x=b} \end{pmatrix}.$$

It follows that $Q[u, g^*(x, y)]\big|_{x=a}^{x=b}$ is given entirely in terms of $G$, its derivative, the
coefficient functions of the DE (hidden in the matrix $\mathsf{Q}$), the homogeneous BCs
(hidden in $\mathsf{B}$), and the constants $\gamma_1$ and $\gamma_2$. The fact that $g^*$ and $\partial g^*/\partial x$ appear
to be evaluated at $x = b$ is due to the simplifying (but harmless) assumption that
$\mathsf{B}$ is invertible, i.e., that $u(b)$ and $u'(b)$ can be written in terms of $u(a)$ and $u'(a)$.
Of course, this may not be possible; then we have to find another pair of the four
quantities in terms of the other two, in which case the matrices and the vectors
will change but the argument, as well as the conclusion, will remain valid. We can
now write

$$u(y) = \int_a^b w(x) G(y, x) f(x)\,dx - \boldsymbol{\gamma}^t\,\mathsf{M}\,\mathbf{g}^*, \tag{20.40}$$

where a general matrix $\mathsf{M}$ has been introduced, and the subscript $b$ has been removed
to encompass cases where submatrices other than $\mathsf{B}$ are invertible. Equation (20.40)
shows that $u$ can be determined completely once we know $G(x, y)$, even though
the BCs are inhomogeneous. In practice, there is no need to calculate $\mathsf{M}$. We can
use the expression for $Q[u, g^*]$ obtained from the Lagrange identity of Chapter
13 and evaluate it at $b$ and $a$. This, in general, involves evaluating $u$ and $G$ and
their derivatives at $a$ and $b$. We know how to handle the evaluation of $G$ because
we can actually construct it (if it exists). We next find two of the four quantities
corresponding to $u$ in terms of the other two and insert the result in the expression
for $Q[u, g^*]$. Equation (20.39) then guarantees that the coefficients of the other two
terms will be zero. Thus, we can simply drop all the terms in $Q[u, g^*]$ containing
a factor of the other two terms.




Specifically, we use the conjunct for a formally self-adjoint SOLDO [see Equation
(20.26)] and $g^*(x, y) = G(y, x)$ to obtain

$$u(y) = \int_a^b w(x) G(y, x) f(x)\,dx - \left\{p(x)w(x)\left[G(y, x)\frac{du}{dx} - u(x)\frac{\partial G}{\partial x}(y, x)\right]\right\}\Bigg|_{x=a}^{x=b}.$$

Interchanging $x$ and $y$ gives

$$u(x) = \int_a^b w(y) G(x, y) f(y)\,dy + \left\{p(y)w(y)\left[u(y)\frac{\partial G}{\partial y}(x, y) - G(x, y)\frac{du}{dy}\right]\right\}\Bigg|_{y=a}^{y=b}. \tag{20.41}$$

This equation is valid only for a self-adjoint SOLDO. That is, using it requires
casting the SOLDO into a self-adjoint form (a process that is always possible, in
light of Theorem 13.5.4).

By setting $f(x) = 0$, we can also obtain the solution to a homogeneous DE
$\mathbf{L}_x[u] = 0$ that satisfies the inhomogeneous BCs.

20.3.9. Example. Let us find the solution of the simple DE $d^2u/dx^2 = f(x)$ subject to
the simple inhomogeneous BCs $u(a) = \gamma_1$ and $u(b) = \gamma_2$. The GF for this problem has
been calculated in Examples 20.1.5 and 20.3.5. Let us begin by calculating the surface term
in Equation (20.41). We have $p(y) = 1$, and we set $w(y) = 1$; then

$$\text{surface term} = u(b)\frac{\partial G}{\partial y}\Big|_{y=b} - G(x, b)u'(b) - u(a)\frac{\partial G}{\partial y}\Big|_{y=a} + G(x, a)u'(a)$$
$$= \gamma_2\frac{\partial G}{\partial y}\Big|_{y=b} - \gamma_1\frac{\partial G}{\partial y}\Big|_{y=a} + G(x, a)u'(a) - G(x, b)u'(b).$$

That the unwanted (and unspecified) terms are zero can be seen by observing that $G(x, a) =
g^*(a, x) = (g(a, x))^*$, and that $g(x, y)$ satisfies the BCs adjoint to the homogeneous BCs
(obtained when $\gamma_i = 0$). In this particular and simple case, the BCs happen to be self-adjoint
(Dirichlet BCs). Thus, $u(a) = u(b) = 0$ implies that $g(a, x) = g(b, x) = 0$ for all
$x \in [a, b]$. (In a more general case the coefficient of $u'(a)$ would be more complicated, but
still zero.) Thus, we finally have

$$\text{surface term} = \gamma_2\frac{\partial G}{\partial y}\Big|_{y=b} - \gamma_1\frac{\partial G}{\partial y}\Big|_{y=a}.$$

Now, using the expression for $G(x, y)$ obtained in Examples 20.1.5 and 20.3.5, we get

$$\frac{\partial G}{\partial y} = -\frac{a - x}{b - a} - \theta(x - y) - (x - y)\,\delta(x - y) = \frac{x - a}{b - a} - \theta(x - y).$$

Thus,

$$\frac{\partial G}{\partial y}\Big|_{y=b} = \frac{x - a}{b - a}, \qquad \frac{\partial G}{\partial y}\Big|_{y=a} = \frac{x - a}{b - a} - 1 = \frac{x - b}{b - a}.$$

Substituting in Equation (20.41), we get

$$u(x) = \int_a^b G(x, y) f(y)\,dy + \gamma_2\,\frac{x - a}{b - a} + \gamma_1\,\frac{b - x}{b - a}.$$

(Compare this with the result obtained in Example 20.1.5.) ■
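
The structure of this solution, a Green’s-function integral plus a surface contribution that is linear in the boundary data, can be checked numerically. The sketch below assumes illustrative data not taken from the text: $[a, b] = [0, 1]$, $f = 1$, $\gamma_1 = 2$, $\gamma_2 = 5$, for which the exact solution is $u(x) = x^2/2 + 5x/2 + 2$. Function names are hypothetical.

```python
def green_dirichlet(x, y, a, b):
    """GF of d^2/dx^2 on [a, b] with homogeneous Dirichlet BCs (w = 1)."""
    step = 1.0 if x > y else 0.0
    return (a - x) * (b - y) / (b - a) + (x - y) * step

def solve_inhomogeneous(f, x, a, b, gamma1, gamma2, n=2000):
    """u(x) = int_a^b G(x, y) f(y) dy + surface term, for u(a) = gamma1, u(b) = gamma2."""
    h = (b - a) / n
    integral = 0.0
    for i in range(n):
        y = a + (i + 0.5) * h                      # midpoint rule node
        integral += green_dirichlet(x, y, a, b) * f(y)
    integral *= h
    # Surface term: gamma2 * dG/dy at y = b minus gamma1 * dG/dy at y = a.
    surface = gamma2 * (x - a) / (b - a) + gamma1 * (b - x) / (b - a)
    return integral + surface

u_mid = solve_inhomogeneous(lambda y: 1.0, 0.5, 0.0, 1.0, 2.0, 5.0)
```

Setting `f` to zero in the same routine leaves only the surface term, which is the straight line joining the two boundary values, as expected for $u'' = 0$.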

Green’s functions have a very simple and enlightening physical interpretation.
An inhomogeneous DE such as $\mathbf{L}_x[u] = f(x)$ can be interpreted as a black box ($\mathbf{L}_x$)
that determines a physical quantity ($u$) when there is a source ($f$) of that physical
quantity. For instance, electrostatic potential is a physical quantity whose source
is charge; a magnetic field has an electric current as its source; displacements and
velocities have forces as their sources; and so forth. Applying this interpretation
and assuming that $w(x) = 1$, we have $G(x, y)$ as the physical quantity, evaluated
at $x$ when its source $\delta(x - y)$ is located at $y$. To be more precise, let us say that
the strength of the source is $S_1$ and it is located at $y_1$; then the source becomes
$S_1\delta(x - y_1)$. The physical quantity is then $S_1 G(x, y_1)$,
because of the linearity of $\mathbf{L}_x$: If $G(x, y)$ is a solution of $\mathbf{L}_x[u] = \delta(x - y)$, then
$S_1 G(x, y_1)$ is a solution of $\mathbf{L}_x[u] = S_1\delta(x - y_1)$. If there are many sources located
at $\{y_i\}_{i=1}^N$ with corresponding strengths $\{S_i\}_{i=1}^N$, then the overall source $f$ as a
function of $x$ becomes $f(x) = \sum_{i=1}^N S_i\,\delta(x - y_i)$, and the corresponding physical
quantity $u(x)$ becomes $u(x) = \sum_{i=1}^N S_i\,G(x, y_i)$.

Since the source $S_i$ is located at $y_i$, it is more natural to define a function $S(x)$
and write $S_i = S(y_i)$. When the number of point sources goes to infinity and $y_i$
becomes a smooth continuous variable, the sums become integrals, and we have

$$f(x) = \int_a^b S(y)\,\delta(x - y)\,dy, \qquad u(x) = \int_a^b S(y)\,G(x, y)\,dy.$$

The first integral shows that $S(x) = f(x)$. Thus, the second integral becomes
$u(x) = \int_a^b f(y)\,G(x, y)\,dy$, which is precisely what we obtained formally.
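
The point-source picture can be made concrete with a small numerical experiment. The following sketch superposes two point sources using the Dirichlet GF of $d^2/dx^2$ on $[0, 1]$; the source positions and strengths are arbitrary illustrative values, and the function names are hypothetical.

```python
def green_dirichlet(x, y, a=0.0, b=1.0):
    """GF of d^2/dx^2 on [a, b] with u(a) = u(b) = 0 (w = 1)."""
    step = 1.0 if x > y else 0.0
    return (a - x) * (b - y) / (b - a) + (x - y) * step

def response(x, sources):
    """u(x) = sum_i S_i G(x, y_i) for point sources given as (y_i, S_i) pairs."""
    return sum(strength * green_dirichlet(x, y) for y, strength in sources)

def slope_jump(y, sources, h=1e-3):
    """Jump in du/dx across x = y, estimated with one-sided difference quotients."""
    right = (response(y + h, sources) - response(y, sources)) / h
    left = (response(y, sources) - response(y - h, sources)) / h
    return right - left

# Illustrative sources: strength 2 at y = 0.25 and strength -1 at y = 0.75.
sources = [(0.25, 2.0), (0.75, -1.0)]
```

Each source leaves its fingerprint as a kink in $u$: the slope jumps by exactly $S_i$ at $y_i$, while the superposition still satisfies the homogeneous BCs.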


20.4 Eigenfunction Expansion of Green’s Functions 

Green’s functions are inverses of differential operators. Inverses of operators in a
Hilbert space are best studied in terms of resolvents. This is because if an operator
$\mathbf{A}$ has an inverse, zero is in its resolvent set, and

$$\mathbf{R}_0(\mathbf{A}) = \mathbf{R}_\lambda(\mathbf{A})\big|_{\lambda=0} = (\mathbf{A} - \lambda\mathbf{1})^{-1}\big|_{\lambda=0} = \mathbf{A}^{-1}.$$

Thus, it is instructive to discuss Green’s functions in the context of the resolvent
of a differential operator. We will consider only the case where the eigenvalues are
discrete, for example, when $\mathbf{L}_x$ is a Sturm-Liouville operator.

Formally, we have $(\mathbf{L} - \lambda\mathbf{1})\mathbf{R}_\lambda(\mathbf{L}) = \mathbf{1}$, which leads to the DE

$$(\mathbf{L}_x - \lambda)\,R_\lambda(x, y) = \frac{\delta(x - y)}{w(x)},$$




where $R_\lambda(x, y) = \langle x|\,\mathbf{R}_\lambda(\mathbf{L})\,|y\rangle$. The DE simply says that $R_\lambda(x, y)$ is the
Green’s function for the operator $\mathbf{L}_x - \lambda$. So we can rewrite the equation as
$(\mathbf{L}_x - \lambda)\,G_\lambda(x, y) = \delta(x - y)/w(x)$, where $\mathbf{L}_x - \lambda$ is a DO having some
homogeneous BCs. The GF $G_\lambda(x, y)$ exists if and only if $(\mathbf{L}_x - \lambda)[u] = 0$ has no
nontrivial solution, which is true only if $\lambda$ is not an eigenvalue of $\mathbf{L}_x$. We choose
the BCs in such a way that $\mathbf{L}_x$ becomes self-adjoint.

Let $\{\lambda_n\}_{n=1}^\infty$ be the eigenvalues of the system $\mathbf{L}_x[u] = \lambda u$, $\{\mathbf{R}_i[u] = 0\}_{i=1}^2$,
and let the $u_n^{(k)}(x)$ be the corresponding eigenfunctions. The index $k$ distinguishes
among the linearly independent vectors corresponding to the same eigenvalue $\lambda_n$.
Assuming that $\mathbf{L}$ has compact resolvent (e.g., a Sturm-Liouville operator), these
eigenfunctions form a complete set for the subspace of the Hilbert space that
consists of those functions that satisfy the same BCs as the $u_n^{(k)}(x)$. In particular,
$G_\lambda(x, y)$ can be expanded in terms of $u_n^{(k)}(x)$. The expansion coefficients are, of
course, functions of $y$. Thus, we can write

$$G_\lambda(x, y) = \sum_k\sum_{n=1}^\infty a_n^{(k)}(y)\,u_n^{(k)}(x),$$

where $a_n^{(k)}(y) = \int_a^b w(x)\,u_n^{*(k)}(x)\,G_\lambda(x, y)\,dx$. Using Green’s identity, Equation
(20.30), and the fact that $\lambda_n$ is real, we have

$$\lambda_n a_n^{(k)}(y) = \int_a^b w(x)\,[\lambda_n u_n^{(k)}(x)]^*\,G_\lambda(x, y)\,dx$$
$$= \int_a^b w(x)\,G_\lambda(x, y)\,\{\mathbf{L}_x[u_n^{(k)}(x)]\}^*\,dx$$
$$= \int_a^b w(x)\,[u_n^{(k)}(x)]^*\,\mathbf{L}_x[G_\lambda(x, y)]\,dx$$
$$= \int_a^b w(x)\,u_n^{*(k)}(x)\left[\frac{\delta(x - y)}{w(x)} + \lambda G_\lambda(x, y)\right]dx$$
$$= u_n^{*(k)}(y) + \lambda\int_a^b w(x)\,u_n^{*(k)}(x)\,G_\lambda(x, y)\,dx$$
$$= u_n^{*(k)}(y) + \lambda\,a_n^{(k)}(y).$$

Thus, $a_n^{(k)}(y) = u_n^{*(k)}(y)/(\lambda_n - \lambda)$, and the expansion for the Green’s function is

$$G_\lambda(x, y) = \sum_k\sum_{n=1}^\infty \frac{u_n^{*(k)}(y)\,u_n^{(k)}(x)}{\lambda_n - \lambda}. \tag{20.42}$$

This expansion is valid as long as $\lambda_n \neq \lambda$ for any $n = 0, 1, 2, \ldots$. But this is
precisely the condition that ensures the existence of an inverse for $\mathbf{L} - \lambda\mathbf{1}$.

An interesting result is obtained from Equation (20.42) if $\lambda$ is considered a
complex variable. In that case, $G_\lambda(x, y)$ has (infinitely many) simple poles at
$\{\lambda_n\}_{n=1}^\infty$. The residue at the pole $\lambda_n$ is $-\sum_k u_n^{*(k)}(y)\,u_n^{(k)}(x)$. If $C_m$ is a contour
having the poles $\{\lambda_n\}_{n=1}^m$ in its interior, then, by the residue theorem, we have

$$\frac{1}{2\pi i}\oint_{C_m} G_\lambda(x, y)\,d\lambda = -\sum_k\sum_{n=1}^m u_n^{*(k)}(y)\,u_n^{(k)}(x).$$

In particular, if we let $m \to \infty$, we obtain

$$\frac{1}{2\pi i}\oint_{C_\infty} G_\lambda(x, y)\,d\lambda = -\sum_k\sum_{n=1}^\infty u_n^{*(k)}(y)\,u_n^{(k)}(x) = -\frac{\delta(x - y)}{w(x)}, \tag{20.43}$$

where $C_\infty$ is any contour that encircles all the eigenvalues, and in the last step
we used the completeness of the eigenfunctions. Equation (20.43) is the infinite-dimensional
analogue of Equation (16.12) with $f(\mathbf{A}) = \mathbf{1}$ when the latter equation
is sandwiched between $\langle x|$ and $|y\rangle$.

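20.4.1. Example. Consider the DO $\mathbf{L}_x = d^2/dx^2$ with BCs $u(0) = u(a) = 0$. This is
an S-L operator with eigenvalues and normalized eigenfunctions

$$\lambda_n = -\left(\frac{n\pi}{a}\right)^2 \qquad \text{and} \qquad u_n(x) = \sqrt{\frac{2}{a}}\,\sin\left(\frac{n\pi}{a}x\right) \qquad \text{for } n = 1, 2, \ldots.$$

(The eigenvalues are negative because $d^2/dx^2$ acting on $\sin(n\pi x/a)$ gives $-(n\pi/a)^2\sin(n\pi x/a)$.)
Equation (20.42) becomes

$$G_\lambda(x, y) = -\frac{2}{a}\sum_{n=1}^\infty \frac{\sin(n\pi x/a)\,\sin(n\pi y/a)}{\lambda + (n\pi/a)^2},$$

which leads to

$$\frac{1}{2\pi i}\oint_{C_\infty} G_\lambda(x, y)\,d\lambda = -\frac{1}{2\pi i}\left(\frac{2}{a}\right)\sum_{n=1}^\infty \sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{n\pi}{a}y\right)\oint_{C_\infty}\frac{d\lambda}{\lambda + (n\pi/a)^2}$$
$$= -\left(\frac{2}{a}\right)\sum_{n=1}^\infty \sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{n\pi}{a}y\right)\mathrm{Res}\left[\frac{1}{\lambda + (n\pi/a)^2}\right]_{\lambda=\lambda_n}$$
$$= -\left(\frac{2}{a}\right)\sum_{n=1}^\infty \sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{n\pi}{a}y\right).$$

The RHS is recognized as $-\delta(x - y)$. ■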

If zero is not an eigenvalue of $\mathbf{L}_x$, Equation (20.42) yields

$$G(x, y) \equiv G_0(x, y) = \sum_k\sum_{n=1}^\infty \frac{u_n^{*(k)}(y)\,u_n^{(k)}(x)}{\lambda_n}, \tag{20.44}$$

which is an expression for the Green’s function of $\mathbf{L}_x$ in terms of its eigenvalues
and eigenfunctions.
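
Equation (20.44) can be compared numerically with a closed-form GF. The sketch below does this for $\mathbf{L}_x = d^2/dx^2$ on $[0, a]$ with Dirichlet BCs, taking the eigenvalues to be $\lambda_n = -(n\pi/a)^2$ (since $d^2/dx^2$ acting on $\sin(n\pi x/a)$ gives a negative multiple) and using the closed form of Example 20.3.5 specialized to the interval $[0, a]$. The function names and the truncation at 2000 terms are arbitrary choices made for illustration.

```python
import math

def green_series(x, y, a, terms=2000):
    """Partial sum of Equation (20.44) with u_n = sqrt(2/a) sin(n pi x / a)."""
    total = 0.0
    for n in range(1, terms + 1):
        k = n * math.pi / a
        eigenvalue = -(k * k)                      # eigenvalue of d^2/dx^2
        total += math.sin(k * x) * math.sin(k * y) / eigenvalue
    return (2.0 / a) * total

def green_closed(x, y, a):
    """Closed form on [0, a]: -x (a - y) / a + (x - y) theta(x - y)."""
    step = 1.0 if x > y else 0.0
    return -x * (a - y) / a + (x - y) * step

series_value = green_series(0.3, 0.7, 1.0)
closed_value = green_closed(0.3, 0.7, 1.0)
```

The terms decay only like $1/n^2$, so the partial sums converge slowly; the truncation error after $N$ terms is of order $a/N$.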




20.5 Problems 

20.1. Using the GF method, solve the DE $\mathbf{L}_x u(x) = du/dx = f(x)$ subject to
the BC $u(0) = a$. Hint: Consider the function $v(x) = u(x) - a$.

20.2. Solve the problem of Example 20.1.4 subject to the BCs $u(a) = u'(a) = 0$.
Show that the corresponding GF also satisfies these BCs.

20.3. Show that the IVP with data $\{0; 0, 0, \ldots, 0\}$ has only $u \equiv 0$ as a solution.
Hint: Assume otherwise, add $u$ to the solution of the inhomogeneous equation,
and invoke uniqueness.

20.4. In this problem, we generalize the concepts of exactness and integrating
factor to a NOLDE. The DO $\mathbf{L}_x^{(n)} \equiv \sum_{k=0}^n p_k(x)\,d^k/dx^k$ is said to be exact if
there exists a DO $\mathbf{M}_x^{(n-1)} \equiv \sum_{k=0}^{n-1} a_k(x)\,d^k/dx^k$ such that

$$\mathbf{L}_x^{(n)}[u] = \frac{d}{dx}\left(\mathbf{M}_x^{(n-1)}[u]\right) \qquad \forall\, u \in \mathcal{C}^n[a, b].$$

(a) Show that $\mathbf{L}_x^{(n)}$ is exact iff $\sum_{m=0}^n (-1)^m\,d^m p_m/dx^m = 0$.

(b) Show that there exists an integrating factor for $\mathbf{L}_x^{(n)}$ (that is, a function $\mu(x)$
such that $\mu(x)\mathbf{L}_x^{(n)}$ is exact) if and only if $\mu(x)$ satisfies the DE

$$\mathbf{N}_x^{(n)}[\mu] \equiv \sum_{m=0}^n (-1)^m\frac{d^m}{dx^m}(\mu\,p_m) = 0.$$

The DO $\mathbf{N}_x^{(n)}$ is the formal adjoint of $\mathbf{L}_x^{(n)}$.

20.5. Assuming that $\mathbf{L}_x[u] = 0$ has no nontrivial solution, show that the matrix

$$\mathsf{R} \equiv \begin{pmatrix} \mathbf{R}_1[u_1] & \mathbf{R}_1[u_2] \\ \mathbf{R}_2[u_1] & \mathbf{R}_2[u_2] \end{pmatrix},$$

where $u_1$ and $u_2$ are independent solutions of $\mathbf{L}_x[u] = 0$ and the $\mathbf{R}_i$ are the boundary
functionals, has a nonzero determinant. Hint: Assume otherwise and show
that the system of homogeneous linear equations $\alpha\mathbf{R}_1[u_1] + \beta\mathbf{R}_1[u_2] = 0$ and
$\alpha\mathbf{R}_2[u_1] + \beta\mathbf{R}_2[u_2] = 0$ has a nontrivial solution for $(\alpha, \beta)$. Reach a contradiction
by considering $u = \alpha u_1 + \beta u_2$ as a solution of $\mathbf{L}_x[u] = 0$.

20.6. Determine the formal adjoint of each of the operators in (a) through (d)
below (i) as a differential operator, and (ii) as an operator, that is, including the
BCs. Which operators are formally self-adjoint? Which operators are self-adjoint?

(a) $\mathbf{L}_x = d^2/dx^2 + 1$ in $[0, 1]$ with BCs $u(0) = u(1) = 0$.

(b) $\mathbf{L}_x = d^2/dx^2$ in $[0, 1]$ with BCs $u(0) = u'(0) = 0$.

(c) $\mathbf{L}_x = d/dx$ in $[0, \infty)$ with BC $u(0) = 0$.

(d) $\mathbf{L}_x = d^3/dx^3 - \sin x\,d/dx + 3$ in $[0, \pi]$ with BCs $u(0) = u'(0) = 0$, $u''(0) - 4u(\pi) = 0$.



20.7. Show that the Dirichlet, Neumann, general unmixed, and periodic BCs make
the following formally self-adjoint SOLDO self-adjoint:


U = 


w dx 



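20.8. Using a procedure similar to that described in the text for SOLDOs, show
that for the FOLDO $\mathbf{L}_x = p_1\,d/dx + p_0$

(a) the indefinite GF is

$$G(x, y) = \frac{\mu(y)}{p_1(y)w(y)}\,\frac{\theta(x - y)}{\mu(x)} + \frac{C(y)}{\mu(x)}, \qquad \text{where} \qquad \mu(x) = \exp\left[\int_a^x \frac{p_0(t)}{p_1(t)}\,dt\right],$$

(b) and the GF itself is discontinuous at $x = y$ with

$$\lim_{\epsilon \to 0}\,[G(y + \epsilon, y) - G(y - \epsilon, y)] = \frac{1}{p_1(y)w(y)}.$$

(c) For the homogeneous BC

$$\mathbf{R}[u] \equiv \alpha_1 u(a) + \alpha_2 u'(a) + \beta_1 u(b) + \beta_2 u'(b) = 0$$

construct $G(x, y)$ and show that

$$G(x, y) = \frac{1}{p_1(y)w(y)v(y)}\,v(x)\,\theta(x - y) + C(y)\,v(x),$$

where $v(x)$ is any solution to the homogeneous DE $\mathbf{L}_x[u] = 0$ and

$$C(y) = -\frac{\beta_1 v(b) + \beta_2 v'(b)}{\mathbf{R}[v]\,p_1(y)w(y)v(y)}, \qquad \text{with } \mathbf{R}[v] \neq 0.$$

(d) Show directly that $\mathbf{L}_x[G] = \delta(x - y)/w(x)$.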


20.9. Let $\mathbf{L}_x$ be a NOLDO with constant coefficients. Show that if $u(x)$ satisfies
$\mathbf{L}_x[u] = f(x)$, then $u(x - y)$ satisfies $\mathbf{L}_x[u] = f(x - y)$. (Note that no BCs are
specified.)

20.10. Find the GF for $\mathbf{L}_x = d^2/dx^2 + 1$ with BCs $u(0) = u'(0) = 0$. Show that
it can be written as a function of $x - y$ only.

20.11. Find the GF for $\mathbf{L}_x = d^2/dx^2 + k^2$ with BCs $u(0) = u(a) = 0$.

20.12. Find the GF for $\mathbf{L}_x = d^2/dx^2 - k^2$ with BCs $u(\infty) = u(-\infty) = 0$.

20.13. Find the GF for $\mathbf{L}_x = (d/dx)(x\,d/dx)$ given the condition that $G(x, y)$ is
finite at $x = 0$ and vanishes at $x = 1$.





20.14. Evaluate the GF and the solutions for each of the following DEs in the
interval $[0, 1]$.

(a) $u'' - k^2 u = f$; $u(0) - u'(0) = a$, $u(1) = b$.

(b) $u'' = f$; $u(0) = u'(0) = 0$.

(c) $u'' + 6u' + 9u = 0$; $u(0) = 0$, $u'(0) = 1$.

(d) $u'' + \omega^2 u = f(x)$, for $x > 0$; $u(0) = a$, $u'(0) = b$.

(e) $u^{(4)} = f$; $u(0) = 0$, $u'(0) = 2u'(1)$, $u(1) = a$, $u''(0) = 0$.

20.15. Use eigenfunction expansion of the GF to solve the BVP $u'' = x$, $u(0) = 0$,
$u(1) - 2u'(1) = 0$.

Additional Reading 

1. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row,
1967. Contains an exposition of Green’s functions for second-order linear
differential equations.

2. Roach, G. Green’s Functions, Van Nostrand, 1970. A readable introduction
to Green’s functions, especially those we have called “indefinite” (no
boundary conditions specified).



21 


Multidimensional Green’s Functions: 
Formalism 


The extensive study of Green’s functions in one dimension in the last chapter has 
no doubt exhibited the power and elegance of their use in solving inhomogeneous 
differential equations. If the differential equation has a (unique) solution, the GF 
exists and contains all the information necessary to build it up. The solution results 
from operating on the inhomogeneous term with an integral operator whose kernel 
is the appropriate Green’s function. 

The Green’s function’s very existence depends on the type of BCs imposed. We
encountered two types of problems in solving ODEs. The first, called initial value
problems (IVPs), involves fixing (for an $n$th-order DE) the value of the solution
and its first $n - 1$ derivatives at a fixed point. Then the ODE, if it is sufficiently
well-behaved, will determine the values of the solution in the neighborhood of the
fixed point in a unique way. Because of this uniqueness, Green’s functions always
exist for IVPs.

The second type of problems, called boundary value problems (BVPs), consists
(when the DE is second order) of determining a relation between the solution
and its derivative evaluated at the boundaries of some interval $[a, b]$. These
boundary values are relations that we denoted by $\mathbf{R}_i[u] = \gamma_i$, where $i = 1, 2$. In
this case, the existence and uniqueness of the Green’s function are not guaranteed.

There is a fundamental (topological) difference between a boundary in one
dimension and a boundary in two and more dimensions. In one dimension a boundary
consists of only two points; in two and higher dimensions a boundary has infinitely
many points. The boundary of a region in $\mathbb{R}^2$ is a closed curve, in $\mathbb{R}^3$ it is a closed
surface, and in $\mathbb{R}^m$ it is called a hypersurface. This fundamental difference makes
the study of Green’s functions in higher dimensions more complicated, but also
richer and more interesting.





21.1 Properties of Partial Differential Equations 

This section presents certain facts and properties of PDEs, in particular, how BCs 
affect their solutions. We shall discover the important difference between ODEs 
and PDEs: 


21.1.1. Box. The existence of a solution to a PDE satisfying a given BC
depends on the type of the PDE.


We shall be concerned exclusively with a linear PDE. A linear PDE of order
$M$ in $m$ variables is of the form

$$L_{\mathbf{x}}[u] = f(\mathbf{x}) \quad\text{where}\quad L_{\mathbf{x}} = \sum_{|J|=1}^{M} \sum_{J} a_J(\mathbf{x}) \frac{\partial^{|J|}}{\partial x^J}, \tag{21.1}$$

where the following notation has been used:

$$\mathbf{x} = (x_1, \dots, x_m), \qquad J = (j_1, \dots, j_m), \qquad |J| = j_1 + j_2 + \cdots + j_m, \qquad \frac{\partial^{|J|}}{\partial x^J} = \frac{\partial^{|J|}}{\partial x_1^{j_1} \partial x_2^{j_2} \cdots \partial x_m^{j_m}};$$

the $j_k$ are nonnegative integers; $M$, the order of the highest derivative, is called the
order of the PDE. The outer sum in Equation (21.1) is over $|J|$; once $|J|$ is fixed,
the inner summation goes over individual $j_k$’s with the restriction that their sum
has to equal the given $|J|$.

The principal part of $L_{\mathbf{x}}$ is

$$L_p = \sum_{|J|=M} a_J(x_1, \dots, x_m) \frac{\partial^{M}}{\partial x^J}. \tag{21.2}$$

The coefficients $a_J$ and the inhomogeneous (or source) term $f$ are assumed to be
continuous functions of their arguments.

We consider Equation (21.1) as an IVP with appropriate initial data. The most
direct generalization of the IVP of ordinary differential equation theory is to specify
the values of $u$ and all its normal derivatives of order less than or equal to $M - 1$
on a hypersurface $\Gamma$ of dimension $m - 1$. This type of initial data is called Cauchy
data, and the resulting IVP is known as the Cauchy problem for $L_{\mathbf{x}}$. The reason
that the tangential derivatives do not come into play here is that once we know the
values of $u$ on $\Gamma$, we can evaluate $u$ on two neighboring points on $\Gamma$, take the limit
as the points get closer and closer, and evaluate the tangential derivatives.





21.1.1 Characteristic Hypersurfaces 

In contrast to the IVP in one dimension, the Cauchy problem for arbitrary Cauchy
data may not have a solution, or if it does, the solution may not be unique.


21.1.2. Box. The existence and uniqueness of the solution of the Cauchy
problem depend crucially on the hypersurface $\Gamma$ and on the type of PDE.


We assume that $\Gamma$ can be parametrized by a set of $m$ functions of $m - 1$ parameters.
These parameters can be thought of as generalized coordinates of points of $\Gamma$.

Consider a point $P$ on $\Gamma$. Introduce $m - 1$ coordinates $\xi_2, \dots, \xi_m$, called
tangential coordinates, to label points on $\Gamma$. Choose, by translation if necessary,
coordinates in such a way that $P$ is the origin, with coordinates $(0, 0, \dots, 0)$. Now
let $\nu \equiv \xi_1$ stand for the remaining coordinate normal to $\Gamma$. Usually $\xi_i$ is taken to be
the $i$th coordinate of the projection of the point on $\Gamma$ onto the hyperplane tangent
to $\Gamma$ at $P$.

As long as we do not move too far away from $P$, the Cauchy data on $\Gamma$ can be
written as

$$u(0, \xi_2, \dots, \xi_m), \quad \frac{\partial u}{\partial \nu}(0, \xi_2, \dots, \xi_m), \quad \dots, \quad \frac{\partial^{M-1} u}{\partial \nu^{M-1}}(0, \xi_2, \dots, \xi_m).$$

Using the chain rule, $\partial u/\partial x_i = \sum_{j=1}^{m} (\partial u/\partial \xi_j)(\partial \xi_j/\partial x_i)$, where $\xi_1 = \nu$, we can
also determine the first $M - 1$ derivatives of $u$ with respect to $x_i$. The fundamental
question is whether we can determine $u$ uniquely using the above Cauchy data
and the DE. To motivate the answer, let’s look at the analogous problem in one
dimension.

Consider the $M$th-order linear ODE

$$a_M(x) \frac{d^M u}{dx^M} + \cdots + a_1(x) \frac{du}{dx} + a_0(x) u = f(x) \tag{21.3}$$

with the following initial data at $x_0$: $\{u(x_0), u'(x_0), \dots, u^{(M-1)}(x_0)\}$. If the
coefficients $a_k(x)$ and the inhomogeneous term $f(x)$ are continuous and if
$a_M(x_0) \neq 0$, then Theorem 20.2.3 implies that there exists a unique solution to
the IVP in a neighborhood of $x_0$.

For $a_M(x_0) \neq 0$, Equation (21.3), the initial data, and a knowledge of $f(x_0)$
give $u^{(M)}(x_0)$ uniquely. Having found $u^{(M)}(x_0)$, we can calculate, with arbitrary
accuracy (by choosing $\Delta x$ small enough), the following set of new initial data at
$x_1 = x_0 + \Delta x$:

$$u(x_1) = u(x_0) + u'(x_0)\,\Delta x, \quad \dots, \quad u^{(M-1)}(x_1) = u^{(M-1)}(x_0) + u^{(M)}(x_0)\,\Delta x.$$

Using these new initial data and Theorem 20.2.3, we are assured of a unique
solution at $x_1$. Since $a_M(x)$ is assumed to be continuous at $x_1$, for sufficiently small
$\Delta x$, $a_M(x_1)$ is nonzero, and it is possible to find newer initial data at $x_2 = x_1 + \Delta x$.
The process can continue until we reach a singularity of the DE, a point where
$a_M$ vanishes. We can thus construct the unique solution of the IVP in an interval
$(x_0, b)$ as long as $a_M(x)$ does not vanish anywhere in $[x_0, b]$. This procedure is
analogous to the one used in the analytic continuation of a complex function.
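
The marching procedure just described can be sketched numerically. The following is only an illustration under stated assumptions: the function name `march_ivp`, its argument layout, and the tolerance used to detect a vanishing leading coefficient are all hypothetical choices, and the first-order update is the crude one written in the text rather than a production integrator.

```python
import math


def march_ivp(coeffs, f, x0, data, dx, steps):
    """Advance initial data for sum_k a_k(x) u^(k)(x) = f(x) by small steps.

    coeffs: list [a_0, ..., a_M] of callables; a_M is the leading coefficient.
    data:   list [u(x0), u'(x0), ..., u^(M-1)(x0)].
    Returns the final point x and the list of data at that point.
    """
    M = len(coeffs) - 1
    x, data = x0, list(data)
    for _ in range(steps):
        aM = coeffs[M](x)
        if abs(aM) < 1e-12:  # hypothetical tolerance for "a_M vanishes"
            raise ZeroDivisionError(f"a_M vanishes near x = {x}: singular point")
        # Solve the ODE for the highest derivative u^(M)(x).
        uM = (f(x) - sum(coeffs[k](x) * data[k] for k in range(M))) / aM
        derivs = data + [uM]
        # First-order update: u^(k)(x + dx) = u^(k)(x) + u^(k+1)(x) dx.
        data = [derivs[k] + derivs[k + 1] * dx for k in range(M)]
        x += dx
    return x, data


if __name__ == "__main__":
    # u'' + u = 0 with u(0) = 0, u'(0) = 1, whose exact solution is sin x.
    one, zero = (lambda x: 1.0), (lambda x: 0.0)
    x_end, (u, du) = march_ivp([one, zero, one], zero, 0.0, [0.0, 1.0], 1e-4, 10000)
    print(x_end, u, math.sin(1.0))
```

The division by `aM` is the step that fails at a point where the leading coefficient vanishes, which is exactly the obstruction discussed next.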

For $a_M(x_0) = 0$, however, we cannot calculate $u^{(M)}(x_0)$ unambiguously. In
such a case the LHS of (21.3) is completely determined from the initial data. If the
LHS happens to be equal to $f(x_0)$, then the equation is satisfied for any $u^{(M)}(x_0)$,
i.e., there exist infinitely many solutions for $u^{(M)}(x_0)$; if the LHS is not equal to
$f(x_0)$, there are no solutions. The difficulty can be stated in another way, which
is useful for generalization to the $m$-dimensional case:


21.1.3. Box. If $a_M(x_0) = 0$ in (21.3), then the initial data determine the
function $L_x[u]$.


Let us now return to the question of constructing $u$ and investigate conditions
under which the Cauchy problem may have a solution. We follow the same steps
as for the IVP for ODEs. To construct the solution numerically for points near
$P$ but away from $\Gamma$ (since the function is completely determined on $\Gamma$, not only
its $M$th derivative but derivatives of all orders are known on $\Gamma$), we must be able
to calculate $\partial^M u/\partial \nu^M$ at $P$. This is not possible if the coefficient of $\partial^M u/\partial \nu^M$
in $L_{\mathbf{x}}[u]$ is zero when $x_1, \dots, x_m$ are written in terms of $\nu, \xi_2, \dots, \xi_m$. When this
happens, $L_{\mathbf{x}}[u]$ itself will be determined by the Cauchy data. This motivates the
following definition.

21.1.4. Definition. If $L_{\mathbf{x}}[u]$ can be evaluated at a point $P$ on $\Gamma$ from the Cauchy
data alone, then $\Gamma$ is said to be characteristic for $L_{\mathbf{x}}$ at $P$. If $\Gamma$ is characteristic for
$L_{\mathbf{x}}$ at all its points, then it is called a characteristic hypersurface for $L_{\mathbf{x}}$. The Cauchy
problem does not have a solution at a point on the characteristic hypersurface.

The following theorem characterizes T: 

21.1.5. Theorem. Let $\Gamma$ be a smooth $(m - 1)$-dimensional hypersurface. Let
$L_{\mathbf{x}}[u] = f$ be an $M$th-order linear PDE in $m$ variables. Then $\Gamma$ is characteristic
at $P \in \Gamma$ if and only if the coefficient of $\partial^M u/\partial \nu^M$ vanishes when $L_{\mathbf{x}}$ is
expressed in terms of the normal-tangential coordinate system $(\nu, \xi_2, \dots, \xi_m)$.

One can rephrase the foregoing theorem as follows: 


21.1.6. Box. The hypersurface $\Gamma$ is not characteristic at $P$ if and only if all
$M$th-order partial derivatives of $u$ with respect to the coordinates $x_1, \dots, x_m$
are unambiguously determined at $P$ by the DE and the Cauchy data on $\Gamma$.






In the one-dimensional case the difficulty arose when $a_M(x_0) = 0$. In the
language being used here, we could call $x_0$ a “characteristic point.” This makes
sense because in this special case ($m = 1$), the hypersurfaces can only be of
dimension 0. Thus, we can say that in the neighborhood of a characteristic point, the
IVP has no well-defined solution.¹ For the general case ($m > 1$), we can similarly
say that the Cauchy problem has no well-defined solution in the neighborhood of
$P$ if $P$ happens to lie on a characteristic hypersurface of the differential operator.
Thus, it is important to determine the characteristic hypersurfaces of PDEs.


21.1.7. Example. Let us consider the first-order PDE in two variables

$$L_{\mathbf{x}}[u] \equiv a(x, y) \frac{\partial u}{\partial x} + b(x, y) \frac{\partial u}{\partial y} + F(x, y, u) = 0, \tag{21.4}$$

where $F(x, y, u) \equiv c(x, y) u + d(x, y)$. For this discussion the form of $F$ is irrelevant.

We wish to find the characteristic hypersurfaces (in this case, curves) of $L$. The Cauchy
data consist of a simple determination of $u$ on $\Gamma$. By Theorem 21.1.5, we need to derive
relations that ensure that $\partial u/\partial x$ and $\partial u/\partial y$ cannot be unambiguously determined at $P \equiv
(x, y)$. Using an obvious notation, the PDE of Equation (21.4) gives

$$-F(P, u(P)) = a(P) \frac{\partial u}{\partial x}(P) + b(P) \frac{\partial u}{\partial y}(P).$$

On the other hand, if $Q \equiv (x + dx, y + dy)$ lies on the curve $\Gamma$, then

$$u(Q) - u(P) = dx \frac{\partial u}{\partial x}(P) + dy \frac{\partial u}{\partial y}(P).$$

The Cauchy data determine the LHS of both of the preceding equations. Treating these
equations as a system of two linear equations in two unknowns, $\partial u/\partial x(P)$ and $\partial u/\partial y(P)$,
we conclude that the system has a unique solution if and only if the matrix of coefficients
is invertible. Thus, by Box 21.1.6, $\Gamma$ is a characteristic curve if and only if

$$\det \begin{pmatrix} dx & dy \\ a(P) & b(P) \end{pmatrix} = b(P)\,dx - a(P)\,dy = 0,$$

or $dy/dx = b(x, y)/a(x, y)$, assuming that $a(x, y) \neq 0$. Solving this FODE yields $y$ as a
function of $x$, thus determining the characteristic curve. Note that a general solution of this
FODE involves an arbitrary constant, resulting in a family of characteristic curves. ■
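
As a rough numerical companion to the example above, the FODE $dy/dx = b/a$ can be integrated for a chosen starting point, each starting point picking out one member of the family of characteristic curves. This is a sketch only: the function name `characteristic_curve`, the use of a classical fourth-order Runge-Kutta step, and the sample coefficients $a = 1$, $b = x$ are illustrative assumptions, not taken from the text.

```python
def characteristic_curve(a, b, x0, y0, x_end, n=1000):
    """Integrate dy/dx = b(x, y) / a(x, y) from (x0, y0) to x = x_end.

    Uses n classical Runge-Kutta (RK4) steps and assumes a(x, y) != 0 along
    the curve, as in the example. Returns the value of y at x_end.
    """
    def slope(x, y):
        return b(x, y) / a(x, y)

    h = (x_end - x0) / n
    x, y = x0, y0
    for _ in range(n):
        k1 = slope(x, y)
        k2 = slope(x + h / 2, y + h * k1 / 2)
        k3 = slope(x + h / 2, y + h * k2 / 2)
        k4 = slope(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y


if __name__ == "__main__":
    # Hypothetical coefficients a = 1, b = x: the curves are y = x^2/2 + C.
    # Starting at (0, 1) fixes C = 1, so the curve passes through (2, 3).
    print(characteristic_curve(lambda x, y: 1.0, lambda x, y: x, 0.0, 1.0, 2.0))
```

Changing the starting point `(x0, y0)` changes the arbitrary constant and therefore selects a different characteristic curve.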


Sofia Vasilyevna Kovalevskaya (1850-1891) is considered the greatest woman mathematician
prior to the twentieth century. She grew up in a well-educated family of the Russian
nobility, her father being an artillery general and reputed to be a descendant of a Hungarian 
king, Mathias Korvin. Sonja was educated by a British governess and enjoyed life at the 
large country estate of her father’s family, although the rather progressive thinking of the 
Kovalevsky sisters did not always meet with approval from their father. 


1 Here lies the crucial difference between ODEs and PDEs: All ODEs have a universal characteristic hypersurface, i.e., a point. 
PDEs, on the other hand, can have a variety of hypersurfaces. 





Sonja has written of two factors that attracted her to the study of mathematics. The first 
was her Uncle Pyotr, who had studied the subject on his own and would speak of squaring 
the circle and of the asymptote, as well as of many other things that excited her imagination. 
The second was a curious “wallpaper” that was used to cover one of the children’s rooms at 
Polibino, which turned out to be lecture notes on differential and integral calculus that had 
been purchased by her father in student days. These sheets fascinated her and she would 
spend hours trying to decipher separate phrases and to find the proper ordering of the pages. 

In the autumn of 1867 Sonja went to St. Petersburg, where 
she studied calculus with Alexander Strannolyubsky, a teacher of 
mathematics at the naval school. While there, she consulted the 
prominent Russian mathematician Chebyshev about her mathematical
studies, but since Russian universities were closed to
women, there seemed to be no way that she could pursue advanced
studies in her native land.

In order to escape the oppression of women common in
Russia at the time, young ladies of ambition and ability would
often arrange a marriage of convenience in order to allow study
at a foreign university. At the age of 18, Sonja arranged such
a marriage with Vladimir Kovalevsky, a paleontologist, and in 1869 the couple moved to 
Heidelberg, where Sonja took courses from Kirchhoff, Helmholtz, and others. Two years later 
she went to Berlin, where she worked with Weierstrass, who tutored her privately, since she, 
as a woman, was not allowed to attend lectures. 

The three papers she published in the next three years earned her a doctorate in absentia 
from the University of Göttingen. Unfortunately, even that distinction was not sufficient to
gain her a university position anywhere in Europe, despite strong recommendation from the
renowned Weierstrass. Her rejections resulted in a six-year period during which time she
neither undertook research nor replied to Weierstrass’s letters. She was bitter to discover that
the best job she was offered was teaching arithmetic to elementary classes of schoolgirls, 
and remarked, “I was unfortunately weak in the multiplication table.” 

The existence and uniqueness of solutions to partial differential equations occupied
the attention of many notable mathematicians of the last century, including Cauchy, who
transformed the problem into his method of majorant functions. This method was later
extended and refined by Kovalevskaya to include more general cases. The result was the
now-famous Cauchy-Kovalevskaya theorem. She also contributed to the advancement of
the study of Abelian integrals and functions and applied her knowledge of these topics to
problems in physics, including her paper “On the Rotation of a Solid Body About a Fixed
Point,” for which she won a 5000-franc prize. She also performed some investigations into
the dynamics of Saturn’s rings, inspiring a sonnet in which she is named “Muse of the
Heavens.” In 1878, Kovalevskaya gave birth to a daughter, but from 1880 increasingly
returned to her study of mathematics. In 1882 she began work on the refraction of light, and
wrote three articles on the topic. In the spring of 1883, Vladimir, from whom Sonja had been
separated for two years, committed suicide. After the initial shock, Kovalevskaya immersed
herself in mathematical work in an attempt to rid herself of feelings of guilt. Mittag-Leffler
managed to overcome opposition to Kovalevskaya in Stockholm, and obtained for her a
position as privat docent. She began to lecture there in early 1884, was appointed to a
five-year extraordinary professorship in June of that year, and in June 1889 became the third
woman ever to hold a chair at a European university.






During Kovalevskaya’s years at Stockholm she carried out important research, taught
courses on the latest topics in analysis, and became an editor of the new journal Acta
Mathematica. She was the liaison with the mathematicians of Paris and Berlin, and took part in
the organization of international conferences. Interestingly, Kovalevskaya also nurtured a
parallel career in literature, penning several novels and a drama, “The Struggle for
Happiness,” that was favorably received at the Korsh Theater in Moscow. She died at the pinnacle
of her scientific career from a combination of influenza and pneumonia less than two years
after her election to both the Swedish and the Russian Academies of Sciences. The latter
membership was initiated by Chebyshev, in spite of the Tsarist government’s repeated
refusal to grant her a university position in her own country.


21.1.2 Second-Order PDEs in m Dimensions

Because of their importance in mathematical physics, the rest of this chapter and
the next will be devoted to SOPDEs. This subsection classifies SOPDEs and the
BCs associated with them.

The most general linear SOPDE in $m$ variables can be written as

$$\sum_{j,k=1}^{m} A_{jk}(\mathbf{x}) \frac{\partial^2 u}{\partial x_j \partial x_k} + \sum_{j=1}^{m} B_j(\mathbf{x}) \frac{\partial u}{\partial x_j} + c(\mathbf{x}) u = 0,$$

where $A_{jk}$ can be assumed to be symmetric in $j$ and $k$. We restrict ourselves to
the simpler case in which the matrix $(A_{jk})$ is diagonal. We therefore consider the
PDE

$$\sum_{j=1}^{m} a_j(\mathbf{x}) \frac{\partial^2 u}{\partial x_j^2} + F\left(\mathbf{x}, u, \frac{\partial u}{\partial \mathbf{x}}\right) = 0, \tag{21.5}$$

where the last term collects all the other terms except the second derivatives. We
classify SOPDEs as follows:


1. Equation (21.5) is said to be of elliptic type at $\mathbf{x}_0$ if all the coefficients
$a_j(\mathbf{x}_0)$ are nonzero and have the same sign.

2. Equation (21.5) is said to be of ultrahyperbolic type at $\mathbf{x}_0$ if all $a_j(\mathbf{x}_0)$ are
nonzero but do not have the same sign. If only one of the coefficients has a
sign different from the rest, the equation is said to be of hyperbolic type.

3. Equation (21.5) is said to be of parabolic type at $\mathbf{x}_0$ if at least one of the
coefficients $a_j(\mathbf{x}_0)$ is zero.


If a SOPDE is of a given type at every point of its domain, it is said to be of 
that given type. In particular, if the coefficients aj are constants, the type of the 
PDE does not change from point to point. 
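
The classification can be written as a short decision procedure. The sketch below is illustrative only: the function name `classify_sopde` and the optional tolerance `tol` are hypothetical, and the function reports "hyperbolic" for the special case of the ultrahyperbolic family in which exactly one coefficient has a sign different from the rest.

```python
def classify_sopde(coeffs, tol=0.0):
    """Classify Equation (21.5) at a point from the values a_j(x0).

    coeffs: the numbers a_1(x0), ..., a_m(x0).
    tol:    coefficients with |a_j| <= tol are treated as zero (an assumption
            useful for floating-point input; the text uses exact zeros).
    """
    if any(abs(a) <= tol for a in coeffs):
        return "parabolic"            # at least one coefficient vanishes
    positives = sum(1 for a in coeffs if a > 0)
    negatives = len(coeffs) - positives
    if positives == 0 or negatives == 0:
        return "elliptic"             # all nonzero, all of the same sign
    if min(positives, negatives) == 1:
        return "hyperbolic"           # exactly one sign differs from the rest
    return "ultrahyperbolic"          # mixed signs, more than one on each side


if __name__ == "__main__":
    print(classify_sopde([1, 1, 1]))       # Laplace-like coefficients
    print(classify_sopde([1, 1, 1, -1]))   # wave-like coefficients
    print(classify_sopde([1, 0]))          # diffusion-like coefficients
```

For variable coefficients the function would be called with the values $a_j(\mathbf{x}_0)$ at each point of interest, since the type can change from point to point.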





ill-posed Cauchy 
problem 


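21.1.8. Example. In this example, we study the SOPDE in two dimensions. The most
general linear SOPDE is

$$L[u] \equiv a \frac{\partial^2 u}{\partial x^2} + 2b \frac{\partial^2 u}{\partial x \partial y} + c \frac{\partial^2 u}{\partial y^2} + F\left(x, y, u, \frac{\partial u}{\partial x}, \frac{\partial u}{\partial y}\right) = 0, \tag{21.6}$$

where $a$, $b$, and $c$ are functions of $x$ and $y$.

To determine the characteristic curves of $L$, we seek conditions under which all
second-order partial derivatives of $u$ can be determined from the DE and the Cauchy data, which
are values of $u$ and all its first derivatives on $\Gamma$. Consider a point $Q \equiv (x + dx, y + dy)$
close to $P \equiv (x, y)$. We can write

$$\frac{\partial u}{\partial x}(Q) - \frac{\partial u}{\partial x}(P) = dx \frac{\partial^2 u}{\partial x^2}(P) + dy \frac{\partial^2 u}{\partial x \partial y}(P),$$

$$\frac{\partial u}{\partial y}(Q) - \frac{\partial u}{\partial y}(P) = dx \frac{\partial^2 u}{\partial x \partial y}(P) + dy \frac{\partial^2 u}{\partial y^2}(P),$$

$$-F\left(P, u(P), \frac{\partial u}{\partial x}(P), \frac{\partial u}{\partial y}(P)\right) = a(P) \frac{\partial^2 u}{\partial x^2}(P) + 2b(P) \frac{\partial^2 u}{\partial x \partial y}(P) + c(P) \frac{\partial^2 u}{\partial y^2}(P).$$

This system of three linear equations in the three unknowns (the three second derivatives
evaluated at $P$) has a unique solution if and only if the determinant of the coefficients is
nonzero. Thus, by Box 21.1.6, $\Gamma$ is a characteristic curve if and only if

$$\det \begin{pmatrix} dx & dy & 0 \\ 0 & dx & dy \\ a(P) & 2b(P) & c(P) \end{pmatrix} = 0,$$

or $a(x, y)(dy)^2 - 2b(x, y)\,dx\,dy + c(x, y)(dx)^2 = 0$. It then follows, assuming that
$a(x, y) \neq 0$, that

$$\frac{dy}{dx} = \frac{b \pm \sqrt{b^2 - ac}}{a}. \tag{21.7}$$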

There are three cases to consider:

1. If $b^2 - ac < 0$, Equation (21.7) has no solution, which implies that no characteristic
curves exist at $P$. Problem 21.1 shows that the SOPDE is of elliptic type. Thus, the
Laplace equation in two dimensions is elliptic because $b^2 - ac = -1$. In fact, it
is elliptic in the whole plane, or, stated differently, it has no characteristic curve in
the entire $xy$-plane. This may lead us to believe that the Cauchy problem for the
Laplace equation in two dimensions has a unique solution. However, even though
the absence of a characteristic hypersurface at $P$ is a necessary condition for the
existence of a solution to the Cauchy problem, it is not sufficient. Problem 21.4
presents a Cauchy problem that is ill-posed, meaning that the solution at any fixed
point is not a continuous function of the initial data. Continuous dependence on the
initial data is required of a well-posed problem on both mathematical and physical grounds.

2. If $b^2 - ac > 0$, Equation (21.7) has two solutions; that is, there are two characteristic
curves passing through $P$. Problem 21.1 shows that the SOPDE is of hyperbolic type.
The wave equation is such an equation in the entire $\mathbb{R}^2$.

3. If $b^2 - ac = 0$, Equation (21.7) has only one solution. In this case there is only one
characteristic curve at $P$. The SOPDE is parabolic in this case. The one-dimensional
diffusion equation is an example of an SOPDE that is parabolic in the entire $\mathbb{R}^2$. ■
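
The three cases can be collected into a small helper that evaluates the discriminant $b^2 - ac$ at a point and returns the real slopes given by Equation (21.7). This is a minimal sketch: the function name `characteristic_slopes` is hypothetical, and the exact comparison of the discriminant with zero is only reliable for exactly representable inputs such as the integer coefficients used here.

```python
import math


def characteristic_slopes(a, b, c):
    """Type and characteristic slopes of a u_xx + 2b u_xy + c u_yy + ... = 0.

    a, b, c are the coefficient values at the point P. The slopes are the
    real roots dy/dx of a (dy)^2 - 2b dx dy + c (dx)^2 = 0, i.e. Eq. (21.7),
    which assumes a != 0.
    """
    if a == 0:
        raise ValueError("Equation (21.7) assumes a != 0")
    disc = b * b - a * c
    if disc < 0:
        return "elliptic", []                 # no real characteristic direction
    if disc == 0:
        return "parabolic", [b / a]           # one characteristic direction
    root = math.sqrt(disc)
    return "hyperbolic", [(b - root) / a, (b + root) / a]


if __name__ == "__main__":
    print(characteristic_slopes(1, 0, 1))    # Laplace equation: u_xx + u_yy = 0
    print(characteristic_slopes(1, 0, -1))   # wave equation:    u_xx - u_yy = 0
    print(characteristic_slopes(1, 0, 0))    # diffusion: only u_xx is second order
```

For the wave equation written as $u_{xx} - u_{yy} = 0$ the two slopes are $\pm 1$, the familiar pair of characteristic lines through each point.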

The question of what type of BCs to use to obtain a unique solution for a PDE 
is a very intricate mathematical problem. As Problem 21.4 shows, even though it 
has no characteristic curves in the entire R 2 , the two-dimensional Laplace equation 
does not lead to a well-posed Cauchy problem. On the other hand, examples in 
Chapter 19 that dealt with electrostatic potentials and temperatures led us to believe 
that a specification of the solution $u$ on a closed curve in 2D, and a closed surface in
3D, gives a unique solution. This has a sound physical basis. After all, specifying 
the temperature (or electrostatic potential) on a closed surface should be enough to 
give us information about the temperature (or electrostatic potential) in the region 
close to the curve. A boundary condition in which the value of the solution is 
given on a closed hypersurface is called a Dirichlet boundary condition, and the 
associated problem, a Dirichlet BVP. 
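
To make the notion of an ill-posed Cauchy problem for the Laplace equation concrete, here is a sketch based on the classic example usually attributed to Hadamard. It is offered as an illustration only and is not claimed to be the content of Problem 21.4. The Cauchy data $u(x, 0) = 0$, $\partial u/\partial y(x, 0) = \sin(nx)/n$ are matched by the harmonic function $u = \sinh(ny)\sin(nx)/n^2$; the data shrink as $n$ grows while the solution at fixed $y > 0$ grows without bound. The function names are hypothetical.

```python
import math


def hadamard_data_size(n):
    """Largest value of the Cauchy datum u_y(x, 0) = sin(n x) / n."""
    return 1.0 / n


def hadamard_solution(n, x, y):
    """u(x, y) = sinh(n y) sin(n x) / n^2.

    This function satisfies u_xx + u_yy = 0, u(x, 0) = 0 and
    u_y(x, 0) = sin(n x) / n.
    """
    return math.sinh(n * y) * math.sin(n * x) / n**2


if __name__ == "__main__":
    for n in (1, 10, 50):
        x_peak = math.pi / (2 * n)   # a point where sin(n x) = 1
        print(n, hadamard_data_size(n), hadamard_solution(n, x_peak, 1.0))
```

Arbitrarily small initial data thus produce arbitrarily large values at $y = 1$, which is the failure of continuous dependence on the data described in the text.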

There is another type of BC, which on physical grounds is appropriate for the 
Laplace equation. This condition is based on the fact that if the surface charge on a 
conductor is specified, then the electrostatic potential in the vicinity of the conductor
can be determined uniquely. The surface charge on a conductor is proportional
to the value of the electric field on the conductor. The electric field, on the other
hand, is the normal derivative of the potential. A boundary condition in which 
the value of the normal derivative of the solution is specified on a closed curve 
is called a Neumann BC, and the associated problem, a Neumann boundary 
value problem. Thus, at least on physical grounds, either a Dirichlet BVP or a 
Neumann BVP is a well-posed problem for the Laplace equation. 

For the heat (or diffusion) equation we are given an initial temperature
distribution $f(x)$ on a bar along, say, the $x$-axis, with end points held at constant
temperatures. For a bar with end points at $x = a$ and $x = b$, this is equivalent to
the data $u(0, x) = f(x)$, $u(t, a) = T_1$, and $u(t, b) = T_2$. These are not Cauchy
data, so we need not worry about characteristic curves. The boundary curve
consists of three parts: (1) $t = 0$ for $a < x < b$, (2) $t > 0$ for $x = a$, and (3) $t > 0$
for $x = b$. In the $xt$-plane, these form an open rectangle consisting of $ab$ as one
side and vertical lines at $a$ and $b$ as the other two. The problem is to determine $u$
on the side that closes the rectangle, that is, on the side $a < x < b$ at $t > 0$.
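
The data on the open rectangle are exactly what a simple time-stepping scheme consumes. The sketch below assumes the equation has the form $u_t = \alpha u_{xx}$ (the text does not write it out here) and uses the standard explicit forward-time, centered-space scheme; the function name `heat_ftcs`, the grid size, and the step-size factor 0.4 are illustrative assumptions.

```python
import math


def heat_ftcs(f, a, b, T1, T2, t_end, nx=50, alpha=1.0):
    """Explicit scheme for u_t = alpha u_xx on a < x < b.

    Data: u(0, x) = f(x), u(t, a) = T1, u(t, b) = T2.
    Returns the grid values of u at time t_end (nx + 1 points from a to b).
    """
    dx = (b - a) / nx
    dt = 0.4 * dx * dx / alpha            # below the stability limit dx^2 / (2 alpha)
    steps = max(1, math.ceil(t_end / dt))
    dt = t_end / steps                    # shrink dt slightly to land on t_end
    r = alpha * dt / (dx * dx)
    u = [f(a + i * dx) for i in range(nx + 1)]
    u[0], u[-1] = T1, T2                  # impose the end-point temperatures
    for _ in range(steps):
        new = u[:]
        for i in range(1, nx):
            new[i] = u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
        new[0], new[-1] = T1, T2
        u = new
    return u


if __name__ == "__main__":
    # f(x) = sin(pi x) on [0, 1] with both ends at zero temperature;
    # the exact solution is exp(-pi^2 t) sin(pi x) when alpha = 1.
    u = heat_ftcs(lambda x: math.sin(math.pi * x), 0.0, 1.0, 0.0, 0.0, 0.1)
    print(u[25], math.exp(-math.pi**2 * 0.1))
```

Each time step fills in one more row of the rectangle, moving from the initial side $t = 0$ toward the missing side at the final time.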

The wave equation requires specification of both $u$ and $\partial u/\partial t$ at $t = 0$. The
displacement of the boundaries of the waving medium (a taut rope, for example)
must also be specified. Again the curve is open, as for the diffusion case, but the
initial data are Cauchy. Thus, for the wave equation we do have a Cauchy problem 
with Cauchy data specified on an open curve. Since the curve, the open rectangle, 
is not a characteristic curve of the wave equation, the Cauchy problem is well- 
posed. We can generalize these BCs to m dimensions and make the following 
correspondence between a SOPDE with m variables and the appropriate BCs: 

1. Elliptic SOPDE ↔ Dirichlet or Neumann BCs on a closed hypersurface.

2. Hyperbolic SOPDE ↔ Cauchy data on an open hypersurface.

3. Parabolic SOPDE ↔ Dirichlet or Neumann BCs on an open hypersurface.


21.2 Multidimensional GFs and Delta Functions 


This section will discuss some of the characteristics of Green’s functions in higher
dimensions. These characteristics are related to the formal partial differential
operator associated with the Green’s function and also to the delta functions.

Using the formal idea of several continuous indices, we can turn the operator
equation $LG = 1$ into the PDE

$$L_{\mathbf{x}} G(\mathbf{x}, \mathbf{y}) = \frac{\delta(\mathbf{x} - \mathbf{y})}{w(\mathbf{x})}, \tag{21.8}$$

where $\mathbf{x}, \mathbf{y} \in \mathbb{R}^m$, $w(\mathbf{x})$ is a weight function that is usually set equal to one, and,
only in Cartesian coordinates,

$$\delta(\mathbf{x} - \mathbf{y}) = \delta(x_1 - y_1)\,\delta(x_2 - y_2) \cdots \delta(x_m - y_m) = \prod_{i=1}^{m} \delta(x_i - y_i). \tag{21.9}$$

In most applications Cartesian coordinates are not the most convenient to use.
Therefore, it is helpful to express Equations (21.8) and (21.9) in other coordinate
systems. In particular, it is helpful to know how the delta function transforms under
a general coordinate transformation.

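Let $x_i = f_i(\xi_1, \dots, \xi_m)$, $i = 1, 2, \dots, m$, be a coordinate transformation. Let
$P$ be a point whose coordinates are $\mathbf{a} = (a_1, \dots, a_m)$ and $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_m)$
in the $\mathbf{x}$ and $\boldsymbol{\xi}$ coordinate systems, respectively. Let $J$ be the Jacobian of the
transformation, that is, the absolute value of the determinant of a matrix whose
elements are $\partial x_i/\partial \xi_j$. For a function $F(\mathbf{x})$ the definition of the delta function gives
$\int d^m x\, F(\mathbf{x})\,\delta(\mathbf{x} - \mathbf{a}) = F(\mathbf{a})$. Expressing this equation in terms of the $\boldsymbol{\xi}$ coordinate
system, recalling that $d^m x = J\, d^m \xi$ and $a_i = f_i(\boldsymbol{\alpha})$, and introducing the notation
$H(\boldsymbol{\xi}) \equiv F(f_1(\boldsymbol{\xi}), \dots, f_m(\boldsymbol{\xi}))$, we obtain

$$\int d^m \xi\, J\, H(\boldsymbol{\xi}) \prod_{i=1}^{m} \delta\bigl(f_i(\boldsymbol{\xi}) - f_i(\boldsymbol{\alpha})\bigr) = H(\boldsymbol{\alpha}). \tag{21.10}$$

This suggests that $J \prod_{i=1}^{m} \delta(f_i(\boldsymbol{\xi}) - f_i(\boldsymbol{\alpha})) = \prod_{i=1}^{m} \delta(\xi_i - \alpha_i)$, or, in more compact
notation, $J\,\delta(\mathbf{x} - \mathbf{a}) = \delta(\boldsymbol{\xi} - \boldsymbol{\alpha})$. It is, of course, understood that $J \neq 0$ at $P$.
What happens when $J = 0$ at $P$? A point at which the Jacobian vanishes is called
a singular point of the transformation. Thus, all points on the $z$-axis, including
the origin, are singular points of the Cartesian-spherical transformation. Since $J$ is
a determinant, its vanishing at a point signals lack of invertibility at that point.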


Thus, in the transformation from Cartesian to spherical coordinates, all spherical
coordinates $(5, \pi, \varphi)$, with arbitrary $\varphi$, are mapped to the Cartesian coordinates
$(0, 0, -5)$. Similarly, the point $(0, 0, 0)$ in the Cartesian coordinate system goes
to $(0, \theta, \varphi)$ in the spherical system, with $\theta$ and $\varphi$ arbitrary. A coordinate whose
value is not determined at a singular point is called an ignorable coordinate at
that point. Thus, at the origin both $\theta$ and $\varphi$ are ignorable.

Among the $\boldsymbol{\xi}$ coordinates, let $\{\xi_i\}_{i=k+1}^{m}$ be ignorable at $P$ with Cartesian
coordinates $\mathbf{a}$. This means that any function, when expressed in terms of the $\xi$’s, will
be independent of the ignorable coordinates. A reexamination of Equation (21.10)
reveals that (see Problem 21.8)

$$\delta(\mathbf{x} - \mathbf{a}) = \frac{1}{|J_k|} \prod_{i=1}^{k} \delta(\xi_i - \alpha_i), \quad\text{where}\quad J_k = \int J\, d\xi_{k+1} \cdots d\xi_m. \tag{21.11}$$

In particular, if the transformation is invertible, $k = m$ and $J_m = J$, and we recover
$J\,\delta(\mathbf{x} - \mathbf{a}) = \delta(\boldsymbol{\xi} - \boldsymbol{\alpha})$.

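21.2.1. Example. In two dimensions the transformation between Cartesian and polar
coordinates is given by $x_1 \equiv x = r\cos\theta \equiv \xi_1 \cos\xi_2$, $x_2 \equiv y = r\sin\theta \equiv \xi_1 \sin\xi_2$ with
the Jacobian

$$J = \det \begin{pmatrix} \partial x_1/\partial \xi_1 & \partial x_1/\partial \xi_2 \\ \partial x_2/\partial \xi_1 & \partial x_2/\partial \xi_2 \end{pmatrix} = \det \begin{pmatrix} \cos\xi_2 & -\xi_1 \sin\xi_2 \\ \sin\xi_2 & \xi_1 \cos\xi_2 \end{pmatrix} = \xi_1 = r,$$

which vanishes at the origin. The angle $\theta$ is the only ignorable coordinate at the origin.
Thus, $k = 2 - 1 = 1$, and

$$J_1 = \int_0^{2\pi} J\, d\theta = \int_0^{2\pi} r\, d\theta = 2\pi r \quad\Rightarrow\quad \delta(\mathbf{x}) \equiv \delta(x)\,\delta(y) = \frac{\delta(r)}{2\pi r}.$$

In three dimensions, the transformation between Cartesian and spherical coordinates
yields the Jacobian $J = r^2 \sin\theta$. This vanishes at the origin regardless of the values of $\theta$ and
$\varphi$. We thus have two ignorable coordinates at the origin (therefore, $k = 3 - 2 = 1$), over
which we integrate to obtain

$$J_1 = \int_0^{2\pi} d\varphi \int_0^{\pi} d\theta\, r^2 \sin\theta = 4\pi r^2 \quad\Rightarrow\quad \delta(\mathbf{x}) = \frac{\delta(r)}{4\pi r^2}. \qquad ■$$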

21.2.1 Spherical Coordinates in m Dimensions

In discussing Green’s functions in $m$ dimensions, a particular curvilinear
coordinate system will prove useful. This system is the generalization of spherical
coordinates in three dimensions. The $m$-dimensional spherical coordinate system
is defined as

$$\begin{aligned}
x_1 &= r \sin\theta_1 \cdots \sin\theta_{m-1}, \\
x_2 &= r \sin\theta_1 \cdots \sin\theta_{m-2} \cos\theta_{m-1}, \\
&\;\;\vdots \\
x_k &= r \sin\theta_1 \cdots \sin\theta_{m-k} \cos\theta_{m-k+1}, \qquad 2 \le k \le m - 1, \\
&\;\;\vdots \\
x_m &= r \cos\theta_1.
\end{aligned} \tag{21.12}$$

(Note that for $m = 3$, the first two Cartesian coordinates are switched compared
to their usual definitions.)

It is not hard to show (see Example 21.2.2) that the Jacobian of the
transformation (21.12) is

$$J = r^{m-1} (\sin\theta_1)^{m-2} (\sin\theta_2)^{m-3} \cdots (\sin\theta_k)^{m-k-1} \cdots \sin\theta_{m-2} \tag{21.13}$$

and that the volume element in terms of these coordinates is

$$d^m x = J\, dr\, d\theta_1 \cdots d\theta_{m-1} = r^{m-1}\, dr\, d\Omega_m, \tag{21.14}$$

where

$$d\Omega_m = (\sin\theta_1)^{m-2} (\sin\theta_2)^{m-3} \cdots \sin\theta_{m-2}\, d\theta_1\, d\theta_2 \cdots d\theta_{m-1} \tag{21.15}$$

is the element of the $m$-dimensional solid angle.


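21.2.2. Example. For $m = 4$ we have

$$x_1 = r \sin\theta_1 \sin\theta_2 \sin\theta_3, \qquad x_2 = r \sin\theta_1 \sin\theta_2 \cos\theta_3, \qquad x_3 = r \sin\theta_1 \cos\theta_2, \qquad x_4 = r \cos\theta_1,$$

and the Jacobian is given by

$$J = \det \begin{pmatrix}
\partial x_1/\partial r & \partial x_1/\partial \theta_1 & \partial x_1/\partial \theta_2 & \partial x_1/\partial \theta_3 \\
\partial x_2/\partial r & \partial x_2/\partial \theta_1 & \partial x_2/\partial \theta_2 & \partial x_2/\partial \theta_3 \\
\partial x_3/\partial r & \partial x_3/\partial \theta_1 & \partial x_3/\partial \theta_2 & \partial x_3/\partial \theta_3 \\
\partial x_4/\partial r & \partial x_4/\partial \theta_1 & \partial x_4/\partial \theta_2 & \partial x_4/\partial \theta_3
\end{pmatrix} = r^3 \sin^2\theta_1 \sin\theta_2.$$

It is readily seen (one can use mathematical induction to prove it rigorously) that the
Jacobians for $m = 2$ ($J = r$), $m = 3$ ($J = r^2 \sin\theta_1$), and $m = 4$ ($J = r^3 \sin^2\theta_1 \sin\theta_2$)
generalize to Equation (21.13). ■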

Using the integral $\int_0^{\pi} \sin^n\theta\, d\theta = \sqrt{\pi}\,\Gamma[(n + 1)/2]/\Gamma[(n + 2)/2]$, the total
solid angle in $m$ dimensions can be found to be

$$\Omega_m = \frac{2\pi^{m/2}}{\Gamma(m/2)}. \tag{21.16}$$
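
Equation (21.16) is easy to check numerically against a direct integration of the solid-angle element (21.15). The sketch below assumes the usual angular ranges, $0 \le \theta_k \le \pi$ for $k \le m - 2$ and $0 \le \theta_{m-1} \le 2\pi$, which the text does not state explicitly; the function names and the midpoint rule are illustrative choices.

```python
import math


def total_solid_angle(m):
    """Equation (21.16): Omega_m = 2 pi^(m/2) / Gamma(m/2)."""
    return 2 * math.pi ** (m / 2) / math.gamma(m / 2)


def sin_power_integral(n, steps=20000):
    """Midpoint-rule estimate of the integral of sin^n(theta) over [0, pi]."""
    h = math.pi / steps
    return h * sum(math.sin((i + 0.5) * h) ** n for i in range(steps))


def solid_angle_by_integration(m):
    """Integrate (21.15) angle by angle.

    theta_k (k = 1, ..., m-2) carries the factor sin^(m-1-k) and is assumed
    to run over [0, pi]; theta_(m-1) is assumed to run over [0, 2 pi].
    """
    omega = 2 * math.pi
    for k in range(1, m - 1):
        omega *= sin_power_integral(m - 1 - k)
    return omega


if __name__ == "__main__":
    for m in range(2, 7):
        print(m, total_solid_angle(m), solid_angle_by_integration(m))
```

For $m = 2$ and $m = 3$ the formula reduces to the familiar values $2\pi$ and $4\pi$, and for $m = 4$ it gives $2\pi^2$.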




An interesting result that is readily obtained is an expression of the delta
function in terms of spherical coordinates at the origin. Since $r = 0$, Equation (21.12)
shows that all the angles are ignorable. Thus, we have

$$J_1 = \int J\, d\theta_1 \cdots d\theta_{m-1} = r^{m-1} \int d\Omega_m = \Omega_m r^{m-1},$$

which yields

$$\delta(\mathbf{x}) = \delta(x_1) \cdots \delta(x_m) = \frac{\delta(r)}{\Omega_m r^{m-1}} = \frac{\Gamma(m/2)\,\delta(r)}{2\pi^{m/2} r^{m-1}}. \tag{21.17}$$


21.2.2 Green’s Function for the Laplacian 

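With the machinery developed above, we can easily obtain the (indefinite) Green’s
function for the Laplacian in $m$ dimensions. We will ignore questions of BCs and
simply develop a function that satisfies $\nabla^2 G(\mathbf{x}, \mathbf{y}) = \delta(\mathbf{x} - \mathbf{y})$. Without loss of
generality we let $\mathbf{y} = 0$; that is, we translate the axes so that $\mathbf{y}$ becomes the new
origin. Then we have $\nabla^2 G(\mathbf{x}) = \delta(\mathbf{x})$. In spherical coordinates this becomes

$$\nabla^2 G(\mathbf{x}) = \frac{\delta(r)}{\Omega_m r^{m-1}}. \tag{21.18}$$

Since the RHS is a function of $r$ only, we expect $G$ to behave in the same way.
We now have to express $\nabla^2$ in terms of spherical coordinates. In general, this is
difficult; however, for a function of $r = \sqrt{x_1^2 + \cdots + x_m^2}$ alone, such as $F(r)$, we
have

$$\frac{\partial F}{\partial x_i} = \frac{dF}{dr} \frac{\partial r}{\partial x_i} = \frac{dF}{dr} \frac{x_i}{r}
\quad\text{and}\quad
\frac{\partial^2 F}{\partial x_i^2} = \frac{d^2 F}{dr^2} \frac{x_i^2}{r^2} + \frac{dF}{dr} \left( \frac{1}{r} - \frac{x_i^2}{r^3} \right),$$

so that

$$\nabla^2 F(r) = \sum_{i=1}^{m} \frac{\partial^2 F}{\partial x_i^2} = \frac{d^2 F}{dr^2} + \frac{m - 1}{r} \frac{dF}{dr} = \frac{1}{r^{m-1}} \frac{d}{dr} \left( r^{m-1} \frac{dF}{dr} \right).$$

For the Green’s function, therefore, we get

$$\frac{d}{dr} \left( r^{m-1} \frac{dG}{dr} \right) = \frac{\delta(r)}{\Omega_m}. \tag{21.19}$$

The solution, for $m \ge 3$, is (see Problem 21.9)

$$G(r) = -\frac{\Gamma(m/2)}{2(m - 2)\pi^{m/2}} \left( \frac{1}{r^{m-2}} \right) \quad\text{for } m \ge 3. \tag{21.20}$$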
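
One way to see where (21.20) comes from is sketched below. It is an outline under stated assumptions, not necessarily the route intended in Problem 21.9: the integration constant $C$ is fixed by integrating (21.18) over a ball of radius $R$ with the divergence theorem, and the additive constant in $G$ is set to zero so that $G$ vanishes at infinity.

```latex
% For r > 0 the right-hand side of (21.19) vanishes, so
r^{m-1}\,\frac{dG}{dr} = C \qquad (r > 0).
% Integrating \nabla^2 G = \delta(\mathbf{x}) over a ball of radius R and using
% the divergence theorem (the flux of \nabla G through the sphere) gives
\Omega_m R^{m-1}\,\frac{dG}{dr}\bigg|_{r=R} = 1
\quad\Longrightarrow\quad C = \frac{1}{\Omega_m}.
% Integrating once more for m \ge 3, with the additive constant set to zero:
G(r) = \frac{1}{\Omega_m}\int \frac{dr}{r^{m-1}}
     = -\frac{1}{(m-2)\,\Omega_m\, r^{m-2}}
     = -\frac{\Gamma(m/2)}{2(m-2)\,\pi^{m/2}}\,\frac{1}{r^{m-2}},
% where the last step uses \Omega_m = 2\pi^{m/2}/\Gamma(m/2) from (21.16).
```

For $m = 2$ the same first integral, $r\,dG/dr = 1/(2\pi)$, integrates to a logarithm instead of a power.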







Green’s function for
the Laplacian


solution of Poisson 
equation in m 
dimensions 


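We can restore the vector $\mathbf{y}$, at which we placed the origin, by noting that
$r = |\mathbf{r}| = |\mathbf{x} - \mathbf{y}|$. Thus, we get

$$G(\mathbf{x}, \mathbf{y}) = -\frac{\Gamma(m/2)}{2(m - 2)\pi^{m/2}} \frac{1}{|\mathbf{x} - \mathbf{y}|^{m-2}}
= -\frac{\Gamma(m/2)}{2(m - 2)\pi^{m/2}} \left[ \sum_{i=1}^{m} (x_i - y_i)^2 \right]^{-(m-2)/2} \quad\text{for } m \ge 3. \tag{21.21}$$

Similarly, for $m = 2$ we obtain

$$G(\mathbf{x}, \mathbf{y}) = \frac{1}{2\pi} \ln|\mathbf{x} - \mathbf{y}| = \frac{1}{4\pi} \ln\left[ (x_1 - y_1)^2 + (x_2 - y_2)^2 \right]. \tag{21.22}$$

Having found the Green’s function for the Laplacian, we can find a solution
to the inhomogeneous equation, the Poisson equation, $\nabla^2 u = -\rho(\mathbf{x})$. Thus, for
$m \ge 3$, we get

$$u(\mathbf{x}) = -\int d^m y\, G(\mathbf{x}, \mathbf{y})\,\rho(\mathbf{y}) = \frac{\Gamma(m/2)}{2(m - 2)\pi^{m/2}} \int d^m y\, \frac{\rho(\mathbf{y})}{|\mathbf{x} - \mathbf{y}|^{m-2}}.$$

In particular, for $m = 3$, we obtain

$$u(\mathbf{x}) = \frac{1}{4\pi} \int d^3 y\, \frac{\rho(\mathbf{y})}{|\mathbf{x} - \mathbf{y}|},$$

which is the electrostatic potential due to a charge density $\rho(\mathbf{y})$.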
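
The formulas above can be sanity-checked numerically. The sketch below (function names and finite-difference step sizes are illustrative assumptions) verifies two properties of the radial Green’s function for several values of $m$: away from the origin its radial Laplacian vanishes, and the quantity $\Omega_m r^{m-1}\, dG/dr$, the flux of $\nabla G$ through a sphere of radius $r$, equals 1, as required by a unit delta-function source.

```python
import math


def omega(m):
    """Total solid angle in m dimensions, Equation (21.16)."""
    return 2 * math.pi ** (m / 2) / math.gamma(m / 2)


def green_laplacian(r, m):
    """Indefinite Green's function of the Laplacian, Eqs. (21.20) and (21.22)."""
    if m == 2:
        return math.log(r) / (2 * math.pi)
    return -math.gamma(m / 2) / (2 * (m - 2) * math.pi ** (m / 2) * r ** (m - 2))


def radial_laplacian(F, r, m, h=1e-3):
    """Central-difference estimate of F'' + (m - 1) F' / r at radius r."""
    d1 = (F(r + h) - F(r - h)) / (2 * h)
    d2 = (F(r + h) - 2 * F(r) + F(r - h)) / (h * h)
    return d2 + (m - 1) / r * d1


def flux(r, m, h=1e-5):
    """Omega_m r^(m-1) dG/dr, the flux of grad G through a sphere of radius r."""
    dG = (green_laplacian(r + h, m) - green_laplacian(r - h, m)) / (2 * h)
    return omega(m) * r ** (m - 1) * dG


if __name__ == "__main__":
    for m in range(2, 6):
        lap = radial_laplacian(lambda r: green_laplacian(r, m), 1.3, m)
        print(m, lap, flux(0.7, m))
```

For $m = 3$ the function reduces to $-1/(4\pi r)$, consistent with the electrostatic potential written above once the sign in $\nabla^2 u = -\rho$ is taken into account.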


21.3 Formal Development 

The preceding section was devoted to a discussion of the Green’s function for the 
Laplacian with no mention of the BCs. This section will develop a formalism that 
not only works for more general operators, but also incorporates the BCs. 


21.3.1 General Properties 

Basic to a study of GFs is Green’s identity, whose 1-dimensional version we
encountered in Chapter 20. Here, we generalize it to $m$ dimensions. Suppose there
exist two differential operators, $L_{\mathbf{x}}$ and $L_{\mathbf{x}}^\dagger$, which for any two functions $u$ and $v$,
satisfy the following relation:

$$v^* L_{\mathbf{x}}[u] - u\,(L_{\mathbf{x}}^\dagger[v])^* = \nabla \cdot \mathbf{Q}[u, v^*] \equiv \sum_{i=1}^{m} \frac{\partial Q_i}{\partial x_i}[u, v^*]. \tag{21.23}$$

The differential operator $L_{\mathbf{x}}^\dagger$ is, as in the one-dimensional case, called the formal
adjoint of $L_{\mathbf{x}}$. Integrating (21.23) over a closed domain $D$ in $\mathbb{R}^m$ with boundary
$\partial D$, and using the divergence theorem, we obtain

$$\int_D d^m x\, \{ v^* L_{\mathbf{x}}[u] - u\,(L_{\mathbf{x}}^\dagger[v])^* \} = \int_{\partial D} \mathbf{Q} \cdot \hat{\mathbf{e}}_n\, da, \tag{21.24}$$

where $\hat{\mathbf{e}}_n$ is an $m$-dimensional unit vector normal to $\partial D$, and $da$ is an element of
“area” of the $(m - 1)$-dimensional hypersurface $\partial D$. Equation (21.24) is the generalized
Green’s identity for $m$ dimensions. Note that the weight function is set equal to
one for simplicity.

The differential operator L x is said to be formally self-adjoint if the RHS of 
Equation (21,24)，the surface term, vanishes. In such a case, we have L x = as 
in one dimension. This relation is a necessary condition for the surface term to 
vanish because u and u are, by assumption, arbitrary. L x is called self-adjoint (or, 
somewhat imprecisely, hermitian) if L x = lJ and the domains of the two operators, 
as determined by the vanishing of the surface term，are identical. 

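We can use Equation (21.24) to study the pair of PDEs

$$
L_\mathbf{x}[u] = f(\mathbf{x}) \quad\text{and}\quad L_\mathbf{x}^\dagger[v] = h(\mathbf{x}). \tag{21.25}
$$

As in one dimension, we let $G(\mathbf{x},\mathbf{y})$ and $g(\mathbf{x},\mathbf{y})$ denote the Green’s functions for
$L_\mathbf{x}$ and $L_\mathbf{x}^\dagger$, respectively. Let us assume that the BCs are such that the surface term
in Equation (21.24) vanishes. Then we get Green’s identity

$$
\int_D d^m x\, v^* L_\mathbf{x}[u] = \int_D d^m x\, u\,(L_\mathbf{x}^\dagger[v])^*. \tag{21.26}
$$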


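Green’s functions are
symmetric functions
of their arguments

If in this equation we let $u = G(\mathbf{x},\mathbf{t})$ and $v = g(\mathbf{x},\mathbf{y})$, where $\mathbf{t},\mathbf{y} \in D$, we obtain

$$
\int_D d^m x\, g^*(\mathbf{x},\mathbf{y})\,\delta(\mathbf{x}-\mathbf{t}) = \int_D d^m x\, G(\mathbf{x},\mathbf{t})\,\delta(\mathbf{x}-\mathbf{y}),
$$

or $g^*(\mathbf{t},\mathbf{y}) = G(\mathbf{y},\mathbf{t})$. In particular, when $L_\mathbf{x}$ is formally self-adjoint, we have
$G^*(\mathbf{t},\mathbf{y}) = G(\mathbf{y},\mathbf{t})$, or $G(\mathbf{t},\mathbf{y}) = G(\mathbf{y},\mathbf{t})$, if all the coefficient functions of $L_\mathbf{x}$ are
real. That is, the Green’s function will be symmetric.

If we let $v = g(\mathbf{x},\mathbf{y})$ and use the first equation of (21.25) in (21.26), we get
$u(\mathbf{y}) = \int_D d^m x\, g^*(\mathbf{x},\mathbf{y}) f(\mathbf{x})$, which, using $g^*(\mathbf{t},\mathbf{y}) = G(\mathbf{y},\mathbf{t})$ and interchanging
$\mathbf{x}$ and $\mathbf{y}$, becomes $u(\mathbf{x}) = \int_D d^m y\, G(\mathbf{x},\mathbf{y}) f(\mathbf{y})$. It can similarly be shown that
$v(\mathbf{x}) = \int_D d^m y\, g(\mathbf{x},\mathbf{y}) h(\mathbf{y})$.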
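The two results above, symmetry of $G$ and the integral representation of $u$, can be illustrated with a minimal one-dimensional sketch. It assumes the operator $d^2/dx^2$ on $[0,1]$ with homogeneous Dirichlet BCs, whose Green's function is $G(x,y) = x(y-1)$ for $x \le y$ and $y(x-1)$ for $x \ge y$; this example and the function names are chosen for illustration and are not taken from the text.

```python
def green_1d(x: float, y: float) -> float:
    """Green's function of d^2/dx^2 on [0, 1] with G = 0 at both ends (illustrative case)."""
    return x * (y - 1.0) if x <= y else y * (x - 1.0)


def solve_with_green(f, x: float, n: int = 2000) -> float:
    """Approximate u(x) = integral_0^1 G(x, y) f(y) dy with the midpoint rule."""
    h = 1.0 / n
    return sum(green_1d(x, (j + 0.5) * h) * f((j + 0.5) * h) for j in range(n)) * h


# For f = 1 the exact solution of u'' = 1, u(0) = u(1) = 0 is u(x) = x (x - 1) / 2.
u_at_03 = solve_with_green(lambda y: 1.0, 0.3)
print(u_at_03, 0.3 * (0.3 - 1.0) / 2.0)

# Symmetry G(x, y) = G(y, x), as expected for a formally self-adjoint operator
# with real coefficients.
print(green_1d(0.2, 0.7), green_1d(0.7, 0.2))
```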


21.3.2 Fundamental (Singular) Solutions 

The inhomogeneous term of the differential equation to which $G(\mathbf{x},\mathbf{y})$ is a solution
is the delta function, $\delta(\mathbf{x}-\mathbf{y})$. It would be surprising if $G(\mathbf{x},\mathbf{y})$ did not “take notice”
of this catastrophic source term and did not adapt itself to behave differently at
$\mathbf{x} = \mathbf{y}$ than at any other “ordinary” point. We noted the singular behavior of the
Green’s function at $x = y$ in one dimension when we proved Theorem 20.3.4.
There we introduced $h(x, y)$, which was discontinuous at $x = y$, as a part of
the Green’s function. Similarly, when we discussed the Green’s functions for the
Laplacian in two and $m$ dimensions earlier in this chapter, we noted that they
behaved singularly at $r = 0$ or $\mathbf{x} = \mathbf{y}$. In this section, we study similar properties
of the GFs for other differential operators.

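Next to the Laplacian in difficulty is the formally self-adjoint elliptic PDO
$L_\mathbf{x} = \nabla^2 + q(\mathbf{x})$ discussed in Problem 21.10. Substituting this operator in the
generalized Green’s identity and using the expression for $\mathbf{Q}$ given in Problem
21.10, we obtain

$$
\int_D d^m x\,\{v L_\mathbf{x}[u] - u L_\mathbf{x}[v]\} = \int_{\partial D} (v\,\hat{\mathbf{e}}_n\cdot\nabla u - u\,\hat{\mathbf{e}}_n\cdot\nabla v)\, da.
$$

Letting $v = G(\mathbf{x},\mathbf{y})$ and denoting $\hat{\mathbf{e}}_n\cdot\nabla$ by $\partial/\partial n$ gives

$$
\int_D d^m x\,[G L_\mathbf{x} u - u L_\mathbf{x} G] = \int_{\partial D} \left(G\frac{\partial u}{\partial n} - u\frac{\partial G}{\partial n}\right) da. \tag{21.27}
$$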


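We want to use this equation to find out about the behavior of $G(\mathbf{x},\mathbf{y})$ as $|\mathbf{x}-\mathbf{y}| \to 0$.
Therefore, assuming that $\mathbf{y} \in D$, we divide the domain $D$ into two parts: one part
is a region $D_\epsilon$ bounded by an infinitesimal hypersphere $S_\epsilon$ with radius $\epsilon$ and center
at $\mathbf{y}$; the other is the rest of $D$. Instead of $D$ we use the region $D' = D - D_\epsilon$. The
following facts are easily deduced for $D'$: (1) $L_\mathbf{x} G(\mathbf{x},\mathbf{y}) = 0$ because $\mathbf{x} \ne \mathbf{y}$ in $D'$;
(2) $\int_D = \lim_{\epsilon\to 0}\int_{D'}$; (3) $\partial D' = \partial D \cup S_\epsilon$.

Suppose that we are interested in finding a solution to

$$
L_\mathbf{x}[u] = [\nabla^2 + q(\mathbf{x})]\,u(\mathbf{x}) = f(\mathbf{x})
$$

subject to certain, as yet unspecified, BCs. Using the three facts listed above,
the LHS of Equation (21.27), evaluated over $D'$, yields

$$
\int_{D'} d^m x\,[G L_\mathbf{x} u - u L_\mathbf{x} G]
= \int_{D'} d^m x\,[G\underbrace{L_\mathbf{x} u}_{=f} - u\underbrace{L_\mathbf{x} G}_{=0}]
\;\to\; \lim_{\epsilon\to 0}\int_{D'} d^m x\, G(\mathbf{x},\mathbf{y}) f(\mathbf{x}) = \int_D d^m x\, G(\mathbf{x},\mathbf{y}) f(\mathbf{x}).
$$

We assume that the BCs are such that the integral over $\partial D$ vanishes. This is
a generalization of the one-dimensional case (recall from Chapter 20 that this
is a necessary condition for the existence of Green’s functions). Moreover, for
an $m$-dimensional sphere, $da = r^{m-1}\,d\Omega_m$, which for $S_\epsilon$ reduces to $\epsilon^{m-1}\,d\Omega_m$.
On $S_\epsilon$ the outward normal of $D'$ points toward $\mathbf{y}$, so that $\partial/\partial n = -\partial/\partial r$ there.
Substituting in the preceding equation yields

$$
\int_D d^m x\, G(\mathbf{x},\mathbf{y}) f(\mathbf{x}) = \lim_{\epsilon\to 0}\int_{S_\epsilon}\left(u\frac{\partial G}{\partial r} - G\frac{\partial u}{\partial r}\right)\epsilon^{m-1}\, d\Omega_m.
$$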




fundamental solution 
is the singular part of 
GF 


homogeneous 
solution is the 
regular part of GF 


regular part of the 
Green’s function 


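We would like the RHS to be $u(\mathbf{y})$. This will be the case if

$$
\lim_{\epsilon\to 0}\int_{S_\epsilon} G(\mathbf{x},\mathbf{y})\frac{\partial u}{\partial r}\,\epsilon^{m-1}\, d\Omega_m = 0
\quad\text{and}\quad
\lim_{\epsilon\to 0}\int_{S_\epsilon} u\,\frac{\partial G}{\partial r}\,\epsilon^{m-1}\, d\Omega_m = u(\mathbf{y})
$$

for arbitrary $u$. This will happen only if

$$
\lim_{r\to 0} G(\mathbf{y}+\mathbf{r},\mathbf{y})\,r^{m-1} = 0, \qquad
\lim_{r\to 0} \frac{\partial G}{\partial r}(\mathbf{y}+\mathbf{r},\mathbf{y})\,r^{m-1} = \text{const}. \tag{21.28}
$$

A solution to these two equations is

$$
G(\mathbf{x},\mathbf{y}) =
\begin{cases}
\dfrac{F(\mathbf{x},\mathbf{y})}{2\pi}\ln(|\mathbf{x}-\mathbf{y}|) + H(\mathbf{x},\mathbf{y}) & \text{if } m = 2,\\[2ex]
-\dfrac{F(\mathbf{x},\mathbf{y})}{(m-2)\Omega_m}\,|\mathbf{x}-\mathbf{y}|^{-(m-2)} + H(\mathbf{x},\mathbf{y}) & \text{if } m \ge 3,
\end{cases} \tag{21.29}
$$

where $H(\mathbf{x},\mathbf{y})$ and $F(\mathbf{x},\mathbf{y})$ are well behaved at $\mathbf{x} = \mathbf{y}$. The introduction of these
functions is necessary because Equation (21.28) determines the behavior of $G(\mathbf{x},\mathbf{y})$
only as $\mathbf{x} \to \mathbf{y}$. Such behavior does not uniquely determine $G(\mathbf{x},\mathbf{y})$. For instance,
adding any function that is well behaved at $\mathbf{x} = \mathbf{y}$ to $\ln(|\mathbf{x}-\mathbf{y}|)$ leaves its behavior
as $|\mathbf{x}-\mathbf{y}| \to 0$ unchanged.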

Equation (21.29) shows that for $L_\mathbf{x} = \nabla^2 + q(\mathbf{x})$, the Green’s function consists
of two parts. The first part determines the singular behavior of the Green’s function
as $\mathbf{x} \to \mathbf{y}$. The nature of this singularity (how badly the GF “blows up” as $\mathbf{x} \to \mathbf{y}$) is
extremely important, because it is a prerequisite for our ability to write the solution
in terms of an integral representation with the Green’s function as its kernel. Due
to their importance in such representations, the first terms on the RHS of Equation
(21.29) are called the fundamental solution of the differential equation, or the
singular part of the Green’s function.

What about the second part of the Green’s function? What role does it play in 
obtaining a solution? So far we have been avoiding consideration of BCs. Here 
H(x, y) can help. We choose H(x, y) in such a way that G(x, y) satisfies the 
appropriate BCs. Let us discuss this in greater detail and generality. 

If BCs are ignored, the Green’s function for a SOPDO $L_\mathbf{x}$ cannot be determined
uniquely. In particular, if $G(\mathbf{x},\mathbf{y})$ is a Green’s function, that is, if $L_\mathbf{x} G(\mathbf{x},\mathbf{y}) =
\delta(\mathbf{x}-\mathbf{y})$, then so is $G(\mathbf{x},\mathbf{y}) + H(\mathbf{x},\mathbf{y})$ as long as $H(\mathbf{x},\mathbf{y})$ is a solution of the
homogeneous equation $L_\mathbf{x} H(\mathbf{x},\mathbf{y}) = 0$. Thus, we can break the Green’s function
into two parts:

$$
G = G_s + H, \quad\text{where}\quad L_\mathbf{x} G_s(\mathbf{x},\mathbf{y}) = \delta(\mathbf{x}-\mathbf{y}), \quad L_\mathbf{x} H(\mathbf{x},\mathbf{y}) = 0, \tag{21.30}
$$

with $G_s$ the singular part of the Green’s function. $H$ is called the regular part of
the Green’s function. Neither $G_s$ nor $H$ (nor $G$, therefore) is unique. However, the
appropriate BCs, which depend on the type of $L_\mathbf{x}$, will determine $G$ uniquely.

To be more specific, let us assume that we want to find a Green’s function for
$L_\mathbf{x}$ that vanishes at the boundary $\partial D$. That is, we wish to find $G(\mathbf{x},\mathbf{y})$ such that
$G(\mathbf{x}_b,\mathbf{y}) = 0$, where $\mathbf{x}_b$ is an arbitrary point of the boundary. All that is required
is to find a $G_s$ and an $H$ satisfying Equation (21.30) with the BC $H(\mathbf{x}_b,\mathbf{y}) =
-G_s(\mathbf{x}_b,\mathbf{y})$. The latter problem, involving a homogeneous differential equation,
can be handled by the methods of Chapters 18 and 19. Since any discussion of
BCs is tied to the type of PDE, we have reserved the discussion of such specifics
for the next chapter.
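A concrete instance of the decomposition $G = G_s + H$ may help. The sketch below assumes the three-dimensional Laplacian on the half-space $z > 0$ and uses the familiar image construction: $G_s = -1/(4\pi|\mathbf{x}-\mathbf{y}|)$ and $H = +1/(4\pi|\mathbf{x}-\mathbf{y}^*|)$, where $\mathbf{y}^*$ is the mirror image of $\mathbf{y}$ in the plane $z = 0$. The choice of domain and the function names are assumptions made for illustration; the text defers specific BCs to the next chapter.

```python
import math


def g_singular(x, y):
    """Singular part: free-space Green's function of the 3D Laplacian."""
    return -1.0 / (4.0 * math.pi * math.dist(x, y))


def h_regular(x, y):
    """Regular part: harmonic in z > 0 because the image point lies in z < 0."""
    y_image = (y[0], y[1], -y[2])
    return 1.0 / (4.0 * math.pi * math.dist(x, y_image))


def green_half_space(x, y):
    """G = G_s + H, built so that G vanishes on the boundary plane z = 0."""
    return g_singular(x, y) + h_regular(x, y)


source = (0.5, 0.1, 2.0)
boundary_point = (0.3, -1.2, 0.0)
interior_point = (0.0, 0.0, 1.0)

print(green_half_space(boundary_point, source))   # vanishes on the boundary
print(green_half_space(interior_point, source))   # nonzero inside the domain
```

On the boundary the two distances coincide, so $H(\mathbf{x}_b,\mathbf{y}) = -G_s(\mathbf{x}_b,\mathbf{y})$ holds identically, which is exactly the BC imposed on $H$ above.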


21.4 Integral Equations and GFs 

Integral equations are best applied in combination with Green’s functions. In fact,
we can use a Green’s function to turn a DE into an integral equation. If this integral
equation is compact or has a compact resolvent, then the problem lends itself to
the methods described in Chapters 16 and 17.

Let $L_\mathbf{x}$ be a SOPDO in $m$ variables. We are interested in solving the SOPDE
$L_\mathbf{x}[u] + \lambda V(\mathbf{x})u(\mathbf{x}) = f(\mathbf{x})$ subject to some BCs. Here $\lambda$ is an arbitrary constant,
and $V(\mathbf{x})$ is a well-behaved function on $\mathbb{R}^m$. Transferring the second term on the
LHS to the RHS and then treating the RHS as an inhomogeneous term, we can
write the “solution” to the PDE as

$$
u(\mathbf{x}) = H(\mathbf{x}) + \int_D d^m y\, G_0(\mathbf{x},\mathbf{y})\,[f(\mathbf{y}) - \lambda V(\mathbf{y})u(\mathbf{y})],
$$

where $D$ is the domain of $L_\mathbf{x}$ and $G_0$ is the Green’s function for $L_\mathbf{x}$ with some, as
yet unspecified, BCs. The function $H$ is a solution to the homogeneous equation,
and it is present to guarantee the appropriate BCs.

Combining the first term in the integral with $H(\mathbf{x})$, we have

$$
u(\mathbf{x}) = F(\mathbf{x}) - \lambda\int_D d^m y\, G_0(\mathbf{x},\mathbf{y})V(\mathbf{y})u(\mathbf{y}). \tag{21.31}
$$

Equation (21.31) is an $m$-dimensional Fredholm equation whose solution can be
obtained in the form of a Neumann series.




where $\Psi_0$ is a solution of $L_x[\Psi_0] = 0$, which is easily found to be of the general form
$\Psi_0(x) = Ae^{\kappa x} + Be^{-\kappa x}$. If we assume that $\Psi_0(x)$ remains finite as $x \to \pm\infty$, $\Psi_0(x)$
will be zero. Furthermore, it can be shown that $G_0(x,y) = -e^{-\kappa|x-y|}/2\kappa$ (see Problem
20.12). Therefore,

$$
\Psi(x) = -\frac{\mu}{\hbar^2\kappa}\int_{-\infty}^{\infty} e^{-\kappa|x-y|}\,V(y)\Psi(y)\,dy.
$$

Now consider an attractive delta-function potential with center at $a$: $V(x) = -V_0\delta(x-a)$,
$V_0 > 0$. For such a potential, the integral equation yields

$$
\Psi(x) = \frac{\mu}{\hbar^2\kappa}\int_{-\infty}^{\infty} e^{-\kappa|x-y|}\,V_0\,\delta(y-a)\Psi(y)\,dy = \frac{\mu V_0}{\hbar^2\kappa}\,e^{-\kappa|x-a|}\,\Psi(a).
$$

For this equation to be consistent, i.e., to get an identity when $x = a$, we must have

$$
\frac{\mu V_0}{\hbar^2\kappa} = 1, \quad\text{that is,}\quad \kappa = \frac{\mu V_0}{\hbar^2}.
$$

Therefore, there is only one bound state and one energy level for an attractive delta-function
potential. ■


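To find a Neumann-series solution we can substitute the expression for $u$ given
by the RHS of Equation (21.31) in the integral of that equation. The resulting
equation will have two integrals, in the second of which $u$ appears. Substituting
the new $u$ in the second integral and continuing the process $N$ times yields

$$
u(\mathbf{x}) = F(\mathbf{x}) + \sum_{n=1}^{N-1}(-\lambda)^n\int_D d^m y\, K^n(\mathbf{x},\mathbf{y})F(\mathbf{y})
+ (-\lambda)^N\int_D d^m y\, K^N(\mathbf{x},\mathbf{y})u(\mathbf{y}),
$$

where

$$
K(\mathbf{x},\mathbf{y}) \equiv V(\mathbf{x})G_0(\mathbf{x},\mathbf{y}), \qquad
K^n(\mathbf{x},\mathbf{y}) \equiv \int_D d^m t\, K^{n-1}(\mathbf{x},\mathbf{t})K(\mathbf{t},\mathbf{y}) \quad\text{for } n \ge 2. \tag{21.32}
$$

The Neumann series is obtained by letting $N \to \infty$:

$$
u(\mathbf{x}) = F(\mathbf{x}) + \sum_{n=1}^{\infty}(-\lambda)^n\int_D d^m y\, K^n(\mathbf{x},\mathbf{y})F(\mathbf{y}). \tag{21.33}
$$

Except for the fact that here the integrations are in $m$ variables, Equation (21.33) is
the same as the Neumann series derived in Section 17.1. In exact analogy, therefore,
we abbreviate (21.33) as

$$
|u\rangle = |F\rangle + \sum_{n=1}^{\infty}(-\lambda)^n K^n|F\rangle. \tag{21.34}
$$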
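The iteration behind the Neumann series can be sketched numerically. The example below is a one-dimensional illustration under stated assumptions: $L_x = d^2/dx^2$ on $[0,1]$ with homogeneous Dirichlet BCs, $V = 1$, and the Green's function $G_0(x,y) = x(y-1)$ for $x \le y$, $y(x-1)$ for $x \ge y$. The inhomogeneous term is chosen so that the exact solution is $u = \sin\pi x$, which gives $F(x) = (1 - \lambda/\pi^2)\sin\pi x$. The integral in (21.31) is replaced by a midpoint-rule sum; function names are illustrative.

```python
import math


def green_1d(x: float, y: float) -> float:
    """G_0 for d^2/dx^2 on [0, 1] with Dirichlet BCs (illustrative choice)."""
    return x * (y - 1.0) if x <= y else y * (x - 1.0)


def neumann_solve(F, kernel, lam: float, n: int = 200, iterations: int = 40):
    """Sum the series u = F + sum_n (-lam K)^n F on a midpoint grid."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    K = [[kernel(x, y) * h for y in pts] for x in pts]  # quadrature weights folded in
    term = [F(x) for x in pts]   # current term (-lam K)^k F, starting with k = 0
    u = term[:]                  # running partial sum
    for _ in range(iterations):
        term = [-lam * sum(K[i][j] * term[j] for j in range(n)) for i in range(n)]
        u = [ui + ti for ui, ti in zip(u, term)]
    return pts, u


lam = 2.0
F = lambda x: (1.0 - lam / math.pi ** 2) * math.sin(math.pi * x)
pts, u = neumann_solve(F, green_1d, lam)
max_error = max(abs(ui - math.sin(math.pi * x)) for x, ui in zip(pts, u))
print(max_error)
```

In this example successive terms shrink by a factor of roughly $\lambda/\pi^2 \approx 0.2$, so the series converges quickly; for $|\lambda| \ge \pi^2$ the same iteration would diverge.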









Figure 21.1 Contributions to the full propagator in (a) the zeroth order, (b) the first order,
and (c) the second order. At each vertex one introduces a factor of $-\lambda V$ and integrates over
all values of the variable of that vertex.


where $K^n(\mathbf{x},\mathbf{z})$ is as given in Equation (21.32).

Feynman’s idea is to consider $G(\mathbf{x},\mathbf{y})$ as an interacting propagator between
points $\mathbf{x}$ and $\mathbf{y}$ and $G_0(\mathbf{x},\mathbf{y})$ as a free propagator. The first term on the RHS of
(21.39) is simply a free propagation from $\mathbf{x}$ to $\mathbf{y}$. Diagrammatically, it is represented
by a line joining the points $\mathbf{x}$ and $\mathbf{y}$ [see Figure 21.1(a)]. The second term is a free
propagation from $\mathbf{x}$ to $\mathbf{y}_1$ (also called a vertex), interaction at $\mathbf{y}_1$ with a potential
$-\lambda V(\mathbf{y}_1)$, and subsequent free propagation to $\mathbf{y}$ [see Figure 21.1(b)]. According to
the third term, the particle or wave [represented by $u_f(\mathbf{x})$] propagates freely from
$\mathbf{x}$ to $\mathbf{y}_1$, interacts at $\mathbf{y}_1$ with the potential $-\lambda V(\mathbf{y}_1)$, propagates freely from $\mathbf{y}_1$ to
$\mathbf{y}_2$, interacts for a second time with the potential $-\lambda V(\mathbf{y}_2)$, and finally propagates
freely from $\mathbf{y}_2$ to $\mathbf{y}$ [Figure 21.1(c)]. The interpretation of the rest of the series
in (21.39) is now clear: The $n$th-order term of the series has $n$ vertices between
$\mathbf{x}$ and $\mathbf{y}$ with a factor $-\lambda V(\mathbf{y}_k)$ and an integration over $\mathbf{y}_k$ at vertex $k$. Between
any two consecutive vertices $\mathbf{y}_k$ and $\mathbf{y}_{k+1}$ there is a factor of the free propagator
$G_0(\mathbf{y}_k,\mathbf{y}_{k+1})$.

Feynman diagrams are used extensively in relativistic quantum field theory, for
which $m = 4$, corresponding to the four-dimensional space-time. In this context
$\lambda$ is determined by the strength of the interaction. For quantum electrodynamics,
for instance, $\lambda$ is the fine-structure constant, $e^2/\hbar c \approx 1/137$.


21.5 Perturbation Theory 

Few operator equations lend themselves to an exact solution, and due to the urgency 
of finding a solution to such equations in fundamental physics, various techniques 
have been developed to approximate solutions to operator equations. We have 
already seen instances of such techniques in, for example, the WKB method. This 
section is devoted to a systematic development of perturbation theory, which is one 
of the main tools of calculation in quantum mechanics. For a thorough treatment 
of perturbation theory along the lines presented here, see [Mess 66, pp. 712-720]. 




perturbing potential 


The starting point is the resolvent (Definition 16.8.1) of a Hamiltonian $H$,
which, using $z$ instead of $\lambda$, we write as $R_z(H)$. For simplicity, we assume that
the eigenvalues of $H$ are discrete. This is a valid assumption if the Hamiltonian
is compact or if we are interested in approximations close to one of the discrete
eigenvalues. Denoting the eigenvalues of $H$ by $\{E_i\}_{i=0}^{\infty}$, we have

$$
H P_i = E_i P_i, \tag{21.40}
$$

where $P_i$ is the projection operator to the $i$th eigenspace. We can write the resolvent
in terms of the projection operators by using Equation (16.8):

$$
R_z(H) = \sum_{i=0}^{\infty}\frac{P_i}{E_i - z}. \tag{21.41}
$$


The projection operator $P_i$ can be written as a contour integral as in Equation
(16.13). Any sum of these operators can also be written as a contour integral. For
instance, if $\Gamma$ is a circle enclosing the first $n + 1$ eigenvalues, then

$$
P_\Gamma \equiv \sum_{i=0}^{n} P_i = -\frac{1}{2\pi i}\oint_\Gamma R_z(H)\,dz. \tag{21.42}
$$

Multiplying Equation (21.42) by $H$ and using the definition of the resolvent, one
can show that

$$
H P_\Gamma = -\frac{1}{2\pi i}\oint_\Gamma z R_z(H)\,dz. \tag{21.43}
$$

When $\Gamma$ includes all eigenvalues of $H$, $P_\Gamma = 1$, and Equation (21.43) reduces to
(16.12) with $A \to T$ and $f(x) \to x$.
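The contour-integral form of the projector can be checked numerically on a small example. The sketch below assumes a real symmetric $2\times 2$ matrix as a stand-in for $H$, takes $R_z(H) = (H - z)^{-1}$ consistently with (21.41), and approximates $-\frac{1}{2\pi i}\oint_\Gamma R_z(H)\,dz$ by the trapezoidal rule on a circle that encloses only the lower eigenvalue. The matrix entries, the contour, and the function names are illustrative choices.

```python
import cmath
import math


def resolvent_2x2(h, z):
    """(H - z)^(-1) for a 2x2 matrix, via the explicit inverse formula."""
    a, b = h[0][0] - z, h[0][1]
    c, d = h[1][0], h[1][1] - z
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def contour_projector(h, center, radius, n=200):
    """Approximate -(1 / 2 pi i) * contour integral of R_z(H) dz on a circle."""
    p = [[0j, 0j], [0j, 0j]]
    for k in range(n):
        phase = cmath.exp(2j * math.pi * k / n)
        r = resolvent_2x2(h, center + radius * phase)
        for i in range(2):
            for j in range(2):
                # dz = i * radius * phase * dtheta, so the factors i and 2 pi cancel.
                p[i][j] -= r[i][j] * radius * phase / n
    return p


H = [[1.0, 0.5], [0.5, 3.0]]   # eigenvalues 2 - sqrt(5)/2 and 2 + sqrt(5)/2
P = contour_projector(H, center=1.0, radius=1.0)   # encloses only the lower one
trace_P = (P[0][0] + P[1][1]).real
P_squared = [[sum(P[i][k] * P[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(trace_P)
```

The trace gives the dimension of the enclosed eigenspace, and $P^2 = P$ confirms that the result is a projector.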

To proceed, let us assume that $H = H_0 + \lambda V$, where $H_0$ is a Hamiltonian with
known eigenvalues and eigenvectors, and $V$ is a perturbing potential; $\lambda$ is a (small)
parameter that keeps track of the order of approximation. Let us also use the
abbreviations

$$
G(z) \equiv -R_z(H) \quad\text{and}\quad G_0(z) \equiv -R_z(H_0). \tag{21.44}
$$

Then a procedure very similar to that leading to Equation (21.37) yields

$$
G(z) = G_0(z) + \lambda G_0(z)VG(z), \tag{21.45}
$$

which can be expanded in a Neumann series by iteration:

$$
G(z) = \sum_{n=0}^{\infty}\lambda^n G_0(z)[VG_0(z)]^n. \tag{21.46}
$$

Degeneracy is the
dimension of the
eigenspace of the
Hamiltonian.

Let $\{E_a^0\}$, $\{\mathcal{M}_a^0\}$, and $m_a$ denote, respectively, the eigenvalues of $H_0$, their
corresponding eigenspaces, and the latter’s dimensions.² In the context of perturbation
theory, $m_a$ is called the degeneracy of $E_a^0$, and $E_a^0$ is called $m_a$-fold degenerate,
with a similar terminology for the perturbed Hamiltonian. We assume that all
eigenspaces have finite dimensions.

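It is clear that eigenvalues and eigenspaces of $H$ will tend to those of $H_0$ when
$\lambda \to 0$. So, let us collect all eigenspaces of $H$ that tend to $\mathcal{M}_a^0$ and denote them by
$\{\mathcal{M}_a^i\}_{i=1}^{r_a}$. Similarly, we use $E_a^i$ and $P_a^i$ to denote, respectively, the energy eigenvalue
and the projector to the eigenspace $\mathcal{M}_a^i$. Since dimension is a discrete quantity, it
cannot depend on $\lambda$, and we have

$$
\sum_{i=1}^{r_a}\dim\mathcal{M}_a^i = \dim\mathcal{M}_a^0. \tag{21.47}
$$

We also use the notation $P_a$ for the projector onto the direct sum of the $\mathcal{M}_a^i$’s. We thus
have

$$
P_a = \sum_{i=1}^{r_a} P_a^i \;\xrightarrow[\lambda\to 0]{}\; P_a^0, \tag{21.48}
$$

where we have used an obvious notation for the projection operator onto $\mathcal{M}_a^0$.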

The main task of perturbation theory is to find the eigenvalues and eigenvectors
of the perturbed Hamiltonian in terms of a series in powers of $\lambda$ of the
corresponding unperturbed quantities. Since the eigenvectors (or, more appropriately,
the projectors onto eigenspaces) and their corresponding eigenvalues of
the perturbed Hamiltonian are related via Equation (21.40), this task reduces to
writing $P_a$ as a series in powers of $\lambda$ whose coefficients are operators expressible
in terms of unperturbed quantities.

For sufficiently small $\lambda$, there exists a contour in the $z$-plane enclosing $E_a^0$ and
all $E_a^i$’s but excluding all other eigenvalues of $H$ and $H_0$. Denote this contour by
$\Gamma_a$ and, using Equation (21.42), write

$$
P_a = \frac{1}{2\pi i}\oint_{\Gamma_a} G(z)\,dz.
$$

It follows from Equation (21.46) that

$$
P_a = P_a^0 + \sum_{n=1}^{\infty}\lambda^n A^{(n)}, \quad\text{where}\quad
A^{(n)} \equiv \frac{1}{2\pi i}\oint_{\Gamma_a} G_0(z)[VG_0(z)]^n\,dz. \tag{21.49}
$$


This equation shows that perturbation expansion is reduced to the calculation of
$A^{(n)}$, which is simply the residue of $G_0(z)[VG_0(z)]^n$. The only singularity of the
integrand in Equation (21.49) comes from $G_0(z)$, which, by (21.44) and (21.41),
has a pole at $E_a^0$. So, to calculate this residue, we simply expand $G_0(z)$ in a Laurent
series about $E_a^0$:

$$
G_0(z) = \sum_b \frac{P_b^0}{z - E_b^0}
= \frac{P_a^0}{z - E_a^0} + \sum_{b\ne a}\frac{P_b^0}{(E_a^0 - E_b^0)\left[1 + \dfrac{z - E_a^0}{E_a^0 - E_b^0}\right]}
= \frac{P_a^0}{z - E_a^0} + \sum_{b\ne a} P_b^0\sum_{k=0}^{\infty}(-1)^k\frac{(z - E_a^0)^k}{(E_a^0 - E_b^0)^{k+1}}.
$$

² We use the beginning letters of the Latin alphabet for the unperturbed Hamiltonian. Furthermore, we attach a superscript “0”
to emphasize that the object belongs to $H_0$.

Switching the order of the two sums, and noting that our space is the Hilbert space
of $H_0$ whose basis can be chosen to consist of eigenstates of $H_0$, we can write $H_0$
instead of $E_b^0$ in the denominator to obtain

$$
\sum_{b\ne a}\frac{P_b^0}{(E_a^0 - E_b^0)^{k+1}}
= \sum_{b\ne a} P_b^0\,\frac{1}{(E_a^0 - H_0)^{k+1}}
= (1 - P_a^0)\,\frac{1}{(E_a^0 - H_0)^{k+1}}
\equiv Q_a^0\,\frac{1}{(E_a^0 - H_0)^{k+1}}
= G_0^{k+1}(E_a^0)\,Q_a^0 = Q_a^0\,G_0^{k+1}(E_a^0)\,Q_a^0,
$$

where we have used the completeness relation for the $P_b^0$’s, the fact that $Q_a^0$ commutes
with $H_0$ [and, therefore, with $G_0^{k+1}(E_a^0)$], and, in the last equality, the fact
that $Q_a^0$ is a projection operator.³ It follows that


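$$
G_0(z) = \frac{P_a^0}{z - E_a^0} + \sum_{k=0}^{\infty}(-1)^k (z - E_a^0)^k\, Q_a^0\,G_0^{k+1}(E_a^0)\,Q_a^0
= \sum_{k=0}^{\infty}(-1)^k (z - E_a^0)^{k-1} S^k, \tag{21.50}
$$

where we have introduced the notation

$$
S^k \equiv
\begin{cases}
P_a^0 & \text{if } k = 0,\\
-Q_a^0\,G_0^k(E_a^0)\,Q_a^0 & \text{if } k \ge 1.
\end{cases}
$$

³ Note that although $G_0(z)$ has a pole at $E_a^0$, the expressions in the last line of the equation above make sense because $Q_a^0$
annihilates all states with eigenvalue $E_a^0$. The reason for the introduction of $Q_a^0$ on both sides is to ensure that $G_0^{k+1}(E_a^0)$ will not
act on an eigenstate of $E_a^0$ on either side.

By substituting Equation (21.50) in $G_0(z)[VG_0(z)]^n$ we obtain a Laurent expansion
whose coefficient of $(z - E_a^0)^{-1}$ is $A^{(n)}$. Each of the $n+1$ factors of $G_0(z)$ contributes
a term $(-1)^{k_i}(z - E_a^0)^{k_i - 1}S^{k_i}$, so the coefficient of $(z - E_a^0)^{-1}$ collects the terms with
$k_1 + \cdots + k_{n+1} = n$. The reader may check that such a procedure yields

$$
A^{(n)} = (-1)^{n}\sum_{(n)} S^{k_1}VS^{k_2}V\cdots VS^{k_{n+1}}, \tag{21.51}
$$

where by definition, $\sum_{(p)}$ extends over all nonnegative integers $\{k_i\}_{i=1}^{n+1}$ such that

$$
\sum_{i=1}^{n+1} k_i = p \quad \forall\, p \ge 0.
$$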

It turns out that for perturbation expansion, not only do we need the expansion of
$P_a$ [Equations (21.49) and (21.51)], but also an expansion for $HP_a$. Using Equations
(21.43) and (21.44), with $\Gamma$ replaced by $\Gamma_a$, we have

$$
HP_a = \frac{1}{2\pi i}\oint_{\Gamma_a} zG(z)\,dz
= \frac{1}{2\pi i}\oint_{\Gamma_a}(z - E_a^0 + E_a^0)\,G(z)\,dz
= \frac{1}{2\pi i}\oint_{\Gamma_a}(z - E_a^0)\,G(z)\,dz + E_a^0 P_a.
$$

Substituting for $G(z)$ from Equation (21.46), we can rewrite this equation as

$$
(H - E_a^0)P_a = \sum_{n=1}^{\infty}\lambda^n B^{(n)}, \tag{21.52}
$$

where

$$
B^{(n)} = (-1)^{n-1}\sum_{(n-1)} S^{k_1}VS^{k_2}V\cdots VS^{k_{n+1}}. \tag{21.53}
$$

Equations (21.52) and (21.53) can be used to approximate the eigenvectors
and eigenvalues of the perturbed Hamiltonian in terms of those of the unperturbed
Hamiltonian. It is convenient to consider two cases: the nondegenerate case in
which $m_a = 1$, and the degenerate case in which $m_a \ge 2$.

21.5.1 The Nondegenerate Case 

In the nondegenerate case, we let $|0_a\rangle$ denote the original unperturbed eigenstate,
and use Equation (21.47) to conclude that the perturbed eigenstate is also
one-dimensional. In fact, it follows from (21.40) that $P_a|0_a\rangle$ is the desired eigenstate.
Denoting the latter by $|\psi\rangle$ and using Equation (21.49), we have

$$
|\psi\rangle = P_a|0_a\rangle = P_a^0|0_a\rangle + \sum_{n=1}^{\infty}\lambda^n A^{(n)}|0_a\rangle
= |0_a\rangle + \sum_{n=1}^{\infty}\lambda^n A^{(n)}|0_a\rangle \tag{21.54}
$$



first-order correction 
to energy 


second-order 
correction to energy 


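because $P_a^0$ is the projection operator onto $|0_a\rangle$.

More desirable is the energy of the perturbed state $E_a$, which obeys the relation
$HP_a = E_aP_a$. Taking the trace of this relation and noting that $\operatorname{tr}P_a = \operatorname{tr}P_a^0 = 1$, we
obtain

$$
E_a = \operatorname{tr}(HP_a) = \operatorname{tr}\left(E_a^0P_a + \sum_{n=1}^{\infty}\lambda^n B^{(n)}\right)
= E_a^0 + \sum_{n=1}^{\infty}\lambda^n\operatorname{tr}B^{(n)} \equiv E_a^0 + \sum_{n=1}^{\infty}\lambda^n\varepsilon_n, \tag{21.55}
$$

where we used Equation (21.52). Since $\lambda$ is simply a parameter to keep track of
the order of perturbation, one usually includes it in the definition of the perturbing
potential $V$. The $n$th-order correction to the energy is then written as

$$
\varepsilon_n = \operatorname{tr}B^{(n)}. \tag{21.56}
$$

Since each term of $B^{(n)}$ contains $P_a^0$ at least once, and since

$$
\operatorname{tr}(UP_a^0T) = \operatorname{tr}(TUP_a^0)
$$

for any pair of operators $U$ and $T$ (or products thereof), one can cast $\varepsilon_n$ into the
form of an expectation value of some product of operators in the unperturbed state
$|0_a\rangle$. For example,

$$
\varepsilon_1 = \operatorname{tr}B^{(1)} = \sum_b\langle 0_b|P_a^0VP_a^0|0_b\rangle = \langle 0_a|V|0_a\rangle \tag{21.57}
$$

because $P_a^0|0_b\rangle = 0$ unless $b = a$. This is the familiar expression for the first-order
correction to the energy in nondegenerate perturbation theory. Similarly,

$$
\varepsilon_2 = \operatorname{tr}B^{(2)}
= -\operatorname{tr}\Big(P_a^0VP_a^0V[-Q_a^0G_0(E_a^0)Q_a^0] + P_a^0V[-Q_a^0G_0(E_a^0)Q_a^0]VP_a^0 + [-Q_a^0G_0(E_a^0)Q_a^0]VP_a^0VP_a^0\Big)
= \langle 0_a|VQ_a^0G_0(E_a^0)Q_a^0V|0_a\rangle.
$$

The first and the last terms in parentheses give zero because in the trace sum, $P_a^0$
gives a nonzero contribution only if the state is $|0_a\rangle$, which is precisely the state
annihilated by $Q_a^0$. Using the completeness relation $\sum_b|0_b\rangle\langle 0_b| = 1 = \sum_c|0_c\rangle\langle 0_c|$
for the eigenstates of the unperturbed Hamiltonian, we can rewrite $\varepsilon_2$ as

$$
\varepsilon_2 = \sum_{b,c}\underbrace{\langle 0_a|VQ_a^0|0_b\rangle}_{=0\text{ if }b=a}\;
\underbrace{\langle 0_b|G_0(E_a^0)|0_c\rangle}_{\delta_{bc}/(E_a^0 - E_b^0)}\;
\underbrace{\langle 0_c|Q_a^0V|0_a\rangle}_{=0\text{ if }c=a}
= \sum_{b\ne a}\frac{|\langle 0_a|V|0_b\rangle|^2}{E_a^0 - E_b^0}.
$$

This is the familiar expression for the second-order correction to the energy in
nondegenerate perturbation theory.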
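The first- and second-order formulas can be compared with an exact diagonalization in the simplest possible setting. The sketch below assumes a two-level system with $H_0 = \operatorname{diag}(0, 1)$ and a real symmetric $V$; the matrix entries, the value of $\lambda$, and the function names are illustrative, and the exact eigenvalues come from the quadratic formula for a $2\times 2$ matrix.

```python
import math


def energy_to_second_order(e0, v, a, lam):
    """E_a^0 + lam <a|V|a> + lam^2 sum_{b != a} |<a|V|b>|^2 / (E_a^0 - E_b^0)."""
    first = v[a][a]
    second = sum(abs(v[a][b]) ** 2 / (e0[a] - e0[b])
                 for b in range(len(e0)) if b != a)
    return e0[a] + lam * first + lam ** 2 * second


def exact_eigenvalues_2x2(h):
    """Eigenvalues of a real symmetric 2x2 matrix, in increasing order."""
    tr = h[0][0] + h[1][1]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)
    return (tr - disc) / 2.0, (tr + disc) / 2.0


e0 = [0.0, 1.0]                      # unperturbed, nondegenerate levels
V = [[0.3, 0.2], [0.2, -0.1]]        # illustrative perturbation
lam = 0.05

H = [[(e0[i] if i == j else 0.0) + lam * V[i][j] for j in range(2)] for i in range(2)]
exact_low, exact_high = exact_eigenvalues_2x2(H)
first_order = e0[0] + lam * V[0][0]
second_order = energy_to_second_order(e0, V, 0, lam)
print(exact_low, first_order, second_order)
```

The residual error of the second-order estimate is of order $\lambda^3$, noticeably smaller than that of the first-order estimate.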





21.5.2 The Degenerate Case 

The degenerate case can also start with Equations (21.54) and (21.55). The difference
is that $\varepsilon_n$ cannot be determined as conveniently as in the nondegenerate case.
For example, the expression for $\varepsilon_1$ will involve a sum over a basis of $\mathcal{M}_a^0$ because
the unperturbed state is no longer just $|0_a\rangle$, but some general vector in $\mathcal{M}_a^0$. Instead of pursuing this
line of approach, we present a more common method, which concentrates on the
way $\mathcal{M}_a^0$ and the corresponding eigenspaces of the perturbed Hamiltonian, denoted
by $\mathcal{M}_a$, enter in the calculation of eigenvalues and eigenvectors.

The projector $P_a^0$ acts as a unit operator when restricted to $\mathcal{M}_a^0$. In particular, it is
invertible. In the limit of small $\lambda$, the projection operator $P_a$ is close to $P_a^0$; therefore,
it too must be invertible, i.e., $P_a : \mathcal{M}_a^0 \to \mathcal{M}_a$ is an isomorphism. Similarly, $P_a^0 :
\mathcal{M}_a \to \mathcal{M}_a^0$ is also an isomorphism, not necessarily the inverse of the first one. It
follows that for each vector in $\mathcal{M}_a^0$ there is a unique vector in $\mathcal{M}_a$ and vice versa.
The eigenvalue equation $H|E_a\rangle = E_a|E_a\rangle$ can thus be written as

$$
HP_a|E_a^0\rangle = E_aP_a|E_a^0\rangle,
$$

where $|E_a^0\rangle$ is the unique vector mapped onto $|E_a\rangle$ by $P_a$. Multiplying both sides
by $P_a^0$, we obtain

$$
P_a^0HP_a|E_a^0\rangle = E_aP_a^0P_a|E_a^0\rangle,
$$

which is completely equivalent to the previous equation because $P_a^0$ is invertible.
If we define

$$
H_a \equiv P_a^0HP_aP_a^0 : \mathcal{M}_a^0 \to \mathcal{M}_a^0, \qquad
K_a \equiv P_a^0P_aP_a^0 : \mathcal{M}_a^0 \to \mathcal{M}_a^0, \tag{21.58}
$$

the preceding equation becomes

$$
H_a|E_a^0\rangle = E_aK_a|E_a^0\rangle. \tag{21.59}
$$

As operators on $\mathcal{M}_a^0$ both $H_a$ and $K_a$ are hermitian. In fact, $K_a$, which can be written
as the product of $P_a^0P_a$ and its hermitian conjugate, is a positive definite operator.
Equation (21.59) is a generalized eigenvalue equation whose eigenvalues $E_a$ are
solutions of the equation

$$
\det(H_a - xK_a) = 0. \tag{21.60}
$$

The eigenvectors of this equation, once projected onto $\mathcal{M}_a$ by $P_a$, give the desired
eigenvectors of $H$.

The expansions of $H_a$ and $K_a$ are readily obtained from those of $HP_a$ and $P_a$
as given in Equations (21.49) and (21.52). We give the first few terms of each
expansion:

$$
\begin{aligned}
K_a &= P_a^0 - \lambda^2 P_a^0VQ_a^0G_0^2(E_a^0)Q_a^0VP_a^0 + \cdots,\\
H_a &= E_a^0K_a + \lambda P_a^0VP_a^0 + \lambda^2 P_a^0VQ_a^0G_0(E_a^0)Q_a^0VP_a^0 + \cdots.
\end{aligned} \tag{21.61}
$$




To any given order of approximation, the eigenvalues $E_a$ are obtained by terminating
the series in (21.61) at that order, plugging the resulting finite sum in Equation
(21.60), and solving the determinant equation.


21.6 Problems 


21.1. Show that the definitions of the three types of SOPDEs discussed in Example
21.1.8 are equivalent to the definitions based on Equation (21.5). Hint: Diagonalize
the matrix of coefficients of the SOPDE:

$$
a\frac{\partial^2 u}{\partial x^2} + 2b\frac{\partial^2 u}{\partial x\,\partial y} + c\frac{\partial^2 u}{\partial y^2}
+ F\left(x, y, u, \frac{\partial u}{\partial x}, \frac{\partial u}{\partial y}\right) = 0,
$$

where $a$, $b$, and $c$ are functions of $x$ and $y$. Write the eigenvalues as $(a + c \pm \Delta)/2$
and consider the three cases $|\Delta| < |a + c|$, $|\Delta| > |a + c|$, and $|\Delta| = |a + c|$.

21.2. Find the characteristic curves for $L_\mathbf{x}[u] = \partial u/\partial x$.

21.3. Find the characteristic curves for the two-dimensional wave equation and
the two-dimensional diffusion equation.

21.4. Solve the Cauchy problem for the two-dimensional Laplace equation subject
to the Cauchy data $u(0,y) = 0$, $(\partial u/\partial x)(0,y) = \epsilon\sin ky$, where $\epsilon$ and $k$ are
constants. Show that the solution does not vary continuously as the Cauchy data
vary. In particular, show that for any $\epsilon \ne 0$ and any preassigned $x > 0$, the solution
$u(x,y)$ can be made arbitrarily large by choosing $k$ large enough.

21.5. Show that the $x_i$ in Equation (21.12) describe an $m$-dimensional sphere of
radius $r$, that is, $\sum_{i=1}^{m} x_i^2 = r^2$.

21.6. Use $J\delta(\mathbf{x} - \mathbf{a}) = \delta(\boldsymbol{\xi} - \boldsymbol{\alpha})$ and the coordinate transformation from the
spherical coordinate system to Cartesian coordinates to express the 3D Cartesian
delta function in terms of the corresponding spherical delta function at a point
$P = (x_0, y_0, z_0) = (r_0, \theta_0, \varphi_0)$ where the Jacobian $J$ is nonvanishing.

21.7. Find the volume of an $m$-dimensional sphere.

21.8. Prove Equation (21.11). First, note that the RHS of Equation (21.10) is a
function of only $k$ of the $\alpha$’s. This means that

$$
H = H(\alpha_1, \ldots, \alpha_k).
$$

(a) Rewrite Equation (21.10) by separating the integral into two parts, one involving
$\{\xi_i\}_{i=1}^{k}$ and the other involving $\{\xi_i\}_{i=k+1}^{m}$. Compare the RHS with the LHS and
show that

$$
\int J\,d\xi_{k+1}\cdots d\xi_m\,\delta(\mathbf{x} - \mathbf{a}) = \prod_{i=1}^{k}\delta(\xi_i - \alpha_i).
$$

(b) Show that this equation implies that $\delta(\mathbf{x} - \mathbf{a})$ is independent of $\{\xi_i\}_{i=k+1}^{m}$. Thus,
one can take the delta function out of the integral.

21.9. Find the $m$-dimensional Green’s function for the Laplacian as follows.

(a) Solve Equation (21.19) assuming that $r \ne 0$ and demanding that $G(r) \to 0$ as
$r \to \infty$ (this can be done only for $m \ge 3$).

(b) Use the divergence theorem in $m$ dimensions and (21.18) to show that

$$
\int_S \frac{dG}{dr}\,da = 1,
$$

where $S$ is a spherical hypersurface of radius $r$. Now use this and the result of part
(a) to find the remaining constant of integration.

21.10. Consider the operator $L_\mathbf{x} = \nabla^2 + \mathbf{b}\cdot\nabla + c$ for which $\{b_i\}_{i=1}^{m}$ and $c$ are
functions of $\{x_i\}_{i=1}^{m}$.

(a) Show that $L_\mathbf{x}^\dagger[v] = \nabla^2 v - \nabla\cdot(\mathbf{b}v) + cv$, and
$\mathbf{Q}[u, v^*] = \mathbf{Q}[u, v] = v\nabla u - u\nabla v + \mathbf{b}uv$.

(b) Show that a necessary condition for $L_\mathbf{x}$ to be self-adjoint is $2\mathbf{b}\cdot\nabla u + u(\nabla\cdot\mathbf{b}) = 0$
for arbitrary $u$.

(c) By choosing some $u$’s judiciously, show that (b) implies that $b_i = 0$. Conclude
that $L_\mathbf{x} = \nabla^2 + c(\mathbf{x})$ is formally self-adjoint.

21.11. Solve the integral form of the Schrödinger equation for an attractive double
delta-function potential

$$
V(x) = -V_0[\delta(x - a_1) + \delta(x - a_2)], \qquad V_0 > 0.
$$

Find the eigenfunctions and obtain a transcendental equation for the eigenvalues
(see Example 21.4.1).

21.12. Show that the integral equation associated with the damped harmonic
oscillator DE $\ddot{x} + 2\gamma\dot{x} + \omega_0^2 x = 0$, having the BCs $x(0) = x_0$, $(dx/dt)_{t=0} = 0$, can
be written in either of the following forms.

(a) $x(t) = x_0 - \dfrac{\omega_0^2}{2\gamma}\displaystyle\int_0^t\left[1 - e^{-2\gamma(t-t')}\right]x(t')\,dt'$.

(b) $x(t) = x_0\cos\omega_0 t + \dfrac{2\gamma x_0}{\omega_0}\sin\omega_0 t - 2\gamma\displaystyle\int_0^t\cos[\omega_0(t - t')]\,x(t')\,dt'$.

Hint: Take $\omega_0^2 x$ or $2\gamma\dot{x}$, respectively, as the inhomogeneous term.

21.13. (a) Show that for scattering problems ($E > 0$) the integral form of the
Schrödinger equation in one dimension is

$$
\Psi(x) = e^{ikx} - \frac{i\mu}{\hbar^2 k}\int_{-\infty}^{\infty} e^{ik|x-y|}\,V(y)\Psi(y)\,dy.
$$




(b) Divide $(-\infty, +\infty)$ into three regions $R_1 = (-\infty, -a)$, $R_2 = (-a, +a)$, and
$R_3 = (a, \infty)$. Let $\psi_i(x)$ be $\Psi(x)$ in region $R_i$. Assume that the potential $V(x)$
vanishes in $R_1$ and $R_3$. Show that

$$
\begin{aligned}
\psi_1(x) &= e^{ikx} - \frac{i\mu}{\hbar^2 k}\,e^{-ikx}\int_{-a}^{a} e^{iky}\,V(y)\psi_2(y)\,dy,\\
\psi_2(x) &= e^{ikx} - \frac{i\mu}{\hbar^2 k}\int_{-a}^{a} e^{ik|x-y|}\,V(y)\psi_2(y)\,dy,\\
\psi_3(x) &= e^{ikx} - \frac{i\mu}{\hbar^2 k}\,e^{ikx}\int_{-a}^{a} e^{-iky}\,V(y)\psi_2(y)\,dy.
\end{aligned}
$$

This shows that determining the wave function in regions where there is no potential
requires the wave function in the region where the potential acts.

(c) Let

$$
V(x) =
\begin{cases}
V_0 & \text{if } |x| < a,\\
0 & \text{if } |x| > a,
\end{cases}
$$

and find $\psi_2(x)$ by the method of successive approximations. Show that the $n$th
term is less than $(2\mu V_0 a/\hbar^2 k)^{n-1}$ (so the Neumann series will converge) if
$(2V_0 a/\hbar v) < 1$, where $v$ is the velocity and $\mu v = \hbar k$ is the momentum of the
wave. Therefore, for large velocities, the Neumann series expansion is valid.


21.14. (a) Show that $HR_z(H) = 1 + zR_z(H)$. (b) Use (a) to prove Equation (21.43).


Additional Reading 

1. Folland, G. Introduction to Partial Differential Equations, 2nd ed., Princeton 
University Press, 1995. Discusses multidimensional Green’s functions for 
various differential operators of mathematical physics. 

2. Messiah, A. Quantum Mechanics, volume II, Wiley, 1966. A thorough treatment
of perturbation theory in the style of this chapter.


3. Stakgold, I. Green’s Functions and Boundary Value Problems, Wiley, 1979. 
A detailed analysis of boundary value problems in two and three dimensions. 


22

Multidimensional Green’s Functions:
Applications


The previous chapter gathered together some general properties of the GFs and their 
companion, the Dirac delta function. This chapter considers the Green’s functions 
for elliptic, parabolic, and hyperbolic equations that satisfy the BCs appropriate 
for each type of PDE. 

22.1 Elliptic Equations 

The most general linear PDE in $m$ variables of the elliptic type was discussed in
Section 21.1.2. We will not discuss this general case, because all elliptic PDOs
encountered in mathematical physics are of a much simpler nature. In fact, the
self-adjoint elliptic PDO of the form $L_\mathbf{x} = \nabla^2 + q(\mathbf{x})$ is sufficiently general for
purposes of this discussion. Recall from Section 21.1.2 that the BCs associated
with an elliptic PDE are of two types, Dirichlet and Neumann. Let us consider
these separately.


22.1.1 The Dirichlet Boundary Value Problem 

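A Dirichlet BVP consists of an elliptic PDE together with a Dirichlet BC, such as

$$
\begin{aligned}
L_\mathbf{x}[u] &\equiv \nabla^2 u + q(\mathbf{x})u = f(\mathbf{x}) && \text{for } \mathbf{x} \in D,\\
u(\mathbf{x}_b) &= g(\mathbf{x}_b) && \text{for } \mathbf{x}_b \in \partial D,
\end{aligned} \tag{22.1}
$$

where $g(\mathbf{x}_b)$ is a given function defined on the closed hypersurface $\partial D$.

The Green’s function for the Dirichlet BVP must satisfy the homogeneous BC,
for the same reason as in the one-dimensional Green’s function. Thus, the Dirichlet
Green’s function, denoted by $G_D(\mathbf{x},\mathbf{y})$, must satisfy

$$
L_\mathbf{x}[G_D(\mathbf{x},\mathbf{y})] = \delta(\mathbf{x} - \mathbf{y}), \qquad G_D(\mathbf{x}_b,\mathbf{y}) = 0 \quad\text{for } \mathbf{x}_b \in \partial D.
$$

As discussed in Section 21.3.2, we can separate $G_D$ into a singular part $G_D^{(s)}$ and
a regular part $H$, where $G_D^{(s)}$ satisfies the same DE as $G_D$ and $H$ satisfies the
corresponding homogeneous DE and the BC $H(\mathbf{x}_b,\mathbf{y}) = -G_D^{(s)}(\mathbf{x}_b,\mathbf{y})$.

Using Equation (22.1) and the properties of $G_D(\mathbf{x},\mathbf{y})$ in Equation (21.27), we
obtain

$$
u(\mathbf{x}) = \int_D d^m y\, G_D(\mathbf{x},\mathbf{y})f(\mathbf{y}) + \int_{\partial D} g(\mathbf{y}_b)\frac{\partial G_D}{\partial n_y}(\mathbf{x},\mathbf{y}_b)\,da, \tag{22.2}
$$

where $\partial/\partial n_y$ indicates normal differentiation with respect to the second argument.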


Gustav Peter Lejeune Dirichlet (1805-1859), the son of a postmaster,
first attended public school, then a private school that emphasized
Latin. He was precociously interested in mathematics;
it is said that before the age of twelve he used his pocket money
to buy mathematical books. In 1817 he entered the gymnasium
in Bonn. He is reported to have been an unusually attentive and
well-behaved pupil who was particularly interested in modern
history as well as in mathematics.

After two years in Bonn, Dirichlet was sent to a Jesuit college
in Cologne that his parents preferred. Among his teachers
was the physicist Georg Simon Ohm, who gave him a thorough
grounding in theoretical physics. Dirichlet completed his Abitur examination at the very
early age of sixteen. His parents wanted him to study law, but mathematics was already
his chosen field. At the time the level of pure mathematics in the German universities was
at a low ebb: Except for the formidable Carl Gauss, in Göttingen, there were no outstanding
mathematicians, while in Paris the firmament was studded by such luminaries as P.-S.
Laplace, Adrien Legendre, Joseph Fourier, and Siméon Poisson.

Dirichlet arrived in Paris in May 1822. In the summer of 1823 he was fortunate in being
appointed to a well-paid and pleasant position as tutor to the children of General Maximilien
Fay, a national hero of the Napoleonic wars and then the liberal leader of the opposition in
the Chamber of Deputies. Dirichlet was treated as a member of the family and met many of
the most prominent figures in French intellectual life. Among the mathematicians, he was
particularly attracted to Fourier, whose ideas had a strong influence upon his later works on
trigonometric series and mathematical physics.

General Fay died in November 1825, and the next year Dirichlet decided to return to
Germany, a plan strongly supported by Alexander von Humboldt, who was working for the
strengthening of the natural sciences in Germany. Dirichlet was permitted to qualify for
habilitation as Privatdozent at the University of Breslau; since he did not have the required
doctorate, this was awarded honoris causa by the University of Cologne. His habilitation
thesis dealt with polynomials whose prime divisors belong to special arithmetic series. A
second paper from this period was inspired by Gauss’s announcements on the biquadratic
law of reciprocity.







22.1 ELLIPTIC EQUATIONS 615 


Dirichlet was appointed extraordinary professor in Breslau, but the conditions for scientific
work were not inspiring. In 1828 he moved to Berlin, again with the assistance of
Humboldt, to become a teacher of mathematics at the military academy. Shortly afterward, 
at the age of twenty-three, he was appointed extraordinary (later ordinary) professor at the 
University of Berlin. In 1831 he became a member of the Berlin Academy of Sciences, and 
in the same year he married Rebecca Mendelssohn-Bartholdy, sister of Felix Mendelssohn, 
the composer. 

Dirichlet spent twenty-seven years as a professor in Berlin and exerted a strong influence 
on the development of German mathematics through his lectures, through his many pupils, 
and through a series of scientific papers of the highest quality that he published during 
this period. He was an excellent teacher, always expressing himself with great clarity. His 
manner was modest; in his later years he was shy and at times reserved. He seldom spoke 
at meetings and was reluctant to make public appearances. In many ways he was a direct 
contrast to his lifelong friend, the mathematician Karl Gustav Jacobi. 

One of Dirichlet’s most important papers, published in 1850, deals with the boundary
value problem, now known as Dirichlet’s boundary value problem, in which one wishes to
determine a potential function satisfying Laplace’s equation and having prescribed values
on a given surface, in Dirichlet’s case a sphere.

In 1855, when Gauss died, the University of Göttingen was anxious to seek a successor
of great distinction, and the choice fell upon Dirichlet. Dirichlet moved to Göttingen in
the fall of 1855, bought a house with a garden, and seemed to enjoy the quieter life of a
prominent university in a small city. He had a number of excellent pupils and relished the
increased leisure for research. His work in this period was centered on general problems of
mechanics. This new life, however, was not to last long. In the summer of 1858 Dirichlet
traveled to a meeting in Montreux, Switzerland, to deliver a memorial speech in honor of
Gauss. While there, he suffered a heart attack and was barely able to return to his family
in Göttingen. During his illness his wife died of a stroke, and Dirichlet himself died the
following spring.


Some special cases of (22.2) are worthy of mention. The first is $u(\mathbf{x}_b) = 0$, the
solution to an inhomogeneous DE satisfying the homogeneous BC. We obtain this
by substituting zero for $g(\mathbf{x}_b)$ in (22.2) so that only the integration over $D$ remains.
The second special case is when the DE is homogeneous, that is, when $f(\mathbf{x}) = 0$
but the BC is inhomogeneous. This yields an integration over the boundary $\partial D$
alone. Finally, the solution to the homogeneous DE with the homogeneous BC is
simply $u = 0$, referred to as the zero solution. This is consistent with physical
intuition: If the function is zero on the boundary and there is no source $f(\mathbf{x})$ to
produce any “disturbance,” we expect no nontrivial solution.

22.1.1. Example. Let us find the Green’s function for the three-dimensional Laplacian
$L_{\mathbf{x}} = \nabla^2$ satisfying the Dirichlet BC $G_D(\boldsymbol{\rho}, \mathbf{y}) = 0$ for $\boldsymbol{\rho}$ on the $xy$-plane. Here $D$ is the
upper half-space ($z > 0$) and $\partial D$ is the $xy$-plane.

It is more convenient to use $\mathbf{r} = (x, y, z)$ and $\mathbf{r}' = (x', y', z')$ instead of $\mathbf{x}$ and $\mathbf{y}$,
respectively. Using (21.21) as $G_D^{(s)}$, we can write

$$G_D(\mathbf{r}, \mathbf{r}') = -\frac{1}{4\pi|\mathbf{r} - \mathbf{r}'|} + H(\mathbf{r}, \mathbf{r}') = -\frac{1}{4\pi\sqrt{(x - x')^2 + (y - y')^2 + (z - z')^2}} + H(x, y, z; x', y', z').$$

The requirement that $G_D$ vanish on the $xy$-plane gives

$$H(x, y, 0; x', y', z') = \frac{1}{4\pi\sqrt{(x - x')^2 + (y - y')^2 + z'^2}}.$$

This fixes the dependence of $H$ on all variables except $z$. On the other hand, $\nabla^2 H = 0$ in
$D$ implies that the form of $H$ must be the same as that of $G_D^{(s)}$, because except at $\mathbf{r} = \mathbf{r}'$,
the latter does satisfy Laplace’s equation. Thus, because of the symmetry of $G_D^{(s)}$ in $\mathbf{r}$ and
$\mathbf{r}'$ [$G_D(\mathbf{r}, \mathbf{r}') = G_D(\mathbf{r}', \mathbf{r})$] and the evenness of the Laplacian in $z$ (as well as $x$ and $y$), we
have two choices for the $z$-dependence: $(z - z')^2$ and $(z + z')^2$. The first gives $G_D = 0$,
which is a trivial solution. Thus, we must choose

$$H(x, y, z; x', y', z') = \frac{1}{4\pi\sqrt{(x - x')^2 + (y - y')^2 + (z + z')^2}}.$$

Note that with $\mathbf{r}'' = (x', y', -z')$, this equation satisfies $\nabla^2 H = -\delta(\mathbf{r} - \mathbf{r}'')$, and it may
appear that $H$ does not satisfy the homogeneous DE, as it should. However, $\mathbf{r}''$ is outside
$D$, and $\mathbf{r} \neq \mathbf{r}''$ as long as $\mathbf{r} \in D$. So $H$ does satisfy the homogeneous DE in $D$. The Green’s
function for the given Dirichlet BC is therefore

$$G_D(\mathbf{r}, \mathbf{r}') = -\frac{1}{4\pi}\left(\frac{1}{|\mathbf{r} - \mathbf{r}'|} - \frac{1}{|\mathbf{r} - \mathbf{r}''|}\right),$$

where $\mathbf{r}''$ is the reflection of $\mathbf{r}'$ in the $xy$-plane.

This result has a direct physical interpretation. If determining the solution of the Laplace
equation is considered a problem in electrostatics, then $G_D^{(s)}(\mathbf{r}, \mathbf{r}')$ is simply the potential
at $\mathbf{r}$ of a unit point charge located at $\mathbf{r}'$, and $G_D(\mathbf{r}, \mathbf{r}')$ is the potential of two point charges
of opposite signs, one at $\mathbf{r}'$ and the other at the mirror image of $\mathbf{r}'$. The fact that the two
charges are equidistant from the $xy$-plane ensures the vanishing of the potential in that
plane. The introduction of image charges to ensure the vanishing of $G_D$ at $\partial D$ is common
in electrostatics and is known as the method of images. This method reduces the Dirichlet
problem for the Laplacian to finding appropriate point charges outside $D$ that guarantee
the vanishing of the potential on $\partial D$. For simple geometries, such as the one discussed in
this example, determination of the magnitudes and locations of such image charges is easy,
rendering the method extremely useful.

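Having found the Green’s function, we can pose the general Dirichlet BVP: $\nabla^2 u = -\rho(\mathbf{r})$
for $z > 0$ and $u(x, y, 0) = g(x, y)$. The solution is

$$u(\mathbf{r}) = \frac{1}{4\pi}\int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy' \int_0^{\infty} dz'\, \rho(\mathbf{r}')\left(\frac{1}{|\mathbf{r} - \mathbf{r}'|} - \frac{1}{|\mathbf{r} - \mathbf{r}''|}\right) + \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\, g(x', y') \left.\frac{\partial G_D}{\partial n'}\right|_{z'=0}, \tag{22.3}$$

where $\mathbf{r} = (x, y, z)$, $\mathbf{r}' = (x', y', z')$, and $\mathbf{r}'' = (x', y', -z')$. Here $\partial/\partial n' = -\partial/\partial z'$ is the
derivative along the outward normal of $D$ on the plane $z' = 0$, in agreement with the
normal derivative appearing in (22.2).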

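A typical application consists in introducing a number of charges in the vicinity of an
infinite conducting sheet, which is held at a constant potential $V_0$. If there are $N$ charges,
$\{q_i\}_{i=1}^N$, located at $\{\mathbf{r}_i\}_{i=1}^N$, then $\rho(\mathbf{r}) = \sum_{i=1}^N q_i\,\delta(\mathbf{r} - \mathbf{r}_i)$, $g(x, y) = \text{const} = V_0$, and we
get

$$u(\mathbf{r}) = \sum_{i=1}^N \frac{q_i}{4\pi}\left(\frac{1}{|\mathbf{r} - \mathbf{r}_i|} - \frac{1}{|\mathbf{r} - \mathbf{r}_i'|}\right) + V_0 \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy' \left.\frac{\partial G_D}{\partial n'}\right|_{z'=0}, \tag{22.4}$$

where $\mathbf{r}_i = (x_i, y_i, z_i)$, $\mathbf{r}_i' = (x_i, y_i, -z_i)$, and $\partial/\partial n' = -\partial/\partial z'$ is the outward normal
derivative on the plane $z' = 0$. That the double integral in Equation (22.4)
is unity can be seen by direct integration or by noting that the sum vanishes when $z = 0$.
On the other hand, $u(x, y, 0) = V_0$. Thus, the solution becomes

$$u(\mathbf{r}) = \sum_{i=1}^N \frac{q_i}{4\pi}\left(\frac{1}{|\mathbf{r} - \mathbf{r}_i|} - \frac{1}{|\mathbf{r} - \mathbf{r}_i'|}\right) + V_0.$$

■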
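As a quick numerical illustration of the half-space example, the following minimal Python sketch evaluates the image Green's function and estimates the surface integral of its outward normal derivative. The function names, the truncation radius `rho_max`, and the midpoint-rule step count are illustrative assumptions, not part of the text; the kernel $z/[2\pi(\rho^2 + z^2)^{3/2}]$ used below is what one obtains by differentiating $G_D$ along $-z'$ at $z' = 0$.

```python
import math

def g_half_space(r, rp):
    """Dirichlet Green's function of the Laplacian for z > 0 (method of images)."""
    xp, yp, zp = rp
    direct = math.dist(r, rp)              # |r - r'|
    image = math.dist(r, (xp, yp, -zp))    # |r - r''|, r'' = mirror image of r'
    return -(1.0 / direct - 1.0 / image) / (4.0 * math.pi)

def plane_kernel_integral(z, rho_max=2000.0, n=200_000):
    """Midpoint-rule estimate of the integral of z / (2 pi (rho^2 + z^2)^(3/2))
    over the disk rho < rho_max of the plane z' = 0 (hypothetical cutoff)."""
    h = rho_max / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * h
        total += rho / (rho * rho + z * z) ** 1.5
    # The angular integration contributes 2 pi, which cancels the 1/(2 pi).
    return z * total * h

if __name__ == "__main__":
    print(g_half_space((0.3, -0.2, 0.0), (1.0, 2.0, 0.7)))  # point on the plane
    print(plane_kernel_integral(1.5))                        # close to unity
```

The truncated integral equals $1 - z/\sqrt{\rho_{\max}^2 + z^2}$ exactly, so it approaches unity only as the cutoff grows; this is the sense in which the double integral in (22.4) is unity.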


22.1.2. Example. The method of images is also applicable when the boundary is a sphere.
Inside a sphere of radius $a$ with center at the origin, we wish to solve this Dirichlet BVP:
$\nabla^2 u = -\rho(r, \theta, \varphi)$ for $r < a$, and $u(a, \theta, \varphi) = g(\theta, \varphi)$. The GF satisfies

$$\nabla^2 G_D(r, \theta, \varphi; r', \theta', \varphi') = \delta(\mathbf{r} - \mathbf{r}') \quad \text{for } r < a,$$
$$G_D(a, \theta, \varphi; r', \theta', \varphi') = 0. \tag{22.5}$$

Thus, $G_D$ can again be interpreted as the potential of point charges, of which one is in the
sphere and the others are outside.

We write $G_D = G_D^{(s)} + H$ and choose $H$ in such a way that the second equation in
(22.5) is satisfied. As in the case of the $xy$-plane, let¹ $H(\mathbf{r}, \mathbf{r}'') = \dfrac{k}{4\pi|\mathbf{r} - \mathbf{r}''|}$, where $k$ is
a constant to be determined. If $\mathbf{r}''$ is outside the sphere, $\nabla^2 H$ will vanish everywhere inside
the sphere. The problem has been reduced to finding $k$ and $\mathbf{r}''$ (the location of the image
charge). We want to choose $\mathbf{r}''$ such that

$$\left.\frac{1}{|\mathbf{r} - \mathbf{r}'|}\right|_{r=a} = \left.\frac{k}{|\mathbf{r} - \mathbf{r}''|}\right|_{r=a} \quad\Rightarrow\quad k\,(|\mathbf{r} - \mathbf{r}'|)_{r=a} = (|\mathbf{r} - \mathbf{r}''|)_{r=a}.$$

This shows that $k$ must be positive. Squaring both sides and expanding the result yields

$$k^2(a^2 + r'^2 - 2ar'\cos\gamma) = a^2 + r''^2 - 2ar''\cos\gamma,$$

where $\gamma$ is the angle between $\mathbf{r}$ and $\mathbf{r}'$, and we have assumed that $\mathbf{r}'$ and $\mathbf{r}''$ are in the
same direction. If this equation is to hold for arbitrary $\gamma$, we must have $k^2 r' = r''$ and
$k^2(a^2 + r'^2) = a^2 + r''^2$. Combining these two equations yields $k^4 r'^2 - k^2(a^2 + r'^2) + a^2 = 0$,
whose positive solutions are $k = 1$ and $k = a/r'$. The first choice implies that $\mathbf{r}'' = \mathbf{r}'$,
which is impossible because $\mathbf{r}''$ must be outside the sphere. We thus choose $k = a/r'$,
which gives $\mathbf{r}'' = (a^2/r'^2)\,\mathbf{r}'$. We then have

$$G_D(\mathbf{r}, \mathbf{r}') = \frac{1}{4\pi}\left[-\frac{1}{|\mathbf{r} - \mathbf{r}'|} + \frac{ar'}{|r'^2\mathbf{r} - a^2\mathbf{r}'|}\right]. \tag{22.6}$$

¹ Actually, to be general, we must add an arbitrary function $f(\mathbf{r}'')$ to this. However, as the reader can easily verify, the following
argument will show that $f(\mathbf{r}'') = 0$. Besides, we are only interested in a solution, not the most general one. All simplifying
assumptions that follow are made for the same reason.


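Substituting this in Equation (22.2), and noting that $\partial G/\partial n_y = (\partial G/\partial r')_{r'=a}$, yields

$$u(\mathbf{r}) = \frac{1}{4\pi}\int_0^{2\pi} d\varphi' \int_0^{\pi} \sin\theta'\, d\theta' \int_0^a r'^2\, dr'\, \rho(\mathbf{r}')\left(\frac{1}{|\mathbf{r} - \mathbf{r}'|} - \frac{ar'}{|r'^2\mathbf{r} - a^2\mathbf{r}'|}\right) + \frac{a(a^2 - r^2)}{4\pi}\int_0^{2\pi} d\varphi' \int_0^{\pi} \sin\theta'\, d\theta'\, \frac{g(\theta', \varphi')}{|\mathbf{r} - \mathbf{a}|^3}, \tag{22.7}$$

where $\mathbf{a} = (a, \theta', \varphi')$ is a vector from the origin to a point on the sphere. For the Laplace
equation $\rho(\mathbf{r}') = 0$, and only the double integral in Equation (22.7) will contribute.

It can be shown that if $g(\theta', \varphi') = \text{const} = V_0$, then $u(\mathbf{r}) = V_0$. This is the familiar
fact shown in electromagnetism: If the potential on a sphere is kept constant, the potential
inside the sphere will be constant and equal to the potential at the surface. ■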
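The image construction for the sphere can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the helper names `g_sphere`, `on_sphere`, and `sphere_kernel_integral` are hypothetical, the field point is placed on the polar axis so that the angular integral reduces to one dimension, and a plain midpoint rule is used. It evaluates the image-charge form of $G_D$ on the boundary and integrates the surface kernel $a(a^2 - r^2)/(4\pi|\mathbf{r} - \mathbf{a}|^3)$ over the sphere.

```python
import math

def g_sphere(r, rp, a):
    """Dirichlet GF of the Laplacian inside a sphere of radius a (image form)."""
    rp_norm = math.sqrt(sum(c * c for c in rp))
    k = a / rp_norm                                      # image strength k = a / r'
    image = tuple((a / rp_norm) ** 2 * c for c in rp)    # r'' = (a^2 / r'^2) r'
    return (-1.0 / math.dist(r, rp) + k / math.dist(r, image)) / (4.0 * math.pi)

def on_sphere(a, theta, phi):
    """Cartesian coordinates of the boundary point (a, theta, phi)."""
    return (a * math.sin(theta) * math.cos(phi),
            a * math.sin(theta) * math.sin(phi),
            a * math.cos(theta))

def sphere_kernel_integral(r, a, n=20_000):
    """Integral of a (a^2 - r^2) / (4 pi |r - a|^3) over the unit sphere angles,
    for a field point at distance r < a on the polar axis (midpoint rule)."""
    h = math.pi / n
    total = 0.0
    for j in range(n):
        th = (j + 0.5) * h
        total += math.sin(th) / (r * r + a * a - 2.0 * a * r * math.cos(th)) ** 1.5
    # The phi integration gives 2 pi, leaving the prefactor a (a^2 - r^2) / 2.
    return 0.5 * a * (a * a - r * r) * total * h

if __name__ == "__main__":
    print(g_sphere(on_sphere(2.0, 0.3, 1.0), (0.5, -0.3, 0.8), 2.0))
    print(sphere_kernel_integral(1.4, 2.0))
```

A kernel integral equal to one is the numerical counterpart of the statement that a constant boundary potential $V_0$ produces the constant interior potential $V_0$.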

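22.1.3. Example. In this example we find the Dirichlet GF for a circle of radius $a$ centered
at the origin. The GF is logarithmic [see Equation (21.22)]. Therefore, $H$ is also logarithmic,
and its most general form is

$$H(\mathbf{r}, \mathbf{r}'') = -\frac{1}{2\pi}\ln(|\mathbf{r} - \mathbf{r}''|) - \frac{1}{2\pi}\ln[f(\mathbf{r}'')] = -\frac{1}{2\pi}\ln\big(|\mathbf{r} - \mathbf{r}''|\, f(\mathbf{r}'')\big),$$

so that

$$G_D(\mathbf{r}, \mathbf{r}') = \frac{1}{2\pi}\ln(|\mathbf{r} - \mathbf{r}'|) - \frac{1}{2\pi}\ln\big(|\mathbf{r} - \mathbf{r}''|\, f(\mathbf{r}'')\big) = \frac{1}{2\pi}\ln\left|\frac{\mathbf{r} - \mathbf{r}'}{(\mathbf{r} - \mathbf{r}'')\, f(\mathbf{r}'')}\right|.$$

For $G_D$ to vanish at all points on the circle, we must have

$$\left|\frac{\mathbf{a} - \mathbf{r}'}{(\mathbf{a} - \mathbf{r}'')\, f(\mathbf{r}'')}\right| = 1 \quad\Rightarrow\quad |\mathbf{a} - \mathbf{r}'| = |(\mathbf{a} - \mathbf{r}'')\, f(\mathbf{r}'')|,$$

where $\mathbf{a}$ is a vector from the origin to a point on the circle. Assuming that $\mathbf{r}''$ and $\mathbf{r}'$ are in the
same direction, squaring both sides of the last equation and expanding the result, we obtain
$(a^2 + r''^2 - 2ar''\cos\gamma)\, f^2(\mathbf{r}'') = a^2 + r'^2 - 2ar'\cos\gamma$, where $\gamma$ is the angle between $\mathbf{a}$
and $\mathbf{r}'$ (or $\mathbf{r}''$). This equation must hold for arbitrary $\gamma$. Hence, we have $f^2(\mathbf{r}'')\, r'' = r'$ and
$f^2(\mathbf{r}'')(a^2 + r''^2) = a^2 + r'^2$. These can be solved for $f(\mathbf{r}'')$ and $\mathbf{r}''$. The result is

$$\mathbf{r}'' = \frac{a^2}{r'^2}\,\mathbf{r}', \qquad f(\mathbf{r}'') = \frac{r'}{a} = \frac{a}{r''}.$$

Substituting these formulas in the expression for $G_D$, we obtain

$$G_D(\mathbf{r}, \mathbf{r}') = \frac{1}{2\pi}\ln(|\mathbf{r} - \mathbf{r}'|) - \frac{1}{2\pi}\ln\left(\frac{r'}{a}\left|\mathbf{r} - \frac{a^2}{r'^2}\,\mathbf{r}'\right|\right).$$

To write the solution to the Dirichlet BVP, we also need $\partial G_D/\partial n = \partial G_D/\partial r'$. Using
polar coordinates, we express $G_D$ as

$$G_D(\mathbf{r}, \mathbf{r}') = \frac{1}{4\pi}\ln\left[\frac{r^2 + r'^2 - 2rr'\cos(\theta - \theta')}{r^2r'^2/a^2 + a^2 - 2rr'\cos(\theta - \theta')}\right].$$

Differentiation with respect to $r'$ yields

$$\left.\frac{\partial G_D}{\partial n}\right|_{r'=a} = \left.\frac{\partial G_D}{\partial r'}\right|_{r'=a} = \frac{1}{2\pi a}\,\frac{a^2 - r^2}{r^2 + a^2 - 2ra\cos(\theta - \theta')},$$

from which we can immediately write the solution to the two-dimensional Dirichlet BVP
$\nabla^2 u = \rho$, $u(r = a) = g(\theta')$ as

$$u(\mathbf{r}) = \int_0^{2\pi} d\theta' \int_0^a r'\, G_D(\mathbf{r}, \mathbf{r}')\, \rho(\mathbf{r}')\, dr' + \frac{a^2 - r^2}{2\pi}\int_0^{2\pi} d\theta'\, \frac{g(\theta')}{r^2 + a^2 - 2ra\cos(\theta - \theta')}.$$

In particular, for Laplace’s equation $\rho(\mathbf{r}') = 0$, and we get

$$u(r, \theta) = \frac{a^2 - r^2}{2\pi}\int_0^{2\pi} d\theta'\, \frac{g(\theta')}{r^2 + a^2 - 2ra\cos(\theta - \theta')}. \tag{22.8}$$

Equation (22.8) is called the Poisson integral formula.

■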
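The Poisson integral formula lends itself to a direct numerical check. The following sketch is only an illustration: the function name `poisson_disk` and the choice of the periodic trapezoidal rule with `n` nodes are assumptions made here. It uses the fact that boundary data $g(\theta') = \cos\theta'$ should reproduce the harmonic function $u(r, \theta) = (r/a)\cos\theta$ inside the circle.

```python
import math

def poisson_disk(g, r, theta, a=1.0, n=2000):
    """Evaluate the Poisson integral for the disk of radius a at the interior
    point (r, theta), using the periodic trapezoidal rule with n nodes."""
    h = 2.0 * math.pi / n
    total = 0.0
    for j in range(n):
        tp = j * h                                   # boundary angle theta'
        total += g(tp) / (r * r + a * a - 2.0 * r * a * math.cos(theta - tp))
    return (a * a - r * r) / (2.0 * math.pi) * total * h

if __name__ == "__main__":
    # Boundary data cos(theta') should give (r / a) cos(theta) inside.
    print(poisson_disk(math.cos, 1.2, 0.7, a=2.0), 0.6 * math.cos(0.7))
```

The trapezoidal rule is a natural choice here because the integrand is smooth and periodic in $\theta'$, so very few nodes are needed as long as $r$ is not too close to $a$.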


22.1.2 The Neumann Boundary Value Problem 

The Neumann BVP is not as simple as the Dirichlet BVP because it requires 
the normal derivative of the solution. But the normal derivative is related to the 
Laplacian through the divergence theorem. Thus, the BC and the DE are tied 
together, and unless we impose some solvability conditions, we may have no 
solution at all. These points are illustrated clearly if we consider the Laplacian 
operator. 

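Consider the Neumann BVP

$$\nabla^2 u = f(\mathbf{x}) \quad \text{for } \mathbf{x} \in D, \qquad \text{and} \qquad \frac{\partial u}{\partial n} = g(\mathbf{x}) \quad \text{for } \mathbf{x} \in \partial D.$$

Integrating the first equation over $D$ and using the divergence theorem, we obtain

$$\int_D f(\mathbf{x})\, d^m x = \int_D \nabla\cdot(\nabla u)\, d^m x = \int_{\partial D} \hat{\mathbf{e}}_n\cdot\nabla u\, da = \int_{\partial D} \frac{\partial u}{\partial n}\, da.$$

It follows that we cannot arbitrarily assign values of $\partial u/\partial n$ on the boundary. In
particular, if the BC is homogeneous, as in the case of Green’s functions, the RHS
is zero, and we must have $\int_D f(\mathbf{x})\, d^m x = 0$. This relation is a restriction on the
DE, and is a solvability condition, as mentioned above. To satisfy this condition,
it is necessary to subtract from the inhomogeneous term its average value over the
region $D$. Thus, if $V_D$ is the volume of the region $D$, then

$$\nabla^2 u = f(\mathbf{x}) - \bar{f} \qquad \text{where} \qquad \bar{f} = \frac{1}{V_D}\int_D f(\mathbf{x})\, d^m x$$

ensures that the Neumann BVP is solvable. In particular, the inhomogeneous term
for the Green’s function is not simply $\delta(\mathbf{x} - \mathbf{y})$ but $\delta(\mathbf{x} - \mathbf{y}) - \bar{\delta}$, where

$$\bar{\delta} = \frac{1}{V_D}\int_D \delta(\mathbf{x} - \mathbf{y})\, d^m x = \frac{1}{V_D} \quad \text{if } \mathbf{y} \in D.$$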




Thus, the Green’s function for the Neumann BVP, $G_N(\mathbf{x}, \mathbf{y})$, satisfies

$$\nabla^2 G_N(\mathbf{x}, \mathbf{y}) = \delta(\mathbf{x} - \mathbf{y}) - \frac{1}{V_D},$$
$$\frac{\partial G_N}{\partial n}(\mathbf{x}, \mathbf{y}) = 0 \quad \text{for } \mathbf{x} \in \partial D.$$

Applying Green’s identity, Equation (21.27), we get

$$u(\mathbf{x}) = \int_D d^m y\, G_N(\mathbf{x}, \mathbf{y}) f(\mathbf{y}) - \int_{\partial D} G_N(\mathbf{x}, \mathbf{y})\, \frac{\partial u}{\partial n}\, da + \bar{u}, \tag{22.9}$$

where $\bar{u} = (1/V_D)\int_D u(\mathbf{x})\, d^m x$ is the average value of $u$ in $D$. Equation (22.9)
is valid only for the Laplacian operator, although a similar result can be obtained
for a general self-adjoint SOLPDO with constant coefficients. We will not pursue
that result, however, since it is of little practical use.
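The solvability condition can be illustrated in one dimension, where $D$ is an interval and the boundary integral of $\partial u/\partial n$ reduces to $u'(1) - u'(0)$. The sketch below is a hedged illustration only: the test function $u(x) = x^3 + \cos\pi x$, the helper `integrate`, and the midpoint rule are all choices made here, not taken from the text.

```python
import math

def integrate(func, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return sum(func(lo + (j + 0.5) * h) for j in range(n)) * h

# Hypothetical test function on D = (0, 1): u(x) = x^3 + cos(pi x)
def du(x):
    return 3.0 * x * x - math.pi * math.sin(math.pi * x)        # u'(x)

def f(x):
    return 6.0 * x - math.pi ** 2 * math.cos(math.pi * x)       # f = u''

if __name__ == "__main__":
    source = integrate(f, 0.0, 1.0)
    flux = du(1.0) - du(0.0)      # outward normal derivatives at both ends
    print(source, flux)           # the two numbers should agree
    f_bar = source / 1.0          # average of f over D (V_D = 1)
    print(integrate(lambda x: f(x) - f_bar, 0.0, 1.0))  # compatible source
```

Subtracting the average $\bar{f}$ leaves a source whose integral vanishes, which is the form required when the Neumann BC is homogeneous.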


Carl Gottfried Neumann (1832-1925) was the son of
Franz Ernst Neumann, a professor of physics and mineralogy at
Königsberg; his mother, Luise Florentine Hagen, was a sister-in-law
of the astronomer Bessel. Neumann received his primary
and secondary education in Königsberg, attended the university,
and formed particularly close friendships with the analyst F. J.
Richelot and the geometer L. O. Hesse. After passing the examination
for secondary-school teaching, he obtained his doctorate in
1855; in 1858 he qualified for lecturing in mathematics at Halle,
where he became Privatdozent and, in 1863, assistant professor.
In the latter year he was called to Basel, and in 1865 to Tübingen. From the autumn of
1868 until his retirement in 1911 he was at the University of Leipzig. In 1864 he married
Hermine Mathilde Elise Kloss; she died in 1875.

Neumann, who led a quiet life, was a successful university teacher and a productive 
researcher. More than two generations of future gymnasium teachers received their basic 
mathematical education from him. As a researcher he was especially prominent in the field 
of potential theory. His investigations into boundary value problems resulted in pioneering 
achievements; in 1870 he began to develop the method of the arithmetical mean for their 
solution. He also coined the term “logarithmic potential.” The second boundary value problem
of potential theory still bears his name; a generalization of it was later provided by H.
Poincaré.

Neumann was a member of the Berlin Academy and the Societies of Göttingen, Munich,
and Leipzig. He performed a valuable service in founding and editing the important German
mathematics periodical Mathematische Annalen.



Throughout the discussion so far we have assumed that $D$ is bounded; that is,
we have considered points inside $D$ with BCs on the boundary $\partial D$ specified. This
is called an interior BVP. In many physical situations we are interested in points
outside $D$. We are then dealing with an exterior BVP. In dealing with such a
problem, we must specify the behavior of the Green’s function at infinity. In most
cases, the physics of the problem dictates such behavior. For instance, for the case
of an exterior Dirichlet BVP, where

$$u(\mathbf{x}) = \int_D d^m y\, G_D(\mathbf{x}, \mathbf{y}) f(\mathbf{y}) + \int_{\partial D} g(\mathbf{y}_b)\, \frac{\partial G_D}{\partial n_y}(\mathbf{x}, \mathbf{y}_b)\, da$$

and it is desired that $u(\mathbf{x}) \to 0$ as $|\mathbf{x}| \to \infty$, the vanishing of $G_D(\mathbf{x}, \mathbf{y})$ at infinity
guarantees that the second integral vanishes, as long as $\partial D$ is a finite hypersurface.
To guarantee the disappearance of the first integral, we must demand that $G_D(\mathbf{x}, \mathbf{y})$
tend to zero faster than $f(\mathbf{y})\, d^m y$ tends to infinity. For most cases of physical interest,
the calculation of the exterior Green’s functions is not conceptually different
from that of the interior ones. However, the algebra may be more involved.

Later we will develop general methods for finding the Green’s functions for 
certain partial differential operators that satisfy appropriate BCs. At this point, let 
us simply mention what are called mixed BCs for elliptic PDEs. A general mixed 
BC is of the form 

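$$\alpha(\mathbf{x})\, u(\mathbf{x}) + \beta(\mathbf{x})\, \frac{\partial u}{\partial n}(\mathbf{x}) = \gamma(\mathbf{x}). \tag{22.10}$$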

Problem 22.6 examines the conditions that the GF must satisfy in such a case. 


22.2 Parabolic Equations 

Elliptic partial differential equations arise in static problems, where the solution 
is independent of time. Of the two major time-dependent equations, the wave 
equation and the heat (or diffusion) equation,² the latter is a parabolic PDE and
the former a hyperbolic PDE. This section examines the heat equation, which is
of the form $\nabla^2 u = a^2\, \partial u/\partial t$. By changing $t$ to $t/a^2$, we can write the equation
as $L_{\mathbf{x},t}[u] \equiv (\partial/\partial t - \nabla^2)\, u(\mathbf{x}, t) = 0$. We wish to calculate the Green’s function
associated with $L_{\mathbf{x},t}$ and the homogeneous BCs. Because of the time variable, we
must also specify the solution at $t = 0$. Thus, we consider the BVP

$$L_{\mathbf{x},t}[u] \equiv \left(\frac{\partial}{\partial t} - \nabla^2\right) u(\mathbf{x}, t) = 0 \quad \text{for } \mathbf{x} \in D,$$
$$u(\mathbf{x}_b, t) = 0, \qquad u(\mathbf{x}, 0) = h(\mathbf{x}) \quad \text{for } \mathbf{x}_b \in \partial D,\ \mathbf{x} \in D. \tag{22.11}$$

To find a solution to (22.11), we can use a method that turns out to be useful
for evaluating Green’s functions in general: the method of eigenfunctions. Let
$\{u_n\}_{n=1}^{\infty}$ be the eigenfunctions of $\nabla^2$ with eigenvalues $\{-\lambda_n\}_{n=1}^{\infty}$. Let the BC be
$u_n(\mathbf{x}_b) = 0$ for $\mathbf{x}_b \in \partial D$. Then

$$\nabla^2 u_n(\mathbf{x}) + \lambda_n u_n(\mathbf{x}) = 0 \quad \text{for } n = 1, 2, \ldots,\ \mathbf{x} \in D,$$
$$u_n(\mathbf{x}_b) = 0 \quad \text{for } \mathbf{x}_b \in \partial D. \tag{22.12}$$

² The heat equation turns into the Schrödinger equation if $t$ is changed to $\sqrt{-1}\, t$; thus, the following discussion incorporates
the Schrödinger equation as well.

Equation (22.12) constitutes a Sturm-Liouville problem in $m$ dimensions, which
we assume to have a solution with $\{u_n\}_{n=1}^{\infty}$ as a complete orthonormal set. We can
therefore write

$$u(\mathbf{x}, t) = \sum_{n=1}^{\infty} C_n(t)\, u_n(\mathbf{x}). \tag{22.13}$$

This is possible because at each specific value of $t$, $u(\mathbf{x}, t)$ is a function of $\mathbf{x}$ and
therefore can be written as a linear combination of the same set, $\{u_n\}_{n=1}^{\infty}$. The
coefficients $C_n(t)$ are given by

$$C_n(t) = \int_D u(\mathbf{x}, t)\, u_n(\mathbf{x})\, d^m x. \tag{22.14}$$

To calculate $C_n(t)$, we differentiate (22.14) with respect to time and use (22.11)
to obtain

$$\dot{C}_n(t) \equiv \frac{dC_n}{dt} = \int_D \frac{\partial u}{\partial t}(\mathbf{x}, t)\, u_n(\mathbf{x})\, d^m x = \int_D [\nabla^2 u(\mathbf{x}, t)]\, u_n(\mathbf{x})\, d^m x.$$

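Using Green’s identity for the operator $\nabla^2$ yields

$$\int_D [u_n \nabla^2 u - u \nabla^2 u_n]\, d^m x = \int_{\partial D}\left(u_n \frac{\partial u}{\partial n} - u\, \frac{\partial u_n}{\partial n}\right) da.$$

Since both $u$ and $u_n$ vanish on $\partial D$, the RHS is zero, and we get

$$\dot{C}_n(t) = \int_D u \nabla^2 u_n\, d^m x = -\lambda_n \int_D u(\mathbf{x}, t)\, u_n(\mathbf{x})\, d^m x = -\lambda_n C_n.$$

This has the solution $C_n(t) = C_n(0)\, e^{-\lambda_n t}$, where

$$C_n(0) = \int_D u(\mathbf{y}, 0)\, u_n(\mathbf{y})\, d^m y = \int_D h(\mathbf{y})\, u_n(\mathbf{y})\, d^m y,$$

so that

$$C_n(t) = e^{-\lambda_n t}\int_D h(\mathbf{y})\, u_n(\mathbf{y})\, d^m y.$$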

Substituting this in (22.13) and switching the order of integration and summation,
we get

$$u(\mathbf{x}, t) = \int_D \left[\sum_{n=1}^{\infty} e^{-\lambda_n t}\, u_n(\mathbf{x})\, u_n(\mathbf{y})\right] h(\mathbf{y})\, d^m y$$

and read off the GF as $\sum_{n=1}^{\infty} e^{-\lambda_n t}\, u_n(\mathbf{x})\, u_n(\mathbf{y})\, \theta(t)$, where we also introduced the
theta function to ensure that the solution vanishes for $t < 0$. More generally, we
have

$$G(\mathbf{x}, \mathbf{y}; t - \tau) = \sum_{n=1}^{\infty} e^{-\lambda_n (t - \tau)}\, u_n(\mathbf{x})\, u_n(\mathbf{y})\, \theta(t - \tau). \tag{22.15}$$

Note the property

$$\lim_{\tau \to t} G(\mathbf{x}, \mathbf{y}; t - \tau) = \sum_{n=1}^{\infty} u_n(\mathbf{x})\, u_n(\mathbf{y}) = \delta(\mathbf{x} - \mathbf{y}),$$

which is usually written as

$$G(\mathbf{x}, \mathbf{y}; 0^+) = \delta(\mathbf{x} - \mathbf{y}). \tag{22.16}$$

The reader may also check that

$$L_{\mathbf{x},t}\, G(\mathbf{x}, \mathbf{y}; t - \tau) = \delta(\mathbf{x} - \mathbf{y})\, \delta(t - \tau). \tag{22.17}$$

This is precisely what we expect for the Green’s function of an operator in the
variables $\mathbf{x}$ and $t$. Another property of $G(\mathbf{x}, \mathbf{y}; t - \tau)$ is that it vanishes on $\partial D$, as
it should.

Having found the Green’s function and noted its properties, we are in a position
to solve the inhomogeneous analogue of Equation (22.11), in which the RHS of
the first equation is $f(\mathbf{x}, t)$, and the zero on the RHS of the second equation is
replaced by $g(\mathbf{x}_b, t)$. Experience with similar but simpler problems indicates that
to make any progress toward a solution, we must come up with a form of Green’s
identity involving $L_{\mathbf{x},t}$ and its adjoint. It is easy to show that

$$v L_{\mathbf{x},t}[u] - u L^{\dagger}_{\mathbf{x},t}[v] = \frac{\partial}{\partial t}(uv) - \nabla\cdot(v\nabla u - u\nabla v), \tag{22.18}$$

where $L^{\dagger}_{\mathbf{x},t} = -\partial/\partial t - \nabla^2$.

Now consider the $(m + 1)$-dimensional “cylinder” one of whose bases is at
$t = \epsilon$, where $\epsilon$ is a small positive number. This base is barely above the
$m$-dimensional hyperplane $\mathbb{R}^m$. The other base is at $t = \tau - \epsilon$ and is a duplicate of
$D \subset \mathbb{R}^m$ (see Figure 22.1). Let $a^{\mu}$, where $\mu = 0, 1, \ldots, m$, be the components of
an $(m + 1)$-dimensional vector $a = (a^0, a^1, \ldots, a^m)$. Define an inner product by

$$a \cdot b \equiv \sum_{\mu=0}^{m} a^{\mu} b_{\mu} \equiv a^0 b^0 - a^1 b^1 - \cdots - a^m b^m \equiv a^0 b^0 - \mathbf{a}\cdot\mathbf{b}$$

and the $(m + 1)$-dimensional vector $Q$ by $Q^0 = uv$, $\mathbf{Q} = v\nabla u - u\nabla v$. Then
(22.18) can be expressed as

$$v L_{\mathbf{x},t}[u] - u L^{\dagger}_{\mathbf{x},t}[v] = \sum_{\mu=0}^{m} \frac{\partial Q^{\mu}}{\partial x^{\mu}} \equiv \frac{\partial Q^0}{\partial x^0} - \frac{\partial Q^1}{\partial x^1} - \cdots - \frac{\partial Q^m}{\partial x^m}. \tag{22.19}$$

Figure 22.1 The “cylinder” used in evaluating the GF for the diffusion and wave equations.
Note that the bases are not planes, but hyperplanes (that is, spaces such as $\mathbb{R}^m$).

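We recognize the RHS as a divergence in $(m + 1)$-dimensional space. Denoting
the volume of the $(m + 1)$-dimensional cylinder by $\mathcal{D}$ and its boundary by $\partial\mathcal{D}$ and
integrating (22.19) over $\mathcal{D}$, we obtain

$$\int_{\mathcal{D}} (v L_{\mathbf{x},t}[u] - u L^{\dagger}_{\mathbf{x},t}[v])\, d^{m+1}x = \int_{\mathcal{D}} \sum_{\mu=0}^{m} \frac{\partial Q^{\mu}}{\partial x^{\mu}}\, d^{m+1}x = \int_{\partial\mathcal{D}} \sum_{\mu=0}^{m} Q^{\mu} n_{\mu}\, dS, \tag{22.20}$$

where $dS$ is an element of “area” of $\partial\mathcal{D}$. Note that the divergence theorem was
used in the last step. The LHS is an integration over $t$ and $\mathbf{x}$, which can be written
as

$$\int_{\mathcal{D}} (v L_{\mathbf{x},t}[u] - u L^{\dagger}_{\mathbf{x},t}[v])\, d^{m+1}x = \int_{\epsilon}^{\tau - \epsilon} dt \int_D d^m x\, (v L_{\mathbf{x},t}[u] - u L^{\dagger}_{\mathbf{x},t}[v]).$$

The RHS of (22.20), on the other hand, can be split into three parts: a base at
$t = \epsilon$, a base at $t = \tau - \epsilon$, and the lateral surface. The base at $t = \epsilon$ is simply
the region $D$, whose outward-pointing normal is in the negative $t$ direction. Thus,
$n_0 = -1$, and $n_i = 0$ for $i = 1, 2, \ldots, m$. The base at $t = \tau - \epsilon$ is also the region
$D$; however, its normal is in the positive $t$ direction. Thus, $n_0 = 1$, and $n_i = 0$
for $i = 1, 2, \ldots, m$. The element of “area” for these two bases is simply $d^m x$.
The unit normal to the lateral surface has no time component and is simply the
unit normal to the boundary of $D$. The element of “area” for the lateral surface is
$dt\, da$, where $da$ is an element of “area” for $\partial D$. Putting everything together, we
can write (22.20) as

$$\int_{\epsilon}^{\tau - \epsilon} dt \int_D d^m x\, (v L_{\mathbf{x},t}[u] - u L^{\dagger}_{\mathbf{x},t}[v]) = \int_D (-Q^0)\big|_{t=\epsilon}\, d^m x + \int_D Q^0\big|_{t=\tau-\epsilon}\, d^m x - \int_{\partial D} da \int_{\epsilon}^{\tau - \epsilon} dt\, \mathbf{Q}\cdot\hat{\mathbf{e}}_n.$$

The minus sign for the last term is due to the definition of the inner product.
Substituting for $Q$ yields

$$\int_{\epsilon}^{\tau - \epsilon} dt \int_D d^m x\, (v L_{\mathbf{x},t}[u] - u L^{\dagger}_{\mathbf{x},t}[v]) = -\int_D u(\mathbf{x}, \epsilon)\, v(\mathbf{x}, \epsilon)\, d^m x + \int_D u(\mathbf{x}, \tau - \epsilon)\, v(\mathbf{x}, \tau - \epsilon)\, d^m x - \int_{\partial D} da \int_{\epsilon}^{\tau - \epsilon} dt \left(v \frac{\partial u}{\partial n} - u \frac{\partial v}{\partial n}\right). \tag{22.21}$$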

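Let $v$ be $g(\mathbf{x}, \mathbf{y}; t - \tau)$, the GF associated with the adjoint operator. Then
Equation (22.21) gives

$$\int_{\epsilon}^{\tau - \epsilon} dt \int_D d^m x\, [g(\mathbf{x}, \mathbf{y}; t - \tau) f(\mathbf{x}, t) - u(\mathbf{x}, t)\, \delta(\mathbf{x} - \mathbf{y})\, \delta(t - \tau)]$$
$$= -\int_D u(\mathbf{x}, \epsilon)\, g(\mathbf{x}, \mathbf{y}; \epsilon - \tau)\, d^m x + \int_D u(\mathbf{x}, \tau - \epsilon)\, g(\mathbf{x}, \mathbf{y}; -\epsilon)\, d^m x$$
$$\quad - \int_{\partial D} da \int_{\epsilon}^{\tau - \epsilon} dt \left[g(\mathbf{x}_b, \mathbf{y}; t - \tau)\, \frac{\partial u}{\partial n} - u(\mathbf{x}_b, t)\, \frac{\partial g}{\partial n}(\mathbf{x}_b, \mathbf{y}; t - \tau)\right]. \tag{22.22}$$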


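We now use the following facts:

1. $\delta(t - \tau) = 0$ in the second term on the LHS of Equation (22.22), because
$t$ can never be equal to $\tau$ in the range of integration.

2. Using the symmetry property of the Green’s function and the fact that $L_{\mathbf{x},t}$
is real, we have $g(\mathbf{x}, \mathbf{y}; t - \tau) = G(\mathbf{y}, \mathbf{x}; \tau - t)$, where we have used the fact
that $t$ and $\tau$ are the time components of $\mathbf{x}$ and $\mathbf{y}$, respectively. In particular,
by (22.16), $g(\mathbf{x}, \mathbf{y}; -\epsilon) = G(\mathbf{y}, \mathbf{x}; \epsilon) \to \delta(\mathbf{x} - \mathbf{y})$ as $\epsilon \to 0$.

3. The function $g(\mathbf{x}, \mathbf{y}; t - \tau)$ satisfies the same homogeneous BC as $G(\mathbf{x}, \mathbf{y}; t - \tau)$.
Thus, $g(\mathbf{x}_b, \mathbf{y}; t - \tau) = 0$ for $\mathbf{x}_b \in \partial D$.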

Substituting all the above in (22.22), taking the limit $\epsilon \to 0$, and switching $\mathbf{x}$
and $\mathbf{y}$ and $t$ and $\tau$, we obtain

$$u(\mathbf{x}, t) = \int_0^t d\tau \int_D d^m y\, G(\mathbf{x}, \mathbf{y}; t - \tau) f(\mathbf{y}, \tau) + \int_D u(\mathbf{y}, 0)\, G(\mathbf{x}, \mathbf{y}; t)\, d^m y - \int_0^t d\tau \int_{\partial D} u(\mathbf{y}_b, \tau)\, \frac{\partial G}{\partial n_y}(\mathbf{x}, \mathbf{y}_b; t - \tau)\, da, \tag{22.23}$$

where $\partial/\partial n_y$ in the last integral means normal differentiation with respect to the
second argument of the Green’s function.

Equation (22.23) gives the complete solution to the BVP associated with a
parabolic PDE. If $f(\mathbf{y}, \tau) = 0$ and $u$ vanishes on the hypersurface $\partial D$, then
Equation (22.23) gives

$$u(\mathbf{x}, t) = \int_D u(\mathbf{y}, 0)\, G(\mathbf{x}, \mathbf{y}; t)\, d^m y, \tag{22.24}$$

which is the solution to the BVP of Equation (22.11), which led to the general
Green’s function of (22.15). Equation (22.24) lends itself nicely to a physical
interpretation. The RHS can be thought of as an integral operator with kernel
$G(\mathbf{x}, \mathbf{y}; t)$. This integral operator acts on $u(\mathbf{y}, 0)$ and gives $u(\mathbf{x}, t)$; that is, given
the shape of the solution at $t = 0$, the integral operator produces the shape for
all subsequent time. That is why $G(\mathbf{x}, \mathbf{y}; t)$ is called the evolution operator, or
propagator.
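The propagator interpretation can be made concrete in one dimension. The following sketch is a minimal illustration under assumptions that are not in the text: $D$ is taken to be the interval $(0, \pi)$ with $u = 0$ at both ends, so that $u_n(x) = \sqrt{2/\pi}\,\sin nx$ and $\lambda_n = n^2$; the eigenfunction sum is truncated at `n_modes` terms; and the integral over $D$ is approximated by a midpoint rule. The function names are hypothetical.

```python
import math

def heat_kernel(x, y, t, n_modes=50):
    """Truncated eigenfunction expansion of the heat propagator on (0, pi)
    with u_n(x) = sqrt(2/pi) sin(n x) and lambda_n = n^2 (assumed domain)."""
    if t <= 0.0:
        return 0.0                      # theta function: no response before t = 0
    return sum(math.exp(-n * n * t) * (2.0 / math.pi)
               * math.sin(n * x) * math.sin(n * y)
               for n in range(1, n_modes + 1))

def evolve(h, x, t, n_modes=50, n_quad=2000):
    """u(x, t) as the integral over (0, pi) of G(x, y; t) h(y), midpoint rule."""
    dy = math.pi / n_quad
    total = 0.0
    for j in range(n_quad):
        y = (j + 0.5) * dy
        total += heat_kernel(x, y, t, n_modes) * h(y)
    return total * dy

if __name__ == "__main__":
    h0 = lambda y: math.sin(y) + 0.5 * math.sin(3.0 * y)   # initial shape
    exact = math.exp(-0.3) * math.sin(1.0) + 0.5 * math.exp(-2.7) * math.sin(3.0)
    print(evolve(h0, 1.0, 0.3), exact)
```

Each eigenmode of the initial shape simply decays at its own rate $e^{-\lambda_n t}$, which is why the kernel smooths out fine structure quickly.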


22.3 Hyperbolic Equations 

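The hyperbolic equation we will discuss is the wave equation

$$L_{\mathbf{x},t}[u] \equiv \left(\frac{\partial^2}{\partial t^2} - \nabla^2\right) u(\mathbf{x}, t) = 0, \tag{22.25}$$

where we have set the speed of the wave equal to unity.

We wish to calculate the Green’s function for $L_{\mathbf{x},t}$ subject to appropriate BCs.
Let us proceed as we did for the parabolic equation and write

$$G(\mathbf{x}, \mathbf{y}; t) = \sum_{n=1}^{\infty} C_n(\mathbf{y}; t)\, u_n(\mathbf{x}), \qquad C_n(\mathbf{y}; t) = \int_D G(\mathbf{x}, \mathbf{y}; t)\, u_n(\mathbf{x})\, d^m x, \tag{22.26}$$

where $u_n(\mathbf{x})$ are orthonormal eigenfunctions of $\nabla^2$ with eigenvalues $-\lambda_n$, satisfying
certain, as yet unspecified, BCs. As usual, we expect $G$ to satisfy

$$L_{\mathbf{x},t}[G] = \left(\frac{\partial^2}{\partial t^2} - \nabla^2\right) G(\mathbf{x}, \mathbf{y}; t - \tau) = \delta(\mathbf{x} - \mathbf{y})\, \delta(t - \tau). \tag{22.27}$$

Substituting (22.26) in (22.27) with $\tau = 0$ and using $\nabla^2 u_n = -\lambda_n u_n$, gives

$$\sum_{n=1}^{\infty}\left[\frac{\partial^2 C_n}{\partial t^2} + \lambda_n C_n\right] u_n(\mathbf{x}) = \sum_{n=1}^{\infty} [u_n(\mathbf{y})\, \delta(t)]\, u_n(\mathbf{x}),$$

where we used $\delta(\mathbf{x} - \mathbf{y}) = \sum_{n=1}^{\infty} u_n(\mathbf{x})\, u_n(\mathbf{y})$ on the RHS. The orthonormality
of $u_n$ now gives $\ddot{C}_n(\mathbf{y}; t) + \lambda_n C_n(\mathbf{y}; t) = u_n(\mathbf{y})\, \delta(t)$. It follows that $C_n(\mathbf{y}; t)$ is
separable. In fact,

$$C_n(\mathbf{y}; t) = u_n(\mathbf{y})\, T_n(t) \qquad \text{where} \qquad \left(\frac{d^2}{dt^2} + \lambda_n\right) T_n(t) = \delta(t).$$

This equation describes a one-dimensional Green’s function and can be solved
using the methods of Chapter 20. Assuming that $T_n(t) = 0$ for $t \le 0$, we obtain
$T_n(t) = (\sin\omega_n t/\omega_n)\, \theta(t)$, where $\omega_n^2 = \lambda_n$. Substituting all the above results in
(22.26), we obtain

$$G(\mathbf{x}, \mathbf{y}; t) = \sum_{n=1}^{\infty} u_n(\mathbf{x})\, u_n(\mathbf{y})\, \frac{\sin\omega_n t}{\omega_n}\, \theta(t),$$

or, more generally,

$$G(\mathbf{x}, \mathbf{y}; t - \tau) = \sum_{n=1}^{\infty} u_n(\mathbf{x})\, u_n(\mathbf{y})\, \frac{\sin\omega_n (t - \tau)}{\omega_n}\, \theta(t - \tau). \tag{22.28}$$

We note that

$$G(\mathbf{x}, \mathbf{y}; 0^+) = 0 \qquad \text{and} \qquad \left.\frac{\partial G}{\partial t}(\mathbf{x}, \mathbf{y}; t)\right|_{t \to 0^+} = \delta(\mathbf{x} - \mathbf{y}), \tag{22.29}$$

as can easily be verified.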

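With the Green’s function for the operator $L_{\mathbf{x},t}$ of Equation (22.25) at our
disposal, we can attack the BVP given by

$$\left(\frac{\partial^2}{\partial t^2} - \nabla^2\right) u(\mathbf{x}, t) = f(\mathbf{x}, t) \quad \text{for } \mathbf{x} \in D,$$
$$u(\mathbf{x}_b, t) = h(\mathbf{x}_b, t), \qquad u(\mathbf{x}, 0) = \phi(\mathbf{x}) \quad \text{for } \mathbf{x}_b \in \partial D,\ \mathbf{x} \in D,$$
$$\left.\frac{\partial u}{\partial t}(\mathbf{x}, t)\right|_{t=0} = \psi(\mathbf{x}) \quad \text{for } \mathbf{x} \in D. \tag{22.30}$$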


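As in the case of the parabolic equation, we first derive an appropriate expression
of Green’s identity. This can be done by noting that

$$v L_{\mathbf{x},t}[u] - u L^{\dagger}_{\mathbf{x},t}[v] = \frac{\partial}{\partial t}\left(v \frac{\partial u}{\partial t} - u \frac{\partial v}{\partial t}\right) - \nabla\cdot(v\nabla u - u\nabla v).$$

Thus, $L_{\mathbf{x},t}$ is formally self-adjoint. Furthermore, we can identify

$$Q^0 = v \frac{\partial u}{\partial t} - u \frac{\partial v}{\partial t} \qquad \text{and} \qquad \mathbf{Q} = v\nabla u - u\nabla v.$$

Following the procedure used for the parabolic case step by step, we can easily
derive a Green’s identity and show that

$$u(\mathbf{x}, t) = \int_0^t d\tau \int_D d^m y\, G(\mathbf{x}, \mathbf{y}; t - \tau) f(\mathbf{y}, \tau) + \int_D \left[\psi(\mathbf{y})\, G(\mathbf{x}, \mathbf{y}; t) + \phi(\mathbf{y})\, \frac{\partial G}{\partial t}(\mathbf{x}, \mathbf{y}; t)\right] d^m y - \int_0^t d\tau \int_{\partial D} h(\mathbf{y}_b, \tau)\, \frac{\partial G}{\partial n_y}(\mathbf{x}, \mathbf{y}_b; t - \tau)\, da. \tag{22.31}$$

The details are left as Problem 22.11. The signs of the initial-value terms can be checked
against (22.29): as $t \to 0^+$, $G \to 0$ and $\partial G/\partial t \to \delta(\mathbf{x} - \mathbf{y})$, so that $u(\mathbf{x}, 0) = \phi(\mathbf{x})$.

For the homogeneous PDE with the homogeneous BC $h = 0 = \psi$, we get

$$u(\mathbf{x}, t) = \int_D \phi(\mathbf{y})\, \frac{\partial G}{\partial t}(\mathbf{x}, \mathbf{y}; t)\, d^m y.$$

Note the difference between this equation and Equation (22.24). Here the propagator
is the time derivative of the Green’s function. There is another difference
between hyperbolic and parabolic equations. When the solution to a parabolic
equation vanishes on the boundary and is initially zero, and the PDE is homogeneous
[$f(\mathbf{x}, t) = 0$], the solution must be zero. This is clear from Equation (22.23).
On the other hand, Equation (22.31) indicates that under the same circumstance,
there may be a nonzero solution for a hyperbolic equation if $\psi$ is nonzero. In such
a case we obtain

$$u(\mathbf{x}, t) = \int_D \psi(\mathbf{y})\, G(\mathbf{x}, \mathbf{y}; t)\, d^m y.$$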

This difference in the two types of equations is due to the fact that hyperbolic 
equations have second-order time derivatives. Thus, the initial shape of a solution 
is not enough to uniquely specify it. The initial velocity profile is also essential. 
We saw examples of this in Chapter 19. The discussion of Green’s functions has 
so far been formal. The main purpose of the remaining sections is to bridge the 
gap between formalism and concrete applications. Several powerful techniques 
are used in obtaining Green’s functions, but we will focus only on two: the Fourier 
transform technique, and the eigenfunction expansion technique. 
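Before turning to those techniques, the role of the two initial conditions in the hyperbolic case can be illustrated with the eigenfunction form of the wave Green's function. The sketch below is an illustration under assumptions that are not in the text: a string on $(0, \pi)$ with fixed ends and unit wave speed, so that $u_n(x) = \sqrt{2/\pi}\,\sin nx$ and $\omega_n = n$; a truncation at `n_modes` terms; and a midpoint rule for the projections. Integrating the initial displacement $\phi$ against $\partial G/\partial t$ and the initial velocity $\psi$ against $G$ gives the mode sum coded here, with the sign of the $\phi$ term fixed by the requirement $u(x, 0) = \phi(x)$, in line with (22.29).

```python
import math

def wave_solution(phi, psi, x, t, n_modes=40, n_quad=2000):
    """Homogeneous wave equation on (0, pi) with fixed ends and unit speed,
    built from the modes u_n(x) = sqrt(2/pi) sin(n x), omega_n = n."""
    dy = math.pi / n_quad
    norm = math.sqrt(2.0 / math.pi)
    ys = [(j + 0.5) * dy for j in range(n_quad)]
    u = 0.0
    for n in range(1, n_modes + 1):
        # Projections of the initial data on u_n (midpoint rule).
        a_n = sum(phi(y) * norm * math.sin(n * y) for y in ys) * dy
        b_n = sum(psi(y) * norm * math.sin(n * y) for y in ys) * dy
        omega = float(n)
        # phi couples to dG/dt (cosine), psi couples to G (sine / omega).
        u += norm * math.sin(n * x) * (a_n * math.cos(omega * t)
                                       + b_n * math.sin(omega * t) / omega)
    return u

if __name__ == "__main__":
    phi = math.sin
    psi = lambda y: math.sin(2.0 * y)
    x, t = 0.9, 1.3
    exact = math.cos(t) * math.sin(x) + 0.5 * math.sin(2.0 * t) * math.sin(2.0 * x)
    print(wave_solution(phi, psi, x, t), exact)
```

Setting $\phi = 0$ leaves only the $\psi$ term, which shows how a nonzero initial velocity alone can generate a nonzero solution.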


22.4 The Fourier Transform Technique

Recall that any Green’s function can be written as a sum of a singular part and
a regular part: $G = G_s + H$. Since we have already discussed homogeneous
equations in detail in Chapter 19, we will not evaluate $H$ in this section but will
concentrate on the singular parts of various Green’s functions.

The BCs play no role in evaluating $G_s$. Therefore, the Fourier transform technique
(FTT), which involves integration over all space, can be utilized. The FTT
has a drawback: it does not work if the coefficient functions are not constants.
For most physical applications treated in this book, however, this will not be a
shortcoming.

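Let us consider the most general SOLPDO with constant coefficients,

$$L_{\mathbf{x}} = a_0 + \sum_{j=1}^{m} a_j \frac{\partial}{\partial x_j} + \sum_{j,k=1}^{m} b_{jk} \frac{\partial^2}{\partial x_j\, \partial x_k}, \tag{22.32}$$

where $a_0$, $a_j$, and $b_{jk}$ are constants. The corresponding Green’s function has a
singular part that satisfies the usual PDE with the delta function on the RHS. The
FTT starts with assuming a Fourier integral representation in the variable $\mathbf{x}$ for the
singular part and for the delta function:

$$G_s(\mathbf{x}, \mathbf{y}) = \frac{1}{(2\pi)^{m/2}}\int d^m k\, \tilde{G}_s(\mathbf{k}, \mathbf{y})\, e^{i\mathbf{k}\cdot\mathbf{x}},$$
$$\delta(\mathbf{x} - \mathbf{y}) = \frac{1}{(2\pi)^m}\int d^m k\, e^{i\mathbf{k}\cdot(\mathbf{x} - \mathbf{y})}.$$

Substituting these equations in the PDE for the GF, we get

$$\tilde{G}_s(\mathbf{k}, \mathbf{y}) = \frac{1}{(2\pi)^{m/2}}\, \frac{e^{-i\mathbf{k}\cdot\mathbf{y}}}{a_0 + i\sum_{j=1}^{m} a_j k_j - \sum_{j,l=1}^{m} b_{jl} k_j k_l}$$

and

$$G_s(\mathbf{x}, \mathbf{y}) = \frac{1}{(2\pi)^m}\int d^m k\, \frac{e^{i\mathbf{k}\cdot(\mathbf{x} - \mathbf{y})}}{a_0 + i\sum_{j=1}^{m} a_j k_j - \sum_{j,l=1}^{m} b_{jl} k_j k_l}. \tag{22.33}$$

If we can evaluate the integral in (22.33), we can find $G_s$.

The following examples apply Equation (22.33) to specific problems. Note
that (22.33) indicates that $G_s$ depends only on $\mathbf{x} - \mathbf{y}$. This point was mentioned in
Chapter 20, where it was noted that such dependence occurs when the BCs play
no part in an evaluation of the singular part of the Green’s function of a DE with
constant coefficients; and this is exactly the situation here.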


22.4.1 GF for the m-Dimensional Laplacian

We calculated the GF for the m-dimensional Laplacian in Section 21.2.2 using a
different method. With $a_0 = 0 = a_j$, $b_{jl} = \delta_{jl}$, and $\mathbf{r} = \mathbf{x} - \mathbf{y}$, Equation (22.33)
reduces to

$$G_s(\mathbf{r}) = -\frac{1}{(2\pi)^{m}} \int d^m k\, \frac{e^{i\mathbf{k}\cdot\mathbf{r}}}{k^2}, \tag{22.34}$$

where $k^2 = k_1^2 + \cdots + k_m^2 = \mathbf{k}\cdot\mathbf{k}$. To integrate (22.34), we choose spherical
coordinates in the m-dimensional k-space. Furthermore, to simplify calculations
we let the $k_m$-axis lie along $\mathbf{r}$ so that $\mathbf{r} = (0, 0, \ldots, |\mathbf{r}|)$ and $\mathbf{k}\cdot\mathbf{r} = k|\mathbf{r}|\cos\theta_1$
[see Equation (21.12)]. Substituting this in (22.34) and writing $d^m k$ in spherical
coordinates yields

$$G_s(\mathbf{r}) = -\frac{1}{(2\pi)^{m}} \int \frac{e^{ik|\mathbf{r}|\cos\theta_1}}{k^2}\, k^{m-1} (\sin\theta_1)^{m-2} \cdots \sin\theta_{m-2}\; dk\, d\theta_1 \cdots d\theta_{m-1}. \tag{22.35}$$


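From Equation (21.15) we note that $d\Omega_m = (\sin\theta_1)^{m-2}\, d\theta_1\, d\Omega_{m-1}$. Thus, after
integrating over the angles $\theta_2, \ldots, \theta_{m-1}$, Equation (22.35) becomes

$$G_s(\mathbf{r}) = -\frac{\Omega_{m-1}}{(2\pi)^{m}} \int_0^\infty k^{m-3}\, dk \int_0^\pi (\sin\theta_1)^{m-2} e^{ik|\mathbf{r}|\cos\theta_1}\, d\theta_1.$$

The inner integral can be looked up in an integral table (see [Grad 65, p. 482]):

$$\int_0^\pi (\sin\theta_1)^{m-2} e^{ik|\mathbf{r}|\cos\theta_1}\, d\theta_1 = \sqrt{\pi} \left(\frac{2}{kr}\right)^{m/2-1} \Gamma\!\left(\frac{m-1}{2}\right) J_{m/2-1}(kr).$$

Substituting this and (21.16) in the preceding equation and using the result (see
[Grad 65, p. 684])

$$\int_0^\infty x^{\mu} J_{\nu}(ax)\, dx = 2^{\mu} a^{-\mu-1}\, \frac{\Gamma\!\left(\frac{\mu+\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu-\mu+1}{2}\right)},$$

we obtain

$$G_s(\mathbf{r}) = -\frac{\Gamma(m/2-1)}{4\pi^{m/2}} \left(\frac{1}{r^{m-2}}\right) \qquad \text{for } m > 2,$$

which agrees with (21.20) since $\Gamma(m/2) = (m/2 - 1)\Gamma(m/2 - 1)$.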
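As a quick sanity check of the closed form, the sketch below evaluates it for a
few dimensions, confirms the familiar value -1/(4 pi r) for m = 3, and verifies
by finite differences that the result is harmonic away from the origin. The
helper names (`laplacian_gf`, `radial_laplacian`) are illustrative assumptions.

```python
import math


def laplacian_gf(m, r):
    """Singular part of the m-dimensional Laplacian GF, valid for m > 2."""
    if m <= 2:
        raise ValueError("the closed form holds only for m > 2")
    return -math.gamma(m / 2 - 1) / (4 * math.pi ** (m / 2) * r ** (m - 2))


def radial_laplacian(m, r, h=1e-3):
    """Radial part of the m-dimensional Laplacian, g'' + (m-1) g'/r, applied
    to laplacian_gf with central differences of step h."""
    g = lambda s: laplacian_gf(m, s)
    d1 = (g(r + h) - g(r - h)) / (2 * h)
    d2 = (g(r + h) - 2 * g(r) + g(r - h)) / (h * h)
    return d2 + (m - 1) * d1 / r


for m in (3, 4, 5):
    print(m, laplacian_gf(m, 2.0), radial_laplacian(m, 1.3))
```

Away from r = 0 the second printed column should vanish up to discretization
error, as expected for a function whose Laplacian is a delta function at the
origin.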

22.4.2 GF for the m-Dimensional Helmholtz Operator 

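For the Helmholtz operator $\nabla^2 - \mu^2$, Equation (22.33) reduces to

$$G_s(\mathbf{r}) = -\frac{1}{(2\pi)^{m}} \int d^m k\, \frac{e^{i\mathbf{k}\cdot\mathbf{r}}}{\mu^2 + k^2}.$$

Following the same procedure as in the previous subsection, we find

$$G_s(\mathbf{r}) = -\frac{\Omega_{m-1}}{(2\pi)^{m}} \int_0^\infty \frac{k^{m-1}\, dk}{\mu^2 + k^2} \int_0^\pi (\sin\theta_1)^{m-2} e^{ikr\cos\theta_1}\, d\theta_1
= -\frac{\Omega_{m-1}}{(2\pi)^{m}} \sqrt{\pi} \left(\frac{2}{r}\right)^{m/2-1} \Gamma\!\left(\frac{m-1}{2}\right) \int_0^\infty \frac{k^{m/2}}{\mu^2 + k^2}\, J_{m/2-1}(kr)\, dk.$$

Here we can use the integral formula (see [Grad 65, pp. 686 and 952])

$$\int_0^\infty \frac{J_{\nu}(bx)\, x^{\nu+1}}{(x^2 + a^2)^{\mu+1}}\, dx = \frac{a^{\nu-\mu} b^{\mu}}{2^{\mu}\, \Gamma(\mu+1)}\, K_{\nu-\mu}(ab),$$

where

$$K_{\nu}(z) = \frac{i\pi}{2}\, e^{i\nu\pi/2} H_{\nu}^{(1)}(iz),$$

to obtain

$$G_s(\mathbf{r}) = -\frac{\Omega_{m-1}}{(2\pi)^{m}} \sqrt{\pi} \left(\frac{2}{r}\right)^{m/2-1} \Gamma\!\left(\frac{m-1}{2}\right) \mu^{m/2-1} K_{m/2-1}(\mu r),$$

which simplifies to

$$G_s(\mathbf{r}) = -\frac{\pi/2}{(2\pi)^{m/2}} \left(\frac{\mu}{r}\right)^{m/2-1} i^{m/2}\, H^{(1)}_{m/2-1}(i\mu r). \tag{22.36}$$

It can be shown (see Problem 22.8) that for $m = 3$ this reduces to $G_s(\mathbf{r}) = -e^{-\mu r}/(4\pi r)$,
which is the Yukawa potential due to a unit charge.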
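The m = 3 result can be checked directly: away from the origin the Yukawa form
must satisfy the homogeneous radial equation for the operator "Laplacian minus
mu squared". The sketch below does this with central differences; the function
names and the sample values of r and mu are illustrative assumptions.

```python
import math


def yukawa_gf(r, mu):
    """Three-dimensional singular part, -exp(-mu r) / (4 pi r)."""
    return -math.exp(-mu * r) / (4 * math.pi * r)


def helmholtz_residual(r, mu, h=1e-3):
    """Evaluate g'' + 2 g'/r - mu^2 g for g = yukawa_gf with central differences."""
    g = lambda s: yukawa_gf(s, mu)
    d1 = (g(r + h) - g(r - h)) / (2 * h)
    d2 = (g(r + h) - 2 * g(r) + g(r - h)) / (h * h)
    return d2 + 2 * d1 / r - mu ** 2 * g(r)


print(helmholtz_residual(1.5, 0.8))  # should be close to zero
```

Setting mu = 0 in `yukawa_gf` recovers the Coulomb form -1/(4 pi r) of the
previous subsection.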

We can easily obtain the GF for $\nabla^2 + \mu^2$ by substituting $\pm i\mu$ for $\mu$ in Equation
(22.36). The result is

$$G_s(\mathbf{r}) = \frac{i^{m+1}(\pi/2)}{(2\pi)^{m/2}} \left(\frac{\mu}{r}\right)^{m/2-1} H^{(1)}_{m/2-1}(\pm\mu r). \tag{22.37}$$

For $m = 3$ this yields $G_s(\mathbf{r}) = -e^{\pm i\mu r}/(4\pi r)$. The two signs in the exponent
correspond to the so-called incoming and outgoing "waves."

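22.4.1. Example. For a non-local potential, that is, one that depends not only on
the observation point but also on some other "non-local" variables, the
time-independent Schrödinger equation is

$$-\frac{\hbar^2}{2\mu}\nabla^2\Psi(\mathbf{r}) + \int_{\mathbb{R}^3} V(\mathbf{r},\mathbf{r}')\Psi(\mathbf{r}')\, d^3r' = E\,\Psi(\mathbf{r}).$$

Then, the integral equation associated with this differential equation is (see Section 21.4)

$$\Psi(\mathbf{r}) = A e^{i\mathbf{k}\cdot\mathbf{r}} - \frac{\mu}{2\pi\hbar^2} \int_{\mathbb{R}^3} d^3r'\, \frac{e^{ik|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|} \int_{\mathbb{R}^3} d^3r''\, V(\mathbf{r}',\mathbf{r}'')\Psi(\mathbf{r}''). \tag{22.38}$$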

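For a separable potential, for which $V(\mathbf{r}',\mathbf{r}'') = -g^2 U(\mathbf{r}')U(\mathbf{r}'')$, we can solve (22.38)
exactly. We substitute for $V(\mathbf{r}',\mathbf{r}'')$ in (22.38) to obtain

$$\Psi(\mathbf{r}) = A e^{i\mathbf{k}\cdot\mathbf{r}} + \frac{\mu g^2}{2\pi\hbar^2} \int_{\mathbb{R}^3} d^3r'\, \frac{e^{ik|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\, U(\mathbf{r}') \int_{\mathbb{R}^3} d^3r''\, U(\mathbf{r}'')\Psi(\mathbf{r}''). \tag{22.39}$$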

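Defining the quantities

$$Q(\mathbf{r}) \equiv \frac{\mu g^2}{2\pi\hbar^2} \int_{\mathbb{R}^3} d^3r'\, \frac{e^{ik|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\, U(\mathbf{r}'), \qquad
C \equiv \int_{\mathbb{R}^3} d^3r''\, U(\mathbf{r}'')\Psi(\mathbf{r}'') \tag{22.40}$$

and substituting them in (22.39) yields $\Psi(\mathbf{r}) = A e^{i\mathbf{k}\cdot\mathbf{r}} + C Q(\mathbf{r})$. Multiplying both sides of
this equation by $U(\mathbf{r})$ and integrating over $\mathbb{R}^3$, we get

$$C = A \int_{\mathbb{R}^3} e^{i\mathbf{k}\cdot\mathbf{r}} U(\mathbf{r})\, d^3r + C \int_{\mathbb{R}^3} U(\mathbf{r}) Q(\mathbf{r})\, d^3r
= (2\pi)^{3/2} A\, \tilde{U}(-\mathbf{k}) + C \int_{\mathbb{R}^3} U(\mathbf{r}) Q(\mathbf{r})\, d^3r,$$

from which we obtain $C = (2\pi)^{3/2} A\, \tilde{U}(-\mathbf{k}) / [1 - \int_{\mathbb{R}^3} U(\mathbf{r}) Q(\mathbf{r})\, d^3r]$, leading to the
solution

$$\Psi(\mathbf{r}) = A e^{i\mathbf{k}\cdot\mathbf{r}} + \frac{(2\pi)^{3/2} A\, \tilde{U}(-\mathbf{k})}{1 - \int_{\mathbb{R}^3} U(\mathbf{r}') Q(\mathbf{r}')\, d^3r'}\, Q(\mathbf{r}). \tag{22.41}$$

In principle, $\tilde{U}(-\mathbf{k})$ [the Fourier transform of $U(\mathbf{r})$] and $Q(\mathbf{r})$ can be calculated once the
functional form of $U(\mathbf{r})$ is known. Equations (22.40) and (22.41) give the solution to the
Schrödinger equation in closed form. ■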

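When dealing with parabolic and hyperbolic equations, we will find it
convenient to consider the "different" variable (usually $t$) as the zeroth coordinate. In
the Fourier transform we then use $\omega = -k_0$ and write

$$G_s(\mathbf{r},t) = \frac{1}{(2\pi)^{(m+1)/2}} \int_{-\infty}^{\infty} d\omega \int d^m k\, \tilde{G}_s(\mathbf{k},\omega)\, e^{i(\mathbf{k}\cdot\mathbf{r}-\omega t)}, \qquad
\delta(\mathbf{r})\delta(t) = \frac{1}{(2\pi)^{m+1}} \int_{-\infty}^{\infty} d\omega \int d^m k\, e^{i(\mathbf{k}\cdot\mathbf{r}-\omega t)}, \tag{22.42}$$

where $\mathbf{r}$ is the m-dimensional position vector.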


22.4.3 GF for the m-Dimensional Diffusion Operator 

We substitute from (22.42) in $(\partial/\partial t - \nabla^2) G_s(\mathbf{r},t) = \delta(\mathbf{r})\delta(t)$ to obtain

$$G_s(\mathbf{r},t) = \frac{i}{(2\pi)^{m+1}} \int d^m k\, e^{i\mathbf{k}\cdot\mathbf{r}} \int_{-\infty}^{\infty} d\omega\, \frac{e^{-i\omega t}}{\omega + ik^2}, \tag{22.43}$$

where as usual, $k^2 = \sum_{i=1}^{m} k_i^2$. The $\omega$ integration can be done using the calculus
of residues. The integrand has a simple pole at $\omega = -ik^2$, that is, in the lower
half of the complex $\omega$-plane (LHP). To integrate, we must know the sign of $t$. If
$t > 0$, the exponential factor dictates that the contour be closed in the LHP, where
there is a pole and, therefore, a contribution to the residues. On the other hand, if
$t < 0$, the contour must be closed in the UHP. The integral is then zero because
there are no poles in the upper half plane (UHP). We must therefore introduce a
step function $\theta(t)$ in the Green's function. Evaluating the residue, the $\omega$ integration
yields $-2\pi i\, e^{-k^2 t}$. (The minus sign arises because of clockwise contour integration
in the LHP.) Substituting this in Equation (22.43), using spherical coordinates in
which the last k-axis is along $\mathbf{r}$, and integrating over all angles except $\theta_1$, we obtain


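$$G_s(\mathbf{r},t) = \theta(t)\, \frac{\Omega_{m-1}}{(2\pi)^{m}} \int_0^\infty k^{m-1} e^{-k^2 t}\, dk \int_0^\pi (\sin\theta_1)^{m-2} e^{ikr\cos\theta_1}\, d\theta_1
= \theta(t)\, \frac{\Omega_{m-1}}{(2\pi)^{m}} \sqrt{\pi} \left(\frac{2}{r}\right)^{m/2-1} \Gamma\!\left(\frac{m-1}{2}\right) \int_0^\infty k^{m/2} e^{-k^2 t} J_{m/2-1}(kr)\, dk.$$

For the $\theta_1$ integration, we used the result quoted in Section 22.4.1.
Using the integral formula (see [Grad 65, pp. 716 and 1058])

$$\int_0^\infty x^{\mu} e^{-\alpha x^2} J_{\nu}(\beta x)\, dx = \frac{\beta^{\nu}\, \Gamma\!\left(\frac{\mu+\nu+1}{2}\right)}{2^{\nu+1} \alpha^{(\mu+\nu+1)/2}\, \Gamma(\nu+1)}\, \Phi\!\left(\frac{\mu+\nu+1}{2},\, \nu+1;\, -\frac{\beta^2}{4\alpha}\right),$$

where $\Phi$ is the confluent hypergeometric function, we obtain

$$G_s(\mathbf{r},t) = \theta(t)\, \frac{2\pi^{(m-1)/2}}{(2\pi)^{m}} \sqrt{\pi} \left(\frac{2}{r}\right)^{m/2-1} \frac{r^{m/2-1}}{2^{m/2}\, t^{m/2}}\, \Phi\!\left(\frac{m}{2},\, \frac{m}{2};\, -\frac{r^2}{4t}\right). \tag{22.44}$$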


The power-series expansion for the confluent hypergeometric function shows
that $\Phi(\alpha, \alpha; z) = e^{z}$. Substituting this result in (22.44) and simplifying, we finally
obtain

$$G_s(\mathbf{r},t) = \frac{e^{-r^2/4t}}{(4\pi t)^{m/2}}\, \theta(t). \tag{22.45}$$
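Two properties of (22.45) are easy to test numerically: for t > 0 it satisfies
the homogeneous diffusion equation, and its integral over all space equals one
(the delta function at t = 0 spreads but keeps unit weight). The sketch below
checks the first for m = 1, 2, 3 and the second for m = 1. The function names
and the integration window are illustrative assumptions.

```python
import math


def diffusion_gf(m, r, t):
    """Singular part of the m-dimensional diffusion GF, Equation (22.45)."""
    if t <= 0:
        return 0.0  # the step function theta(t)
    return math.exp(-r * r / (4 * t)) / (4 * math.pi * t) ** (m / 2)


def diffusion_residual(m, r, t, h=1e-3):
    """dG/dt minus the radial Laplacian of G, by central differences."""
    g = lambda s, tau: diffusion_gf(m, s, tau)
    dt = (g(r, t + h) - g(r, t - h)) / (2 * h)
    d1 = (g(r + h, t) - g(r - h, t)) / (2 * h)
    d2 = (g(r + h, t) - 2 * g(r, t) + g(r - h, t)) / (h * h)
    return dt - (d2 + (m - 1) * d1 / r)


def total_weight_1d(t, half_width=40.0, steps=40000):
    """Trapezoidal estimate of the integral of the m = 1 kernel over the line."""
    dx = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * diffusion_gf(1, abs(x), t)
    return total * dx


for m in (1, 2, 3):
    print(m, diffusion_residual(m, 0.9, 0.5))
print(total_weight_1d(0.5))
```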


22.4.4 GF for the m-Dimensional Wave Equation 

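The difference between this example and the preceding one is that here the time
derivative is of second order. Thus, instead of Equation (22.43), we start with

$$G_s(\mathbf{r},t) = -\frac{1}{(2\pi)^{m+1}} \int d^m k\, e^{i\mathbf{k}\cdot\mathbf{r}} \int_{-\infty}^{\infty} d\omega\, \frac{e^{-i\omega t}}{\omega^2 - k^2}. \tag{22.46}$$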


The $\omega$ integration can be done using the method of residues. Since the singularities
of the integrand, $\omega = \pm k$, are on the real axis, it seems reasonable to use the
principal value as the value of the integral. This, in turn, depends on the sign of $t$.
If $t > 0$ ($t < 0$), we have to close the contour in the LHP (UHP) to avoid the
explosion of the exponential. If one also insists on not including the poles inside
the contour,3 then one can show that

$$P \int_{-\infty}^{\infty} d\omega\, \frac{e^{-i\omega t}}{\omega^2 - k^2} = -\pi\, \frac{\sin kt}{k}\, \epsilon(t),$$

where

$$\epsilon(t) \equiv \theta(t) - \theta(-t) = \begin{cases} 1 & \text{if } t > 0, \\ -1 & \text{if } t < 0. \end{cases}$$

3 This will determine how to (semi)circle around the poles.

Substituting this in (22.46) and integrating over all angles as done in the previous
examples yields

$$G_s(\mathbf{r},t) = \frac{\epsilon(t)}{2(2\pi)^{m/2}\, r^{m/2-1}} \int_0^\infty k^{m/2-1} J_{m/2-1}(kr) \sin kt\, dk. \tag{22.47}$$


As Problem 22.25 shows, the Green's function given by Equation (22.47)
satisfies only the homogeneous wave equation with no delta function on the RHS.
The reason for this is that the principal value of an integral chooses a specific
contour that may not reflect the physical situation. In fact, the Green's function
in (22.47) contains two pieces corresponding to the two different contours of
integration, and it turns out that the physically interesting Green's functions are
obtained, not from the principal value, but from giving small imaginary parts to
the poles. Thus, replacing the $\omega$ integral with a contour integral for which the two
poles are pushed in the LHP and using the method of residues, we obtain

$$I_{\text{up}} \equiv \int_{C_1} d\omega\, \frac{e^{-i\omega t}}{\omega^2 - k^2} = \oint_{C_1} \frac{e^{-izt}}{z^2 - k^2}\, dz = -\frac{2\pi}{k} \sin kt.$$

The integral is zero for $t < 0$ because for negative values of $t$, the contour must be
closed in the UHP, where there are no poles inside $C_1$. Substituting this in (22.46)
and working through as before, we obtain what is called the retarded Green's
function:

$$G_s^{(\text{ret})}(\mathbf{r},t) = \frac{\theta(t)}{(2\pi)^{m/2}\, r^{m/2-1}} \int_0^\infty k^{m/2-1} J_{m/2-1}(kr) \sin kt\, dk. \tag{22.48}$$

If the poles are pushed in the UHP we obtain the advanced Green's function:

$$G_s^{(\text{adv})}(\mathbf{r},t) = -\frac{\theta(-t)}{(2\pi)^{m/2}\, r^{m/2-1}} \int_0^\infty k^{m/2-1} J_{m/2-1}(kr) \sin kt\, dk. \tag{22.49}$$


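Unlike the elliptic and parabolic equations discussed earlier, the integral over $k$
is not a function but a distribution, as will become clear below. To find the retarded
and advanced Green's functions, we write the sine term in the integral in terms of
exponentials and use the following (see [Grad 65, p. 712]):

$$\int_0^\infty e^{-\alpha x} x^{\nu} J_{\nu}(\beta x)\, dx = \frac{(2\beta)^{\nu}\, \Gamma(\nu + 1/2)}{\sqrt{\pi}\, (\alpha^2 + \beta^2)^{\nu+1/2}} \qquad \text{for } \mathrm{Re}(\alpha) > |\mathrm{Im}(\beta)|.$$

To ensure convergence at infinity, we add a small negative number to the
exponent and define the integral

$$I_{\pm}^{\nu} \equiv \int_0^\infty k^{\nu} e^{-(\mp it + \epsilon)k} J_{\nu}(kr)\, dk = \frac{(2r)^{\nu}\, \Gamma(\nu + 1/2)}{\sqrt{\pi}} \left[(\mp it + \epsilon)^2 + r^2\right]^{-(\nu+1/2)}.$$

For the GFs, we need to evaluate the (common) integral in (22.48) and (22.49).
With $\nu = m/2 - 1$, we have

$$I^{(\nu)} \equiv \int_0^\infty k^{\nu} J_{\nu}(kr) \sin kt\, dk = \frac{1}{2i} \lim_{\epsilon\to 0} \left(I_{+}^{\nu} - I_{-}^{\nu}\right)
= \frac{(2r)^{\nu}\, \Gamma(\nu + 1/2)}{2i\sqrt{\pi}} \lim_{\epsilon\to 0} \left\{ \frac{1}{[r^2 + (-it + \epsilon)^2]^{\nu+1/2}} - \frac{1}{[r^2 + (it + \epsilon)^2]^{\nu+1/2}} \right\}.$$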

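At this point, it is convenient to discuss separately the two cases of $m$ odd and
$m$ even. Let us derive the expression for odd $m$ (the even case is left for Problem
22.26). Define the integer $n \equiv (m - 1)/2 = \nu + \tfrac12$ and write $I^{(\nu)}$ as

$$I^{(n)} = \frac{(2r)^{n-1/2}\, \Gamma(n)}{2i\sqrt{\pi}} \lim_{\epsilon\to 0} \left\{ \frac{1}{[r^2 + (-it + \epsilon)^2]^{n}} - \frac{1}{[r^2 + (it + \epsilon)^2]^{n}} \right\}. \tag{22.50}$$

Define $u \equiv r^2 + (-it + \epsilon)^2$. Then using the identity

$$\frac{1}{u^{n}} = \frac{(-1)^{n-1}}{(n-1)!}\, \frac{d^{n-1}}{du^{n-1}} \left(\frac{1}{u}\right)$$

and the chain rule, $\partial f/\partial r = 2r\, df/du$, we obtain $d/du = (1/2r)\, \partial/\partial r$ and

$$\frac{1}{[r^2 + (\pm it + \epsilon)^2]^{n}} = \frac{(-1)^{n-1}}{(n-1)!} \left(\frac{1}{2r}\frac{\partial}{\partial r}\right)^{n-1} \left[\frac{1}{r^2 + (\pm it + \epsilon)^2}\right].$$

Therefore, Equation (22.50) can be written as

$$I^{(n)} = \int_0^\infty k^{n-1/2} J_{n-1/2}(kr) \sin kt\, dk
= \frac{(2r)^{n-1/2}\, \Gamma(n)}{2i\sqrt{\pi}\, (n-1)!} \left(-\frac{1}{2r}\frac{\partial}{\partial r}\right)^{n-1} \lim_{\epsilon\to 0} \left\{ \frac{1}{r^2 + (-it + \epsilon)^2} - \frac{1}{r^2 + (it + \epsilon)^2} \right\}. \tag{22.51}$$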

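The limit in (22.51) is found in Problem 22.27. Using the result of that problem
and $\Gamma(n) = (n - 1)!$, we get

$$I^{(n)} = \int_0^\infty k^{n-1/2} J_{n-1/2}(kr) \sin kt\, dk
= -\frac{\sqrt{\pi}}{2}\, (2r)^{n-1/2} \left(-\frac{1}{2r}\frac{\partial}{\partial r}\right)^{n-1} \left\{ \frac{1}{r}\left[\delta(t + r) - \delta(t - r)\right] \right\}. \tag{22.52}$$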
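As a consistency check for the lowest odd dimension, $m = 3$ (so $n = 1$), the
integral on the left can be evaluated directly. The sketch below assumes only
the elementary form $J_{1/2}(z) = \sqrt{2/(\pi z)}\,\sin z$ (the hint of Problem
22.27) and the distributional identity $\int_0^\infty \cos kx\, dk = \pi\delta(x)$.

```latex
\begin{aligned}
I^{(1)} &= \int_0^\infty k^{1/2} J_{1/2}(kr)\,\sin kt\; dk
         = \sqrt{\frac{2}{\pi r}} \int_0^\infty \sin kr\,\sin kt\; dk \\
        &= \sqrt{\frac{2}{\pi r}}\;\frac{1}{2}\int_0^\infty
           \bigl[\cos k(t-r) - \cos k(t+r)\bigr]\, dk \\
        &= \sqrt{\frac{\pi}{2r}}\,\bigl[\delta(t-r) - \delta(t+r)\bigr].
\end{aligned}
```

The result is a pair of delta functions concentrated on $t = \pm r$, which is
why the integral over $k$ has to be read as a distribution rather than as an
ordinary function.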



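Employing this result in (22.48) and (22.49) yields

$$G_s^{(\text{ret})}(\mathbf{r},t) = \frac{1}{4\pi} \left(-\frac{1}{2\pi r}\frac{\partial}{\partial r}\right)^{n-1} \left[\frac{\delta(t - r)}{r}\right] \qquad \text{for } n = \frac{m-1}{2},$$

$$G_s^{(\text{adv})}(\mathbf{r},t) = \frac{1}{4\pi} \left(-\frac{1}{2\pi r}\frac{\partial}{\partial r}\right)^{n-1} \left[\frac{\delta(t + r)}{r}\right] \qquad \text{for } n = \frac{m-1}{2}. \tag{22.53}$$

The theta functions are not needed in (22.53) because the arguments of the delta
functions already meet the restrictions imposed by the theta functions.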

The two functions in (22.53) have an interesting physical interpretation.
Green's functions are propagators (of signals of some sort), and $G_s^{(\text{ret})}(\mathbf{r},t)$ is
capable of propagating signals only for positive times. On the other hand, $G_s^{(\text{adv})}(\mathbf{r},t)$
can propagate only in the negative time direction. Thus, if initially ($t = 0$) a
signal is produced (by appropriate BCs), both $G_s^{(\text{ret})}(\mathbf{r},t)$ and $G_s^{(\text{adv})}(\mathbf{r},t)$ work to
propagate it in their respective time directions. It may seem that $G_s^{(\text{adv})}(\mathbf{r},t)$ is
useless because every signal propagates forward in time. This is true, however,
only for classical events. In relativistic quantum field theory antiparticles are
interpreted mathematically as moving in the negative time direction! Thus, we cannot
simply ignore $G_s^{(\text{adv})}(\mathbf{r},t)$. In fact, the correct propagator to choose in this
theory is a linear combination of $G_s^{(\text{adv})}(\mathbf{r},t)$ and $G_s^{(\text{ret})}(\mathbf{r},t)$, called the Feynman
propagator (see [Wein 95, pp. 274-280]).

The foregoing example shows a subtle
difference between Green's functions for second-order differential operators in
one dimension and in higher dimensions. We saw in Chapter 20 that the former are
continuous functions in the interval on which they are defined. Here, we see that
higher dimensional Green's functions are not only discontinuous, but that they are
not even functions in the ordinary sense; they contain a delta function. Thus, in
general, Green's functions in higher dimensions ought to be treated as distributions
(generalized functions).

22.5 The Eigenfunction Expansion Technique 

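Suppose that the differential operator $\mathbf{L}_{\mathbf{x}}$, defined in a domain $D$ with boundary $\partial D$,
has discrete eigenvalues $\{\lambda_n\}_{n=1}^{\infty}$ with corresponding orthonormal eigenfunctions
$\{u_n(\mathbf{x})\}_{n=1}^{\infty}$. These two sets may not be in one-to-one correspondence. Assume
that the $u_n(\mathbf{x})$'s satisfy the same BCs as the Green's function to be defined below.

Now consider the operator $\mathbf{L}_{\mathbf{x}} - \lambda\mathbf{1}$, where $\lambda$ is different from all $\lambda_n$. Then,
as in the one-dimensional case, this operator is invertible, and we can define its
Green's function by $(\mathbf{L}_{\mathbf{x}} - \lambda) G_{\lambda}(\mathbf{x},\mathbf{y}) = \delta(\mathbf{x} - \mathbf{y})$ where the weight function is set
equal to one. The completeness of $\{u_n(\mathbf{x})\}_{n=1}^{\infty}$ implies that

$$\delta(\mathbf{x} - \mathbf{y}) = \sum_{n=1}^{\infty} u_n(\mathbf{x}) u_n^*(\mathbf{y}) \qquad \text{and} \qquad G_{\lambda}(\mathbf{x},\mathbf{y}) = \sum_{n=1}^{\infty} a_n(\mathbf{y}) u_n(\mathbf{x}).$$

Substituting these two expansions in the differential equation for GF yields

$$\sum_{n=1}^{\infty} (\lambda_n - \lambda)\, a_n(\mathbf{y}) u_n(\mathbf{x}) = \sum_{n=1}^{\infty} u_n(\mathbf{x}) u_n^*(\mathbf{y}).$$

The orthonormality of the $u_n$'s gives $a_n(\mathbf{y}) = u_n^*(\mathbf{y})/(\lambda_n - \lambda)$. Therefore,

$$G_{\lambda}(\mathbf{x},\mathbf{y}) = \sum_{n=1}^{\infty} \frac{u_n(\mathbf{x}) u_n^*(\mathbf{y})}{\lambda_n - \lambda}. \tag{22.54}$$

In particular, if zero is not an eigenvalue of $\mathbf{L}_{\mathbf{x}}$, its Green's function can be written
as

$$G(\mathbf{x},\mathbf{y}) = \sum_{n=1}^{\infty} \frac{u_n(\mathbf{x}) u_n^*(\mathbf{y})}{\lambda_n}. \tag{22.55}$$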

This is an expansion of the Green’s function in terms of the eigenfunctions of L x . 

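It is instructive to consider a formal interpretation of Equation (22.55). Recall
that the spectral decomposition theorem permits us to write $f(\mathbf{A}) = \sum_i f(\lambda_i)\mathbf{P}_i$
for an operator $\mathbf{A}$ with (distinct) eigenvalues $\lambda_i$ and projection operators $\mathbf{P}_i$.
Allowing repetition of eigenvalues in the sum, we may write $f(\mathbf{A}) = \sum_n f(\lambda_n)\, |u_n\rangle\langle u_n|$,
where $n$ counts all the eigenfunctions corresponding to eigenvalues. Now, let
$f(\mathbf{A}) = \mathbf{A}^{-1}$. Then

$$\mathbf{G} = \mathbf{A}^{-1} = \sum_n \lambda_n^{-1}\, |u_n\rangle\langle u_n| = \sum_n \frac{|u_n\rangle\langle u_n|}{\lambda_n},$$

or, in "matrix element" form,

$$G(\mathbf{x},\mathbf{y}) = \langle \mathbf{x}|\, \mathbf{G}\, |\mathbf{y}\rangle = \sum_n \frac{\langle \mathbf{x}|u_n\rangle\langle u_n|\mathbf{y}\rangle}{\lambda_n} = \sum_n \frac{u_n(\mathbf{x}) u_n^*(\mathbf{y})}{\lambda_n}.$$

This last expression coincides with the RHS of Equation (22.55).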

Equations (22.54) and (22.55) demand that the $u_n(\mathbf{x})$ form a complete
discrete orthonormal set. We encountered many examples of such eigenfunctions
in discussing Sturm-Liouville systems in Chapter 19. All the S-L systems there
were, of course, one-dimensional. Here we are generalizing the S-L system to m
dimensions. This is not a limitation, however, because, for the PDEs of interest,
the separation of variables reduces an m-dimensional PDE to m one-dimensional
ODEs. If the BCs are appropriate, the m ODEs will all be S-L systems. A review
of Chapter 19 will reveal that homogeneous BCs always lead to S-L systems. In
fact, Theorem 19.1.1 guarantees this claim. Since Green's functions must also
satisfy homogeneous BCs, expansions such as those of (22.54) and (22.55) become
possible.


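22.5.1. Example. As a concrete example, let us obtain an eigenfunction expansion of
the GF of the two-dimensional Laplacian, $\nabla^2 = \partial^2/\partial x^2 + \partial^2/\partial y^2$, inside the rectangular
region $0 \le x \le a$, $0 \le y \le b$ with Dirichlet BCs. Since the GF vanishes at the boundary, the
eigenvalue problem becomes $\nabla^2 u = \lambda u$ with $u(0,y) = u(a,y) = u(x,0) = u(x,b) = 0$.
The method of separation of variables gives the orthonormal eigenfunctions⁴

$$u_{mn}(x,y) = \frac{2}{\sqrt{ab}} \sin\!\left(\frac{n\pi}{a}x\right) \sin\!\left(\frac{m\pi}{b}y\right) \qquad \text{for } m, n = 1, 2, \ldots,$$

whose corresponding eigenvalues are $\lambda_{mn} = -\left[\left(\frac{n\pi}{a}\right)^2 + \left(\frac{m\pi}{b}\right)^2\right]$.
Inserting the eigenfunctions and the eigenvalues in Equation (22.55), we obtain

$$G(\mathbf{r},\mathbf{r}') = G(x,y;x',y') = \sum_{m,n=1}^{\infty} \frac{u_{mn}(x,y)\, u_{mn}(x',y')}{\lambda_{mn}}
= -\frac{4}{ab} \sum_{m,n=1}^{\infty} \frac{\sin\!\left(\frac{n\pi}{a}x\right) \sin\!\left(\frac{m\pi}{b}y\right) \sin\!\left(\frac{n\pi}{a}x'\right) \sin\!\left(\frac{m\pi}{b}y'\right)}{\left(\frac{n\pi}{a}\right)^2 + \left(\frac{m\pi}{b}\right)^2},$$

where we changed $\mathbf{x}$ to $\mathbf{r}$ and $\mathbf{y}$ to $\mathbf{r}'$. Note that the eigenvalues are never zero; thus, $G(\mathbf{r},\mathbf{r}')$
is well-defined. ■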
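A truncated version of this double series is straightforward to evaluate. The
sketch below sums the first `terms` modes in each direction and checks two
structural properties that must hold for any truncation: symmetry under the
exchange of the two points, and vanishing on the boundary. The function name,
the truncation order, and the sample rectangle are illustrative assumptions.

```python
import math


def rectangle_gf(x, y, xp, yp, a, b, terms=60):
    """Truncated eigenfunction expansion of the Dirichlet GF of the
    two-dimensional Laplacian on the rectangle [0, a] x [0, b]."""
    total = 0.0
    for n in range(1, terms + 1):
        for m in range(1, terms + 1):
            lam = -((n * math.pi / a) ** 2 + (m * math.pi / b) ** 2)
            u = math.sin(n * math.pi * x / a) * math.sin(m * math.pi * y / b)
            up = math.sin(n * math.pi * xp / a) * math.sin(m * math.pi * yp / b)
            total += (4 / (a * b)) * u * up / lam
    return total


a, b = 1.0, 0.5
print(rectangle_gf(0.3, 0.4, 0.7, 0.2, a, b))
print(rectangle_gf(0.7, 0.2, 0.3, 0.4, a, b))   # same value, by symmetry
print(rectangle_gf(0.0, 0.4, 0.7, 0.2, a, b))   # zero on the edge x = 0
```

The series converges slowly near the source point, where the GF has its
logarithmic singularity, so a fixed truncation is only reliable for
well-separated points.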


In the preceding example, zero was not an eigenvalue of L x . This condition 
must hold when a Green’s function is expanded in terms of eigenfunctions. In 
physical applications, certain conditions (which have nothing to do with the BCs) 
exclude the zero eigenvalue automatically when they are applied to the Green’s 
function. For instance, the condition that the Green’s function remain finite at the 
origin is severe enough to exclude the zero eigenvalue. 

22.5.2. Example. Let us consider the two-dimensional Dirichlet BVP $\nabla^2 u = f$, with
$u = 0$ on a circle of radius $a$. If we consider only the BCs and ask whether zero is an
eigenvalue of $\nabla^2$, the answer will be yes, as the following argument shows.

The most general solution to the zero-eigenvalue equation, $\nabla^2 u = 0$, in polar
coordinates can be obtained by the method of separation of variables:

$$u(\rho,\varphi) = A + B\ln\rho + \sum_{n=1}^{\infty} (b_n\rho^{n} + b_n'\rho^{-n}) \cos n\varphi + \sum_{n=1}^{\infty} (c_n\rho^{n} + c_n'\rho^{-n}) \sin n\varphi. \tag{22.56}$$

Invoking the BC gives

$$0 = u(a,\varphi) = A + B\ln a + \sum_{n=1}^{\infty} (b_n a^{n} + b_n' a^{-n}) \cos n\varphi + \sum_{n=1}^{\infty} (c_n a^{n} + c_n' a^{-n}) \sin n\varphi,$$

which holds for arbitrary $\varphi$ if and only if

$$A = -B\ln a, \qquad b_n' = -b_n a^{2n}, \qquad c_n' = -c_n a^{2n}.$$

⁴ The inner product is defined as a double integral over the rectangle.



Substituting in (22.56) gives

$$u(\rho,\varphi) = B\ln\!\left(\frac{\rho}{a}\right) + \sum_{n=1}^{\infty} \left(\rho^{n} - \frac{a^{2n}}{\rho^{n}}\right) (b_n\cos n\varphi + c_n\sin n\varphi). \tag{22.57}$$

Thus, if we demand nothing beyond the BCs, $\nabla^2$ will have a nontrivial eigen-solution
corresponding to the zero eigenvalue, given by Equation (22.57).

Physical reality, however, demands that $u(\rho,\varphi)$ be well-behaved at the origin. This
condition sets $B$, $b_n'$, and $c_n'$ of Equation (22.56) equal to zero. The BCs then make the
remaining coefficients in (22.56) vanish. Thus, the demand that $u(\rho,\varphi)$ be well-behaved
at $\rho = 0$ turns the situation completely around and ensures the nonexistence of a zero
eigenvalue for the Laplacian, which in turn guarantees the existence of a GF. ■
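The three claims of this example (the mode in (22.57) is harmonic, it vanishes
on the circle, and it is singular at the origin) can be illustrated for a single
value of n. The sketch below uses central differences for the polar Laplacian;
the function names and the sample coefficients are illustrative assumptions.

```python
import math


def zero_mode(rho, phi, n, a, b_n=1.0, c_n=0.0):
    """One term of Equation (22.57) for a given n and circle radius a."""
    radial = rho ** n - a ** (2 * n) / rho ** n
    return radial * (b_n * math.cos(n * phi) + c_n * math.sin(n * phi))


def polar_laplacian(f, rho, phi, h=1e-4):
    """u_rr + u_r / rho + u_phiphi / rho^2 by central differences."""
    d_rho = (f(rho + h, phi) - f(rho - h, phi)) / (2 * h)
    d2_rho = (f(rho + h, phi) - 2 * f(rho, phi) + f(rho - h, phi)) / (h * h)
    d2_phi = (f(rho, phi + h) - 2 * f(rho, phi) + f(rho, phi - h)) / (h * h)
    return d2_rho + d_rho / rho + d2_phi / rho ** 2


a = 1.5
mode = lambda rho, phi: zero_mode(rho, phi, 2, a, 0.7, -0.4)
print(polar_laplacian(mode, 0.8, 0.6))  # close to zero: harmonic inside
print(mode(a, 1.1))                     # zero on the boundary rho = a
print(mode(1e-3, 0.6))                  # very large: singular at the origin
```

The last line is the numerical face of the regularity argument: the only way
to keep the mode finite at the origin is to drop it altogether.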

In many cases the operator $\mathbf{L}_{\mathbf{x}}$ as a whole is not amenable to a full
Sturm-Liouville treatment, and as such will not yield orthonormal eigenvectors in terms
of which the GF can be expanded. However, it may happen that $\mathbf{L}_{\mathbf{x}}$ can be broken
up into two pieces one of which is an S-L operator. In such a case, the GF can
be found as follows: Suppose that $\mathbf{L}_1$ and $\mathbf{L}_2$ are two commuting operators with
$\mathbf{L}_2$ an S-L operator whose eigenvalues and eigenfunctions are known. Since $\mathbf{L}_2$
commutes with $\mathbf{L}_1$, it can be regarded as a constant as far as operations with (and
on) $\mathbf{L}_1$ are concerned. In particular, $(\mathbf{L}_1 + \mathbf{L}_2)\mathbf{G} = \mathbf{1}$ can be regarded as an operator
equation in $\mathbf{L}_1$ alone with $\mathbf{L}_2$ treated as a constant. Let $\mathbf{x}_1$ denote the subset of the
variables on which $\mathbf{L}_1$ acts, and let $\mathbf{x}_2$ denote the remainder of the coordinates.
Then we can write $\delta(\mathbf{x} - \mathbf{y}) = \delta(\mathbf{x}_1 - \mathbf{y}_1)\delta(\mathbf{x}_2 - \mathbf{y}_2)$. Now let $G_1(\mathbf{x}_1,\mathbf{y}_1; k)$ denote
the Green's function for $\mathbf{L}_1 + k$, where $k$ is a constant. Then it is easily verified
that

$$G(\mathbf{x},\mathbf{y}) = G_1(\mathbf{x}_1,\mathbf{y}_1; \mathbf{L}_2)\, \delta(\mathbf{x}_2 - \mathbf{y}_2). \tag{22.58}$$

In fact,

$$(\mathbf{L}_1 + \mathbf{L}_2)\, G(\mathbf{x},\mathbf{y}) = \underbrace{\left[(\mathbf{L}_1 + \mathbf{L}_2)\, G_1(\mathbf{x}_1,\mathbf{y}_1; \mathbf{L}_2)\right]}_{=\,\delta(\mathbf{x}_1 - \mathbf{y}_1)\ \text{by definition of}\ G_1} \delta(\mathbf{x}_2 - \mathbf{y}_2).$$

Once $G_1$ is found as a function of $\mathbf{L}_2$, it can operate on $\delta(\mathbf{x}_2 - \mathbf{y}_2)$ to yield the
desired Green's function. The following example illustrates the technique.

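22.5.3. Example. Let us evaluate the Dirichlet GF for the two-dimensional Helmholtz
operator $\nabla^2 - k^2$ in the infinite strip $0 \le x \le a$, $-\infty < y < \infty$. Let $\mathbf{L}_1 = \partial^2/\partial y^2 - k^2$
and $\mathbf{L}_2 = \partial^2/\partial x^2$. Then,

$$G(\mathbf{r},\mathbf{r}') = G(x,x',y,y') = G_1(y,y'; \mathbf{L}_2)\, \delta(x - x'),$$

where $(d^2/dy^2 - \mu^2)\, G_1 = \delta(y - y')$, $\mu^2 \equiv k^2 - \mathbf{L}_2$, and $G_1(y = -\infty) = G_1(y = \infty) = 0$.
The GF $G_1$ can be readily found (see Problem 20.12):

$$G_1(y,y'; \mathbf{L}_2) = -\frac{e^{-\mu|y-y'|}}{2\mu} = -\frac{e^{-\sqrt{k^2 - \mathbf{L}_2}\,|y-y'|}}{2\sqrt{k^2 - \mathbf{L}_2}}.$$

The full GF is then

$$G(\mathbf{r},\mathbf{r}') = -\frac{e^{-\sqrt{k^2 - \mathbf{L}_2}\,|y-y'|}}{2\sqrt{k^2 - \mathbf{L}_2}}\, \delta(x - x'). \tag{22.59}$$

The operator $\mathbf{L}_2$ constitutes an S-L system with eigenvalues $\lambda_n = -(n\pi/a)^2$ and
eigenfunctions $u_n(x) = \sqrt{2/a}\, \sin(n\pi x/a)$ where $n = 1, 2, \ldots$. Therefore, the delta function
$\delta(x - x')$ can be expanded in terms of these eigenfunctions:

$$\delta(x - x') = \frac{2}{a} \sum_{n=1}^{\infty} \sin\!\left(\frac{n\pi}{a}x\right) \sin\!\left(\frac{n\pi}{a}x'\right).$$

As $\mu = \sqrt{k^2 - \mathbf{L}_2}$ acts on the delta function, $\mathbf{L}_2$ operates on the first factor in the above
expansion and gives $\lambda_n$. Thus, $\mathbf{L}_2$ in Equation (22.59) can be replaced by $-(n\pi/a)^2$, and
we have

$$G(\mathbf{r},\mathbf{r}') = -\frac{2}{a} \sum_{n=1}^{\infty} \left( \frac{e^{-\sqrt{k^2 + (n\pi/a)^2}\,|y-y'|}}{2\sqrt{k^2 + (n\pi/a)^2}} \right) \sin\!\left(\frac{n\pi}{a}x\right) \sin\!\left(\frac{n\pi}{a}x'\right). \qquad ■$$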

Sometimes it is convenient to break an operator into more than two parts. In
fact, in some cases it may be advantageous to define a set of commuting
self-adjoint (differential) operators $\{\mathbf{M}_j\}$ such that the full operator $\mathbf{L}$ can be written
as $\mathbf{L} = \sum_j \mathbf{L}_j\mathbf{M}_j$ where the differential operators $\{\mathbf{L}_j\}$ act on variables on which
the $\mathbf{M}_j$ have no action. Since the $\mathbf{M}_j$'s commute among themselves, one can find
simultaneous eigenfunctions for all of them. Then one expands part of the delta
function in terms of these eigenfunctions in the hope that the ensuing problem
becomes more manageable. The best way to appreciate this approach is via an
example.

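22.5.4. Example. Let us consider the Laplacian in spherical coordinates,

$$\nabla^2 u = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial u}{\partial r}\right) + \frac{1}{r^2}\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial u}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2 u}{\partial\varphi^2}\right].$$

If we introduce

$$\mathbf{M}_1 u = u, \qquad \mathbf{M}_2 u = \frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial u}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2 u}{\partial\varphi^2}, \qquad
\mathbf{L}_1 u = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial u}{\partial r}\right), \qquad \mathbf{L}_2 u = \frac{1}{r^2}\, u, \tag{22.60}$$

the Laplacian becomes $\nabla^2 = \mathbf{L}_1\mathbf{M}_1 + \mathbf{L}_2\mathbf{M}_2$. The mutual eigenfunctions of $\mathbf{M}_1$ and $\mathbf{M}_2$ are
simply those of $\mathbf{M}_2$, which is (the negative of) the angular momentum operator discussed in
Chapter 12, whose eigenfunctions are the spherical harmonics. We thus have
$\mathbf{M}_2 Y_{lm}(\theta,\varphi) = -l(l+1)\, Y_{lm}(\theta,\varphi)$.

Let us expand the Green's function in terms of the spherical harmonics:

$$G(\mathbf{r},\mathbf{r}') = \sum_{l,m} g_{lm}(r; r',\theta',\varphi')\, Y_{lm}(\theta,\varphi).$$

We also write the delta function as

$$\delta(\mathbf{r} - \mathbf{r}') = \frac{\delta(r - r')\,\delta(\theta - \theta')\,\delta(\varphi - \varphi')}{r'^2 \sin\theta'} = \frac{\delta(r - r')}{r'^2} \sum_{l,m} Y_{lm}(\theta,\varphi)\, Y_{lm}^*(\theta',\varphi'),$$

where we have used the completeness of the spherical harmonics. Substituting all of the
above in $\nabla^2 G(\mathbf{r},\mathbf{r}') = \delta(\mathbf{r} - \mathbf{r}')$, we obtain

$$\nabla^2 G(\mathbf{r},\mathbf{r}') = (\mathbf{L}_1\mathbf{M}_1 + \mathbf{L}_2\mathbf{M}_2) \sum_{l,m} g_{lm}(r; r',\theta',\varphi')\, Y_{lm}(\theta,\varphi)
= \sum_{l,m} \left\{ [\mathbf{L}_1 - l(l+1)\mathbf{L}_2]\, g_{lm}(r; r',\theta',\varphi') \right\} Y_{lm}(\theta,\varphi)
= \frac{\delta(r - r')}{r'^2} \sum_{l,m} Y_{lm}(\theta,\varphi)\, Y_{lm}^*(\theta',\varphi').$$

The orthogonality of the $Y_{lm}(\theta,\varphi)$ yields

$$[\mathbf{L}_1 - l(l+1)\mathbf{L}_2]\, g_{lm}(r; r',\theta',\varphi') = \frac{\delta(r - r')}{r'^2}\, Y_{lm}^*(\theta',\varphi').$$

This shows that the angular part of $g_{lm}$ is simply $Y_{lm}^*(\theta',\varphi')$. Separating this from the
dependence on $r$ and $r'$ and substituting for $\mathbf{L}_1$ and $\mathbf{L}_2$, we obtain

$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dg_{lm}}{dr}\right) - \frac{l(l+1)}{r^2}\, g_{lm} = \frac{\delta(r - r')}{r'^2}, \tag{22.61}$$

where this last $g_{lm}$ is a function of $r$ and $r'$ only. The techniques of Chapter 20 can be
employed to solve Equation (22.61) (see Problem 22.29). ■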
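For orientation, here is a sketch of what the solution of (22.61) looks like in
the simplest setting. It assumes free-space conditions (regularity at the origin
and decay at infinity), which the example itself does not impose; under that
assumption the standard solution is g_l = -r_<^l / [(2l+1) r_>^(l+1)], where
r_< and r_> are the smaller and larger of r and r'. The code checks the jump
condition implied by the delta function and, using the addition theorem for
spherical harmonics, resums the series to -1/(4 pi |r - r'|). All function
names are illustrative.

```python
import math


def radial_gf(l, r, rp):
    """Assumed free-space solution of (22.61): -r_<^l / ((2l+1) r_>^(l+1))."""
    r_less, r_greater = min(r, rp), max(r, rp)
    return -(r_less ** l) / ((2 * l + 1) * r_greater ** (l + 1))


def legendre(l_max, x):
    """Return [P_0(x), ..., P_l_max(x)] from the three-term recurrence."""
    values = [1.0, x]
    for l in range(1, l_max):
        values.append(((2 * l + 1) * x * values[l] - l * values[l - 1]) / (l + 1))
    return values[: l_max + 1]


def summed_gf(r, rp, cos_gamma, l_max=80):
    """Sum over l of g_l (2l+1)/(4 pi) P_l(cos gamma); the sum over m has been
    done with the addition theorem, gamma being the angle between r and r'."""
    p = legendre(l_max, cos_gamma)
    return sum(radial_gf(l, r, rp) * (2 * l + 1) / (4 * math.pi) * p[l]
               for l in range(l_max + 1))


def slope_jump(l, rp, h=1e-6):
    """One-sided difference estimate of g'(r'+) - g'(r'-)."""
    right = (radial_gf(l, rp + h, rp) - radial_gf(l, rp, rp)) / h
    left = (radial_gf(l, rp, rp) - radial_gf(l, rp - h, rp)) / h
    return right - left


r, rp, cos_gamma = 1.0, 2.0, 0.3
distance = math.sqrt(r * r + rp * rp - 2 * r * rp * cos_gamma)
print(summed_gf(r, rp, cos_gamma), -1 / (4 * math.pi * distance))
print(slope_jump(2, rp) * rp ** 2)  # close to 1, as (22.61) requires
```

With other BCs, such as the concentric spheres of Problem 22.29, the radial
function changes but the jump condition stays the same.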


The separation of the full operator into two "smaller" operators can also be used
for cases in which both operators have eigenvalues and eigenvectors. The result
of such an approach will, of course, be equivalent to the eigenfunction-expansion
approach. However, there will be an arbitrariness in the operator approach: Which
operator are we to choose as our $\mathbf{L}_1$? While in Example 22.5.3 the choice was
clear (the operator that had no eigenfunctions), here either operator can be chosen
as $\mathbf{L}_1$. The ensuing GFs will be equivalent, and the series representing them will
be convergent, of course. However, the rate of convergence may be different for
the two. It turns out, for example, that if we are interested in $G(x,y;x',y')$ for
the two-dimensional Laplacian at points $(x,y)$ whose y-coordinates are far from
$y'$, then the appropriate expansion is obtained by letting $\mathbf{L}_1 = \partial^2/\partial y^2$, that is, an
expansion in terms of x eigenfunctions. On the other hand, if the Green's function is
to be calculated for a point $(x,y)$ whose x-coordinate is far away from the singular
point $(x',y')$, then the appropriate expansion is obtained by letting $\mathbf{L}_1 = \partial^2/\partial x^2$.
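The equivalence of the two approaches can be seen mode by mode. For the
Laplacian on the rectangle with $\mathbf{L}_1 = \partial^2/\partial y^2$, each x eigenfunction with
kappa = n pi / a leaves a one-dimensional Dirichlet problem for the operator
d^2/dy^2 - kappa^2 on [0, b]. Its GF can be written either as a sine series (the
eigenfunction expansion) or in closed form with hyperbolic sines (the operator
separation result). The sketch below compares the two; the closed form is the
standard one-dimensional result and, like the function names, is an assumption
not spelled out in the text.

```python
import math


def series_in_y(y, yp, kappa, b, terms=20000):
    """Eigenfunction expansion of the GF of d^2/dy^2 - kappa^2 on [0, b]
    with Dirichlet BCs, truncated after `terms` modes."""
    total = 0.0
    for m in range(1, terms + 1):
        q = m * math.pi / b
        total += (2 / b) * math.sin(q * y) * math.sin(q * yp) / (q * q + kappa * kappa)
    return -total


def closed_form_in_y(y, yp, kappa, b):
    """Closed form of the same GF, built from the two homogeneous solutions
    that vanish at y = 0 and at y = b."""
    y_less, y_greater = min(y, yp), max(y, yp)
    return -(math.sinh(kappa * y_less) * math.sinh(kappa * (b - y_greater))
             / (kappa * math.sinh(kappa * b)))


a, b = 1.0, 0.5
kappa = math.pi / a            # the n = 1 mode in x
print(series_in_y(0.30, 0.45, kappa, b))
print(closed_form_in_y(0.30, 0.45, kappa, b))
```

The closed form decays exponentially in n for fixed separation in y, whereas
the double sine series needs many terms, which illustrates the remark above
about different rates of convergence.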


22.6 Problems 

22.1. Find the GF for the Dirichlet BVP in two dimensions if $D$ is the UHP and
$\partial D$ is the x-axis.

22.2. Add $f(\mathbf{r}'')$ to $H(\mathbf{r},\mathbf{r}'')$ in Example 22.1.2 and retrace the argument given
there to show that $f(\mathbf{r}'') = 0$.

22.3. Use the method of images to find the GF for the Laplacian in the exterior 
region of a “sphere” of radius a in two and three dimensions. 

22.4. Derive Equation (22.7) from Equation (22.6). 

22.5. Using Equation (22.7) with $\rho = 0$, show that if $g(\theta',\varphi') = V_0$, the potential
at any point inside the sphere is $V_0$.

22.6. Find the BC that the GF must satisfy in order for the solution $u$ to be
representable in terms of the GF when the BC on $u$ is mixed, as in Equation (22.10).
Assume a self-adjoint SOLPDO of the elliptic type, and consider the two cases
$\alpha(\mathbf{x}) \ne 0$ and $\beta(\mathbf{x}) \ne 0$ for $\mathbf{x} \in \partial D$. Hint: In each case, divide the mixed BC
equation by the nonzero coefficient, substitute the result in the Green's identity,
and set the coefficient of the $u$ term in the $\partial D$ integral equal to zero.

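22.7. Show that the diffusion operator satisfies

$$\mathbf{L}_{\mathbf{x},t}\, G(\mathbf{x},\mathbf{y}; t - \tau) = \delta(\mathbf{x} - \mathbf{y})\,\delta(t - \tau).$$

Hint: Use

$$\frac{\partial\theta}{\partial t}(t - \tau) = \delta(t - \tau).$$

22.8. Show that for $m = 3$ the expression for $G_s(\mathbf{r})$ given by Equation (22.36)
reduces to $G_s(\mathbf{r}) = -e^{-\mu r}/(4\pi r)$.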

22.9. The time-independent Schrödinger equation can be rewritten as

$$(\nabla^2 + k^2)\Psi - \frac{2\mu}{\hbar^2}\, V(\mathbf{r})\Psi = 0,$$

where $k^2 = 2\mu E/\hbar^2$ and $\mu$ is the mass of the particle.

(a) Use techniques of Section 21.4 to write an integral equation for $\Psi$.

(b) Show that the Neumann series solution of the integral equation converges only
if

$$\int_{\mathbb{R}^3} |V(\mathbf{r})|^2\, d^3r < \frac{2\pi\hbar^4\, \mathrm{Im}\, k}{\mu^2}.$$

(c) Assume that the potential is of Yukawa type: $V(r) = g^2 e^{-\kappa r}/r$. Find a
condition between the (bound state) energy and the potential strength $g$ that ensures
convergence of the Neumann series.

22.10. Derive Equation (22.29).

22.11. Derive Equation (22.31) using the procedure outlined for parabolic
equations.

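22.12. (a) Show that the GF for the Helmholtz operator $\nabla^2 + \mu^2$ in two dimensions
is

$$G(\mathbf{r},\mathbf{r}') = -\frac{i}{4}\, H_0^{(1)}(\mu|\mathbf{r} - \mathbf{r}'|) + H(\mathbf{r},\mathbf{r}'),$$

where $H(\mathbf{r},\mathbf{r}')$ satisfies the homogeneous Helmholtz equation.

(b) Separate the variables and use the fact that $H$ is regular at $\mathbf{r} = \mathbf{r}'$ to show that
$H$ can be written as

$$H(\mathbf{r},\mathbf{r}') = \sum_{n=0}^{\infty} J_n(\mu r)\, [a_n(\mathbf{r}')\cos n\theta + b_n(\mathbf{r}')\sin n\theta].$$

(c) Now assume a circular boundary of radius $a$ and the BC $G(\mathbf{a},\mathbf{r}') = 0$, in which
$\mathbf{a}$ is a vector from the origin to the circular boundary. Using this BC, show that

$$a_0(\mathbf{r}') = \frac{i}{8\pi J_0(\mu a)} \int_0^{2\pi} H_0^{(1)}\!\left(\mu\sqrt{a^2 + r'^2 - 2ar'\cos(\theta - \theta')}\right) d\theta,$$

$$a_n(\mathbf{r}') = \frac{i}{4\pi J_n(\mu a)} \int_0^{2\pi} H_0^{(1)}\!\left(\mu\sqrt{a^2 + r'^2 - 2ar'\cos(\theta - \theta')}\right) \cos n\theta\, d\theta,$$

$$b_n(\mathbf{r}') = \frac{i}{4\pi J_n(\mu a)} \int_0^{2\pi} H_0^{(1)}\!\left(\mu\sqrt{a^2 + r'^2 - 2ar'\cos(\theta - \theta')}\right) \sin n\theta\, d\theta.$$

These equations completely determine $H(\mathbf{r},\mathbf{r}')$ and therefore $G(\mathbf{r},\mathbf{r}')$.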

22.13. Use the Fourier transform technique to find the singular part of the GF for 
the diffusion equation in one and three dimensions. Compare your results with that 
obtained in Section 22.4.3. 

22.14. Show directly that both $G_s^{(\text{ret})}$ and $G_s^{(\text{adv})}$ satisfy $\nabla^2 G = \delta(\mathbf{r})\delta(t)$ in three
dimensions.

22.15. Consider a rectangular box with sides $a$, $b$, and $c$ located in the first octant
with one corner at the origin. Let $D$ denote the inside of this box.

(a) Show that zero cannot be an eigenvalue of the Laplacian operator with the
Dirichlet BCs on $\partial D$.

(b) Find the GF for this Dirichlet BVP.

22.16. Find the GF for the Helmholtz equation $(\nabla^2 + k^2)u = 0$ on the rectangle
$0 \le x \le a$, $0 \le y \le b$.

22.17. Find the singular part of the one-dimensional GF for the operator
$a\, d^2/dx^2 + b$, where $a > 0$ and $b < 0$.


22.18. Calculate the GF of the two-dimensional Laplacian operator appropriate
for Neumann BCs on the rectangle $0 \le x \le a$, $0 \le y \le b$.

22.19. Find the three-dimensional Dirichlet GF for the Helmholtz operator $\nabla^2 - k^2$
in the half-space $z > 0$.

22.20. Find the three-dimensional Neumann GF for the Helmholtz operator
$\nabla^2 - k^2$ in the half-space $z < 0$.

22.21. Using the integral form of the Schrödinger equation in three dimensions,
show that an attractive delta potential $V(\mathbf{r}) = -V_0\,\delta(r - a)$ does not have a bound
state ($E < 0$). Contrast this with the result of Example 21.4.1.

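22.22. By taking the Fourier transform of both sides of the integral form of the
Schrödinger equation, show that for bound-state problems ($E < 0$), the equation
in "momentum space" can be written as

$$\tilde{\psi}(\mathbf{p}) = -\frac{2\mu}{(2\pi)^{3/2}\hbar^2} \left(\frac{1}{p^2 + \kappa^2}\right) \int \tilde{V}(\mathbf{p} - \mathbf{q})\, \tilde{\psi}(\mathbf{q})\, d^3q,$$

where $\kappa^2 = -2\mu E/\hbar^2$.

22.23. Write the bound-state Schrödinger integral equation for a non-local
potential, noting that $G(\mathbf{r},\mathbf{r}') = e^{-\kappa|\mathbf{r}-\mathbf{r}'|}/|\mathbf{r} - \mathbf{r}'|$, where $\kappa^2 = -2\mu E/\hbar^2$ and $\mu$ is the
mass of the bound particle. The homogeneous solution is zero, as is always the
case with bound states.

(a) Assuming that the potential is of the form $V(\mathbf{r},\mathbf{r}') = -g^2 U(\mathbf{r})U(\mathbf{r}')$, show
that a solution to the Schrödinger equation exists iff

$$\frac{\mu g^2}{2\pi\hbar^2} \int_{\mathbb{R}^3} d^3r \int_{\mathbb{R}^3} d^3r'\, \frac{e^{-\kappa|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r} - \mathbf{r}'|}\, U(\mathbf{r})U(\mathbf{r}') = 1. \tag{22.62}$$

(b) Taking $U(\mathbf{r}) = e^{-\alpha r}/r$, show that the condition in (22.62) becomes

$$\frac{4\pi\mu g^2}{\alpha\hbar^2} \left[\frac{1}{(\alpha + \kappa)^2}\right] = 1.$$

(c) Since $\kappa > 0$, prove that the equation in (b) has a unique solution only if
$g^2 > \hbar^2\alpha^3/(4\pi\mu)$, in which case the bound-state energy is

$$E = -\frac{\hbar^2}{2\mu} \left[\left(\frac{4\pi\mu g^2}{\alpha\hbar^2}\right)^{1/2} - \alpha\right]^2.$$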


22.24. Repeat calculations in Sections 22.4.1 and 22.4.2 for m = 2. 

22.25. In this problem, the dimension $m$ is three.

(a) Derive the following identities:

$$\frac{d\epsilon(t)}{dt} = 2\delta(t), \qquad \nabla^2\delta(t \pm r) = \delta''(t \pm r) \pm \frac{2}{r}\,\delta'(t \pm r),$$

where $\epsilon(t) = \theta(t) - \theta(-t)$.

(b) Use the results of (a) to show that the GF [Equation (22.47)] derived from
the principal value of the $\omega$ integration for the wave equation in three dimensions
satisfies only the homogeneous PDE. Hint: Use $\nabla^2(1/r) = -4\pi\delta(\mathbf{r})$.

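22.26. Calculate the retarded GF for the wave operator in two dimensions and
show that it is equal to $G_s^{(\text{ret})}(\mathbf{r},t) = \theta(t)/(2\pi\sqrt{t^2 - r^2})$. Now use this result to
obtain the GF for any even number of dimensions:

$$G_s^{(\text{ret})}(\mathbf{r},t) = \frac{\theta(t)}{2\pi} \left(-\frac{1}{2\pi r}\frac{\partial}{\partial r}\right)^{n-1} \left[\frac{1}{\sqrt{t^2 - r^2}}\right] \qquad \text{for } n = m/2.$$

22.27. (a) Find the singular part of the retarded GF and the advanced GF for the
wave equation in three dimensions using Equations (22.48) and (22.49). Hint:
$J_{1/2}(kr) = \sqrt{2/(\pi kr)}\, \sin kr$.

(b) Use (a) and Equation (22.51) to show that

$$\lim_{\epsilon\to 0} \left( \frac{1}{r^2 + (it + \epsilon)^2} - \frac{1}{r^2 + (-it + \epsilon)^2} \right) = \frac{i\pi}{r}\, [\delta(t + r) - \delta(t - r)].$$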


22.28. Show that the eigenfunction expansion of the GF for the Dirichlet BVP for
the Laplacian operator in two dimensions for which the region of interest is the
interior of a circle of radius $a$ is

$$G(\mathbf{r},\mathbf{r}') = -\frac{2}{\pi} \sum_{n=0}^{\infty} \sum_{m=1}^{\infty} \frac{\epsilon_n\, J_n\!\left(\frac{\rho}{a}x_{nm}\right) J_n\!\left(\frac{\rho'}{a}x_{nm}\right) \cos n(\varphi - \varphi')}{x_{nm}^2\, J_{n+1}^2(x_{nm})},$$

where $\epsilon_0 = \tfrac12$ and $\epsilon_n = 1$ for $n \ge 1$, and use has been made of Problem 14.39.

22.29. (a) Complete the calculations of Example 22.5.4, and

(b) find the GF for the Laplacian with Dirichlet BCs on two concentric spheres of
radii $a$ and $b$, with $a < b$.

(c) Consider the case where $a \to 0$ and $b \to \infty$ and compare the result with the
singular part of the GF for the Laplacian.

22.30. Solve the Dirichlet BVP for the operator $\nabla^2 - k^2$ in the region $0 \le x \le a$,
$0 \le y \le b$, $-\infty < z < \infty$. Hint: Separate the operator into $\mathbf{L}_1$ and $\mathbf{L}_2$.

22.31. Solve the problem of Example 22.5.1 using the separation of operator 
technique and show that the two results are equivalent. 

22.32. Use the operator separation technique to calculate the Dirichlet GF for the
two-dimensional operator $\nabla^2 - k^2$ on the rectangle $0 \le x \le a$, $0 \le y \le b$. Also
obtain an eigenfunction expansion for this GF.

22.33. Use the operator separation technique to find the three-dimensional Dirichlet
GF for the Laplacian in a circular cylinder of radius $a$ and height $h$.

22.34. Calculate the singular part of the GF for the three-dimensional free
Schrödinger operator

$$i\hbar\frac{\partial}{\partial t} - \frac{\hbar^2}{2\mu}\nabla^2.$$


22.35. Use the operator separation technique to show that

(a) the GF for the Helmholtz operator $\nabla^2 + k^2$ in three dimensions is

$$G_k(\mathbf{r}, \mathbf{r}') = ik\sum_{l=0}^{\infty}\sum_{m=-l}^{l} j_l(kr_<)\,h_l(kr_>)\,Y_{lm}(\theta, \varphi)\,Y^*_{lm}(\theta', \varphi'),$$

where $r_<$ ($r_>$) is the smaller (larger) of $r$ and $r'$, and $j_l$ and $h_l$ are the spherical
Bessel and Hankel functions, respectively. No explicit BCs are assumed except
that there is regularity at $r = 0$ and that $G(\mathbf{r}, \mathbf{r}') \to 0$ for $|\mathbf{r}| \to \infty$.

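(b) Obtain the identity

$$\frac{e^{ik|\mathbf{r} - \mathbf{r}'|}}{4\pi|\mathbf{r} - \mathbf{r}'|} = ik\sum_{l=0}^{\infty}\sum_{m=-l}^{l} j_l(kr_<)\,h_l(kr_>)\,Y_{lm}(\theta, \varphi)\,Y^*_{lm}(\theta', \varphi').$$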


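(c) Derive the plane wave expansion [see Equation (19.26)]

$$e^{i\mathbf{k}\cdot\mathbf{r}} = 4\pi\sum_{l=0}^{\infty}\sum_{m=-l}^{l} i^l\,j_l(kr)\,Y^*_{lm}(\theta', \varphi')\,Y_{lm}(\theta, \varphi),$$

where $\theta'$ and $\varphi'$ are assumed to be the angular coordinates of $\mathbf{k}$. Hint: Let $|\mathbf{r}'| \to \infty$,
and use

$$|\mathbf{r} - \mathbf{r}'| = (r'^2 + r^2 - 2\,\mathbf{r}\cdot\mathbf{r}')^{1/2} \to r' - \frac{\mathbf{r}\cdot\mathbf{r}'}{r'}$$

and the asymptotic formula $h_l^{(1)}(z) \to (1/z)\,e^{i[z - (l+1)\pi/2]}$, valid for large $z$.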

Additional Reading 


1. Economou, E. Green's Functions in Quantum Physics, Springer-Verlag,
1983. Emphasizes applications of Green's function in quantum mechanics,
especially solid-state physics.

2. Jackson, J. Classical Electrodynamics, 2nd ed., Wiley, 1975. Many examples
of Green's function techniques used in electromagnetism, in particular
the advanced and retarded Green's functions as used in electromagnetic
radiation theory.

3. Roach, G. Green's Functions, Van Nostrand, 1970. Treats eigenfunction
expansion of Green's functions and gives many examples.





Part VII

Groups and Manifolds



23

Group Theory


The tale of mathematics and physics has been one of love and hate, of harmony
and discord, and of friendship and animosity. From their simultaneous inception
in the shape of calculus in the seventeenth century, through an intense and
interactive development in the eighteenth and most of the nineteenth century, to an
estrangement in the latter part of the nineteenth and the beginning of the
twentieth century, mathematics and physics have experienced the best of times and the
worst of times. Sometimes, as in the case of calculus, nature dictates a
mathematical dialect in which the narrative of physics is to be spoken. Other times, man,
building upon that dialect, develops a sophisticated language in which, as in the
case of the Lagrangian and Hamiltonian interpretation of dynamics, the narrative of
physics is set in the most beautiful poetry. But the happiest courtship, and the
most exhilarating relationship, takes place when a discovery in physics leads to a
development in mathematics that in turn feeds back into a better understanding of
physics, leading to new ideas or a new interpretation of existing ideas. Such a state
of affairs began in the 1930s with the advent of quantum mechanics, and, after a
lull of about 30 years, revived in the late 1960s. We are fortunate to be witnesses
to one of the most productive collaborations between the physics and mathematics
communities in the history of both.

It is not an exaggeration to say that the single most important catalyst that has
facilitated this collaboration is the idea of symmetry, the study of which is the main
topic of the theory of groups, the subject of this chapter. Although group theory,
in one form or another, was known to mathematicians as early as the beginning
of the nineteenth century, it found its way into physics only after the invention of
quantum theory, and in particular, Dirac's interpretation of it in the language of
transformation theory. Eugene Wigner, in his seminal paper¹ of 1939 in which he
applied group theoretical ideas to Lorentz transformations, paved the way for the
marriage of group theory and quantum mechanics. Today, in every application of
quantum theory, be it to atoms, molecules, solids, or elementary particles such as
quarks and leptons, group-theoretical techniques are indispensable.

23.1 Groups 

The prototype of a group is a transformation group, the set of invertible mappings 
of a set onto itself. Let us elaborate on this. First, we take mappings because they are 
the most general operations performed between sets. From a physical standpoint, 
mappings are essential in understanding the symmetries and other basic properties 
of a theory. For instance, rotations and translations are mappings of space. Second, 
the mappings ought to be on a single set, because we want to be able to compose any 
given two mappings. We cannot compose $f : A \to B$ and $g : A \to B$, because,
by necessity, the image of the first must be a subset of the domain of the second.
With three sets, and $g : A \to B$, $f : B \to C$, even if the composition $f \circ g$ is defined,
$g \circ f$ will not be. Third, we want to be able to undo the mapping. Physically, this
means that we should be able to retrace our path to our original position in the set. 
This can happen only if all mappings of interest have an inverse. Finally, we note 
that composing a mapping with its inverse yields identity. Therefore, the identity 
map must also be included in the set of mappings. 

We shall come back to transformation groups frequently. In fact, almost all 
groups considered in this book are transformation groups. However, as in our 
study of vector spaces in Chapter 1, it is convenient to give a general description
of (abstract) groups. 

23.1.1. Definition. A group is a set $G$ together with an associative binary
operation $G \times G \to G$, called multiplication and denoted generically by $\star$, having
the following properties:

1. There exists a unique element² $e \in G$, called the identity, such that
$e \star g = g \star e = g$ for all $g \in G$.

2. For every element $g \in G$, there exists an element $g^{-1}$, called the inverse of
$g$, such that $g \star g^{-1} = g^{-1} \star g = e$.

To emphasize the binary operation of a group, we designate it as $(G, \star)$.

If the underlying set $G$ has a finite number of elements, the group is called
finite, and its number of elements, denoted by $|G|$, is called the order of $G$. We
can also have an infinite group whose cardinality can be countable or continuous.


¹E. P. Wigner, "On the Unitary Representations of the Inhomogeneous Lorentz Group," Ann. of Math. 40 (1939) 149-204.
²To distinguish between identities of different groups, we sometimes write $e_G$ for the identity of the group $G$.




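Given an element $a \in G$, we write

$$a^k \equiv \underbrace{a \star a \star \cdots \star a}_{k\ \text{times}}, \qquad a^{-m} \equiv \underbrace{a^{-1} \star a^{-1} \star \cdots \star a^{-1}}_{m\ \text{times}}$$

and note that

$$a^i \star a^j = a^{i+j} \quad \text{for all } i, j \in \mathbb{Z}.$$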


Évariste Galois (1811-1832) was definitely not the stereotypically dull mathematician,
quietly creating theorems and teaching students. He was a political firebrand whose life
ended in a mysterious duel when he was only 20 years old. An ardent republican, he was
in the unfortunate position of having Cauchy, an ardent royalist, as the only French
mathematician capable of understanding the significance of his work. His professional
accomplishments (fewer than 100 pages, much of which was published posthumously)
received the attention they deserved many years later. It is truly sad to realize that
for decades, work from the man credited with the foundation of group theory was lost to
the world of mathematics. Galois's early years were relatively happy. His father, a
liberal thinker known for his wit, was director of a boarding school and later mayor of
Bourg-la-Reine. Galois's mother took charge of his early education. A stubborn,
eccentric woman, she mixed classical culture with a fairly stern religious upbringing.
The young Galois entered the Collège Louis-le-Grand in 1823, but found the harsh
discipline imposed by church and political authorities difficult to bear. His interest
in mathematics was sparked in class by Vernier, but Galois quickly tired of the
elementary character of the material, preferring instead to read the more advanced
original works on his own. After a flawed attempt to solve the general fifth-order
equation, Galois submitted a paper to the Académie des Sciences in which he described
the definitive solution with the aid of group theory, of which the young Galois can be
considered the creator. However, this strong initial foray into the frontiers of
mathematics was accompanied by tragedy and setback. A few weeks after the paper's
submission, his father committed suicide, which Galois felt was largely to be blamed on
those who politically persecuted his father. A month later the young mathematician
failed the entrance examination to the École Polytechnique, largely due to his refusal
to answer in the form demanded by the examiner. Galois did gain entrance to a less
prestigious school for the preparation of secondary-school teachers. While there he read
some of Abel's results (published after Abel's death) and found that they contained some
of the results he had submitted to the Academy, including the proof of the impossibility
of solving quintics. Cauchy, assigned as the judge for Galois's paper, suggested that he
revise it in light of this new information. Galois instead wrote an entirely new
manuscript and submitted it in competition for the grand prix in mathematics.
Tragically, the manuscript was lost on the death of Fourier, who had been assigned to
examine it, leaving Galois out of the competition. These events, fueled by a later,
unfair dismissal of another of his papers by Poisson, seem to have driven Galois toward
political radicalism and rebellion during the renewed turmoil then plaguing France. He
was arrested several times for his agitations, although he continued work on mathematics
while in custody. On May 30, 1832, he was wounded in a duel with an unknown adversary,
the duel perhaps caused by an unhappy love affair. His funeral three days later sparked
riots that raged through Paris in the days that followed.

The delay in recognition of the true scope of Galois's scant but amazing work stemmed
partly from the originality of his ideas and the lack of competent local reviewers. Cauchy left
France after seeing only the early parts of Galois's work, and much of the rest remained
unnoticed until Liouville prepared the later manuscripts for publication a decade after Galois's
death. Their true value wasn't appreciated for another two decades. The young
mathematician himself added to the difficulty by deliberately making his writing so terse that the
"established scientists" for whom he had so much disdain could not understand it. Those
fortunate enough to appreciate Galois's work found fertile ground in mathematical research,
in such fundamental fields as group theory and modern algebra, for decades to come.


23.1.2. Example. The following are examples of familiar sets that have group properties. 

(a) The set Z of integers under the binary operation of addition forms a group whose identity 
element is 0 and the inverse of n is —n. This group is countably infinite. 

(b) The set {—1, +1}, under the binary operation of multiplication, forms a group whose 
identity element is 1 and the inverse of each element is itself. This group is finite. 

(c) The set $\{-1, +1, -i, +i\}$, under the binary operation of multiplication, forms a finite
group whose identity element is 1.

(d) The set $\mathbb{R}$, under the binary operation of addition, forms a group whose identity element
is 0 and the inverse of $r$ is $-r$. This group is uncountably infinite.

(e) The set $\mathbb{R}^+$ ($\mathbb{Q}^+$) of positive real (rational) numbers, under the binary operation of
multiplication, forms a group whose identity element is 1 and the inverse of $r$ is $1/r$. This
group is uncountably (countably) infinite.

(f) The set $\mathbb{C}$, under the binary operation of addition, forms a group whose identity element
is 0 and the inverse of $z$ is $-z$. This group is uncountably infinite.

(g) The uncountably infinite set $\mathbb{C} - \{0\}$ of all complex numbers except 0, under the binary
operation of multiplication, forms a group whose identity element is 1 and the inverse of $z$
is $1/z$.

(h) The uncountably infinite set $V$ of vectors in a vector space, under the binary operation
of addition, forms a group whose identity element is the zero vector and the inverse of $|a\rangle$
is $-|a\rangle$.

(i) The set of invertible $n \times n$ matrices, under the binary operation of multiplication, forms
a group whose identity element is the $n \times n$ unit matrix and the inverse of $\mathsf{A}$ is $\mathsf{A}^{-1}$. This
group is uncountably infinite.

The reader is urged to verify that each set given above is indeed a group. ■
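For the finite cases (b) and (c), the verification can be carried out by brute force. The following is a minimal sketch, not a general-purpose tool: the function name `is_group` is our own, and the check assumes the set is finite and that equality of elements can be tested exactly.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force test of the group axioms for a finite set under `op`."""
    elems = list(elements)
    # Closure: the product of any two elements stays in the set.
    if any(op(a, b) not in elems for a, b in product(elems, repeat=2)):
        return False
    # Associativity: (a*b)*c == a*(b*c) for every triple.
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elems, repeat=3)):
        return False
    # Identity: exactly one e with e*g == g*e == g for all g.
    identities = [e for e in elems
                  if all(op(e, g) == g and op(g, e) == g for g in elems)]
    if len(identities) != 1:
        return False
    e = identities[0]
    # Inverses: every g has some h with g*h == h*g == e.
    return all(any(op(g, h) == e and op(h, g) == e for h in elems)
               for g in elems)

mul = lambda a, b: a * b
signs = [1, -1]              # the set of part (b)
units = [1, -1, 1j, -1j]     # the set of part (c), with i written as 1j

print(is_group(signs, mul))
print(is_group(units, mul))
print(is_group([1, -1, 1j], mul))   # fails: i * (-1) = -i is missing
```

The infinite examples cannot be checked this way; for them the axioms must be verified by argument.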

In general, the elements of a group do not commute. Those groups whose 
elements do commute are so important that we give them a special name: 

23.1.3. Definition. A group $(G, \star)$ is called abelian or commutative if $a \star b = b \star a$
for all $a, b \in G$. It is common to denote the binary operation of an abelian group
by $+$.

All groups of Example 23.1.2 are abelian except the last. 



23.1 GROUPS 655 


23.1.4. Example. Let $\mathbf{A}$ be a vector potential that gives rise to a magnetic field $\mathbf{B}$. The
set of transformations of $\mathbf{A}$ that give rise to the same $\mathbf{B}$ is an abelian group. In fact, such
transformations simply add the gradient of a function to $\mathbf{A}$. The reader can check the
details. ■




The reader may also verify that the set of invertible mappings $f : S \to S$,
i.e., the set of transformations of $S$, is indeed a (nonabelian) group. If $S$ has $n$
elements, this group is denoted by $S_n$ and is called the symmetric group of $S$. $S_n$
is a nonabelian (unless $n \le 2$) finite group that has $n!$ elements. An element $g$ of
$S_n$ is usually denoted by two rows, the top row being $S$ itself (usually taken to be
$1, 2, \ldots, n$) and the bottom row its image under $g$. For example, $g = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix}$ is
an element of $S_4$ such that $g(1) = 2$, $g(2) = 3$, $g(3) = 4$, and $g(4) = 1$.
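The two-row notation translates directly into code if one stores only the bottom row. The sketch below is illustrative; the helper names `compose` and `inverse`, and the convention that `compose(g, h)` applies `h` first, are our own choices rather than anything fixed by the text.

```python
from itertools import permutations

def compose(g, h):
    """Return the permutation g∘h (apply h first, then g).

    A permutation of 1..n is stored as its bottom row: p[k-1] is the image of k.
    """
    return tuple(g[h[k] - 1] for k in range(len(h)))

def inverse(g):
    """Return the permutation that undoes g."""
    inv = [0] * len(g)
    for k, image in enumerate(g, start=1):
        inv[image - 1] = k
    return tuple(inv)

S3 = list(permutations((1, 2, 3)))           # all 3! bottom rows
g = (2, 3, 4, 1)                             # the element of S_4 from the text
identity4 = (1, 2, 3, 4)

a, b = (2, 1, 3), (1, 3, 2)                  # two transpositions in S_3
print(len(S3))                               # number of elements of S_3
print(compose(a, b), compose(b, a))          # these differ, so S_3 is nonabelian
print(compose(g, inverse(g)) == identity4)   # g composed with its inverse
```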

Consider two groups, the set of vectors in a plane ((x, y), +) and the set of 
complex numbers (C, +), both under addition. Although these are two different 
groups, the difference is superficial. We have seen similar differences in disguise 
in the context of vector spaces and the notion of isomorphism. The same notion 
applies to group theory: 

23.1.5. Definition. Let $(G, \star)$ and $(H, \circ)$ be groups. A homomorphism $f : G \to H$
is a map such that

$$f(a \star b) = f(a) \circ f(b) \quad \forall\, a, b \in G.$$

An isomorphism is a homomorphism that is also a bijection. Two groups are
isomorphic, denoted by $G \cong H$, if there is an isomorphism $f : G \to H$. An
isomorphism of a group onto itself is called an automorphism.

An immediate consequence of this definition is that $f(e_G) = e_H$ and
$f(g^{-1}) = [f(g)]^{-1}$ (see Problem 23.9).
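For finite groups the homomorphism property can be tested exhaustively. As a hedged illustration, the sketch below maps the integers modulo 4 (under addition) to the set $\{1, i, -1, -i\}$ of Example 23.1.2(c) by $k \mapsto i^k$; the names `is_homomorphism`, `f`, and `parity` are hypothetical, chosen only for this example.

```python
def is_homomorphism(f, G, op_G, op_H):
    """True if f(a*b) equals f(a)∘f(b) for every pair a, b in the finite group G."""
    return all(f(op_G(a, b)) == op_H(f(a), f(b)) for a in G for b in G)

Z4 = range(4)
add4 = lambda a, b: (a + b) % 4          # group operation on {0, 1, 2, 3}
mul = lambda a, b: a * b                 # group operation on {1, i, -1, -i}

POWERS_OF_I = [1, 1j, -1, -1j]           # i^0, i^1, i^2, i^3
f = lambda k: POWERS_OF_I[k % 4]         # k -> i^k
parity = lambda k: (-1) ** k             # k -> (-1)^k, onto {1, -1}
shifted = lambda k: POWERS_OF_I[(k + 1) % 4]   # not a homomorphism

print(is_homomorphism(f, Z4, add4, mul))        # homomorphism, and a bijection
print(is_homomorphism(parity, Z4, add4, mul))   # homomorphism, not one-to-one
print(is_homomorphism(shifted, Z4, add4, mul))  # sends the identity to i
print(f(0) == 1, all(f((-k) % 4) * f(k) == 1 for k in Z4))
```

The last line checks the two consequences just stated: the identity goes to the identity, and inverses go to inverses.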

23.1.6. Example. Let $G$ be any group and $\{1\}$ the multiplicative group consisting of the
single number 1. It is straightforward to show that $f : G \to \{1\}$, given by (the only function
available!) $f(g) = 1$ for all $g \in G$, is a homomorphism. This homomorphism is called the
trivial (or sometimes, symmetric) homomorphism. ■

The establishment of isomorphism $f : \mathbb{R}^2 \to \mathbb{C}$ between $((x, y), +)$ and
$(\mathbb{C}, +)$ is trivial: Just write $f(x, y) = x + iy$. A less trivial isomorphism is the
exponential map, $\exp : (\mathbb{R}, +) \to (\mathbb{R}^+, \cdot)$. The reader may verify that this is a
homomorphism (in particular, it maps addition to multiplication) and that it is
one-to-one and onto. We have noted that the set of invertible maps of a set forms a group.
A very important special case of this is when the set is a vector space $V$ and the
maps are all linear.


23.1.7. Box. The general linear group of a vector space $V$, denoted by
$GL(V)$, is the set of all invertible endomorphisms of $V$. In particular, when
$V = \mathbb{C}^n$, we usually write $GL(n, \mathbb{C})$ instead of $GL(\mathbb{C}^n)$, with similar notation
for $\mathbb{R}$.







It is sometimes convenient to display a finite group $G = \{g_i\}_{i=1}^{|G|}$ as a $|G| \times |G|$
table, called the group multiplication table, in which the intersection of the $i$th
row and $j$th column is occupied by $g_i \star g_j$. Because of its trivial multiplication,
the identity is usually omitted from the table.
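A multiplication table is easy to tabulate mechanically. The sketch below builds it for the integers modulo 4 under addition; the function name `multiplication_table` is our own, and, unlike the convention just mentioned, the identity row and column are kept so that the table is complete.

```python
def multiplication_table(elements, op):
    """Row i, column j holds elements[i] * elements[j]."""
    return [[op(gi, gj) for gj in elements] for gi in elements]

Z4 = [0, 1, 2, 3]
table = multiplication_table(Z4, lambda a, b: (a + b) % 4)

for g, row in zip(Z4, table):
    print(g, "|", *row)

# Every element appears exactly once in each row and in each column.
rows_ok = all(sorted(row) == Z4 for row in table)
cols_ok = all(sorted(col) == Z4 for col in zip(*table))
print(rows_ok and cols_ok)
```

That each row and column is a rearrangement of the group elements follows from the existence of inverses; the table of an abelian group is, in addition, symmetric about its diagonal.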

23.2 Subgroups 

It is customary to write $ab$ instead of $a \star b$. We shall adhere to this convention, but
restore the $\star$ as necessary to avoid any possible confusion.

23.2.1. Definition. A subset $S$ of a group $G$ is a subgroup of $G$ if it is a group in
its own right under the binary operation of $G$, i.e., if it contains the inverse of all
its elements as well as the product of any pair of its elements.

It follows from this definition that $e \in S$. It is also easy to show that the
intersection of two subgroups is a subgroup (Problem 23.2).

23.2.2. Example. Examples of subgroups 

1. For any $G$, the subset $\{e\}$, consisting of the identity alone, is a subgroup of $G$ called
the trivial subgroup of $G$.

2. $(\mathbb{Z}, +)$ is a subgroup of $(\mathbb{R}, +)$.

3. The set of even integers (but not odd integers) is a subgroup of $(\mathbb{Z}, +)$. In fact, the
set of all multiples of a positive integer $m$, denoted by $\mathbb{Z}m$, is a subgroup of $\mathbb{Z}$. It
turns out that all subgroups of $\mathbb{Z}$ are of this form.

4. The subset of $GL(n, \mathbb{C})$ consisting of transformations that have unit determinant
is a subgroup of $GL(n, \mathbb{C})$ because the inverse of a transformation with unit
determinant also has unit determinant, and the product of two transformations with unit
determinants has unit determinant.


23.2.3. Box. The subgroup of $GL(n, \mathbb{C})$ consisting of elements having unit
determinant is denoted by $SL(n, \mathbb{C})$ and is called the special linear group.


5. The set of unitary transformations of $\mathbb{C}^n$, denoted by $U(n)$, is a subgroup of
$GL(n, \mathbb{C})$ because the inverse of a unitary transformation is also unitary and the
product of two unitary transformations is unitary.


23.2.4. Box. The set of unitary transformations $U(n)$ is a subgroup of
$GL(n, \mathbb{C})$ and is called the unitary group. Similarly, the set of orthogonal
transformations of $\mathbb{R}^n$ is a subgroup of $GL(n, \mathbb{R})$. It is denoted by $O(n)$ and
called the orthogonal group.






Each of these groups has a special subgroup whose elements have unit determinants.
These are denoted by $SU(n)$ and $SO(n)$, and called special unitary group and
special orthogonal group, respectively. The latter is also called the group of rigid
rotations of $\mathbb{R}^n$.

6. Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, and define an inner product on $\mathbb{R}^n$ by

$$\mathbf{x} \cdot \mathbf{y} = -x_1 y_1 - \cdots - x_p y_p + x_{p+1} y_{p+1} + \cdots + x_n y_n.$$

Denote the subset of $GL(n, \mathbb{R})$ that leaves this inner product invariant by³ $O(p, n - p)$.
Then $O(p, n - p)$ is a subgroup of $GL(n, \mathbb{R})$. The set of linear transformations
among $O(p, n - p)$ that have determinant 1 is denoted by $SO(p, n - p)$. The
special case of $p = 0$ gives us the orthogonal and special orthogonal groups.⁴ When
$n = 4$ and $p = 3$, we get the inner product of the special theory of relativity, and
$O(3, 1)$, the set of Lorentz transformations, is called the Lorentz group. If one adds
translations of $\mathbb{R}^4$ to $O(3, 1)$, one obtains the Poincaré group, $P(3, 1)$.

7. Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^{2n}$, and $\mathsf{J}$ the $2n \times 2n$ matrix $\begin{pmatrix} 0 & \mathbf{1} \\ -\mathbf{1} & 0 \end{pmatrix}$, where $\mathbf{1}$ is the $n \times n$ unit matrix.
The subset of $GL(2n, \mathbb{R})$ that leaves $\mathbf{x}^t \mathsf{J} \mathbf{y}$, called an antisymmetric bilinear form,
invariant is a subgroup of $GL(2n, \mathbb{R})$ called the symplectic group and denoted by
$Sp(2n, \mathbb{R})$. As we shall see in Chapter 26, the symplectic group is fundamental in
the formal treatment of Hamiltonian mechanics.

8. Let $S$ be a subgroup of $G$ and $g \in G$. Then it is readily shown that the set

$$g^{-1}Sg = \{g^{-1}sg \mid s \in S\}$$

is also a subgroup of $G$, called the subgroup conjugate to $S$ under $g$, or the subgroup
$g$-conjugate to $S$. ■


When discussing vector spaces, we noted that given any subset of a vector
space, one could construct a subspace out of it by taking all possible linear
combinations (natural operations of the vector space) of the vectors in the subset. We
called the subspace thus obtained the span of the subset. The same procedure is
applicable in group theory as well. If $S$ is a subset of a group $G$, we can generate
a subgroup out of $S$ by collecting all possible products and inverses (natural
operations of the group) of the elements of $S$. The reader may verify that the result is
indeed a subgroup of $G$.

23.2.5. Definition. Let $S$ be a subset of a group $G$. The subgroup generated by
$S$, denoted by $\langle S \rangle$, is the union of $S$ and all inverses and products of the elements
of $S$.


In the special case for which $S = \{a\}$, a single element, we use $\langle a \rangle$ instead of
$\langle \{a\} \rangle$ and call it the cyclic subgroup generated by $a$. It is simply the collection of
all integer powers of $a$.
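In a finite group, the subgroup generated by a subset can be computed by repeatedly multiplying by the generators until nothing new appears. The sketch below does this inside the integers modulo 12 under addition. It relies on the assumption that the ambient group is finite, in which case inverses are themselves positive powers and need not be added separately; the function name `generate` is ours.

```python
def generate(gens, op, identity):
    """Subgroup generated by `gens` inside a FINITE group with operation `op`."""
    subgroup = {identity}
    frontier = {identity}
    while frontier:
        # Multiply everything found in the last round by each generator.
        frontier = {op(x, s) for x in frontier for s in gens} - subgroup
        subgroup |= frontier
    return subgroup

add12 = lambda a, b: (a + b) % 12

print(sorted(generate({4}, add12, 0)))      # cyclic subgroup generated by 4
print(sorted(generate({4, 6}, add12, 0)))   # generated by two elements
print(sorted(generate({5}, add12, 0)))      # 5 generates the whole group
```

For an infinite group such as $(\mathbb{Z}, +)$ this loop would not terminate, and inverses would have to be included explicitly.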

³The reader is warned that what we have denoted by $O(p, n - p)$ is sometimes denoted by other authors by $O(n - p, p)$ or
$O(n, p)$ or $O(p, n)$.

⁴It is customary to write $O(n)$ and $SO(n)$ for $O(0, n)$ and $SO(0, n)$.

23.2.6. Definition. Let $G$ be a group and $a, b \in G$. The commutator of $a$ and $b$,
denoted by $[a, b]$, is

$$[a, b] = aba^{-1}b^{-1}.$$

The subgroup generated by all commutators of $G$ is called the
commutator subgroup of $G$. The reader may verify that a group is abelian if and
only if its commutator subgroup is the trivial subgroup, i.e., consists of only the
identity element.
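As a small illustration, the commutator subgroup of the symmetric group on three symbols can be computed directly. The sketch below stores a permutation as its bottom row and assumes a finite group throughout; the helper names (`compose`, `inverse`, `commutator`, `generated_subgroup`) are our own.

```python
from itertools import permutations

def compose(g, h):
    """g∘h for permutations stored as bottom rows (apply h first)."""
    return tuple(g[h[k] - 1] for k in range(len(h)))

def inverse(g):
    inv = [0] * len(g)
    for k, image in enumerate(g, start=1):
        inv[image - 1] = k
    return tuple(inv)

def commutator(a, b):
    """[a, b] = a b a^{-1} b^{-1}."""
    return compose(compose(a, b), compose(inverse(a), inverse(b)))

def generated_subgroup(gens, identity):
    """Closure of `gens` under composition (enough in a finite group)."""
    subgroup, frontier = {identity}, {identity}
    while frontier:
        frontier = {compose(x, s) for x in frontier for s in gens} - subgroup
        subgroup |= frontier
    return subgroup

S3 = list(permutations((1, 2, 3)))
e = (1, 2, 3)
commutators = {commutator(a, b) for a in S3 for b in S3}
derived = generated_subgroup(commutators, e)
print(sorted(derived))        # the identity and the two cyclic permutations

# The three-element subgroup just found is abelian: all its commutators are e.
print({commutator(a, b) for a in derived for b in derived} == {e})
```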

23.2.7. Definition. Let $x \in G$. The set of elements of $G$ that commute with $x$,
denoted by $C_G(x)$, is called the centralizer of $x$ in $G$. The set $Z(G)$ of elements
of a group $G$ that commute with all elements of $G$ is called the center of $G$.

23.2.8. Theorem. $C_G(x)$ is a subgroup of $G$ and $Z(G)$ is an abelian subgroup of
$G$. Furthermore, $G$ is abelian if and only if $Z(G) = G$.

Proof. The proof is immediate from the definitions. □

23.2.9. Definition. Let $G$ and $H$ be groups and let $f : G \to H$ be a
homomorphism. Define the kernel of $f$ by

$$\ker f = \{x \in G \mid f(x) = e \in H\}.$$

The reader may check that $\ker f$ is a subgroup of $G$, and $f(G)$ is a subgroup of
$H$. These are the analogues of the same concepts encountered in vector spaces. In
fact, if we treat a vector space as an additive group, with the zero vector as identity,
then the above definition coincides with that of linear mappings and vector spaces.

Carrying the analogy further, we recall that given two subspaces $U$ and $W$ of
a vector space $V$, we denote by $U + W$ all vectors of $V$ that can be written as the
sum of a vector in $U$ and a vector in $W$. There is a similar concept in group theory
that is sometimes very useful.

23.2.10. Definition. Let $S$ and $T$ be subsets of a group $(G, \star)$. Then one defines
the product of these subsets as

$$S \star T = \{s \star t \mid s \in S \text{ and } t \in T\}.$$

In particular, if $T$ consists of a single element $t$, then

$$S \star t = \{s \star t \mid s \in S\}.$$

As usual, we shall drop the $\star$ and write $ST$ and $St$. If $S$ is a subgroup, then $St$ is
called a right coset⁵ of $S$ in $G$. Similarly, $tS$ is called a left coset of $S$ in $G$. In
either case, $t$ is said to represent the coset.


⁵Some authors switch our right and left in their definition.


23.2.11. Example. Let $G = \mathbb{R}^3$, treated as an additive abelian group, and let $S$ be a plane
through the origin. Then $t + S$ is $S$ if $t \in S$ (see Problem 23.5); otherwise, it is a plane
parallel to $S$. In fact, $t + S$ is simply the translation of all points of $S$ by $t$. ■

23.2.12. Theorem. Any two right (left) cosets of a subgroup are either disjoint or
identical.

Proof. Let $S$ be a subgroup of $G$ and suppose that $x \in Sa \cap Sb$. Then
$x = s_1 a = s_2 b$ with $s_1, s_2 \in S$. Hence, $ab^{-1} = s_1^{-1} s_2 \in S$. By Problem 23.6, $Sa = Sb$. The
left cosets can be treated in the same way. □

A more "elegant" proof starts by showing that an equivalence relation can be
defined by

$$a \bowtie b \iff ab^{-1} \in S$$

and then proving that the equivalence classes of this relation are cosets of $S$.

One interpretation of Theorem 23.2.12 is that $a$ and $b$ belong to the same right
coset of $S$ if and only if $ab^{-1} \in S$. A second interpretation is that a coset can be
represented by any one of its elements (why?).

All cosets (right or left) of a subgroup $S$ have the same cardinality as $S$ itself.
This can readily be established by considering the map $\phi : S \to Sa$ ($\phi : S \to aS$)
with $\phi(s) = sa$ ($\phi(s) = as$) and showing that $\phi$ is bijective.
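These statements can be observed in a small case. The sketch below, offered as an illustration only, lists the right and left cosets of a two-element subgroup of the symmetric group on three symbols, with permutations stored as bottom rows; `compose` is a hypothetical helper, and `compose(g, h)` applies `h` first.

```python
from itertools import permutations

def compose(g, h):
    """g∘h for permutations stored as bottom rows (apply h first)."""
    return tuple(g[h[k] - 1] for k in range(len(h)))

G = list(permutations((1, 2, 3)))
S = [(1, 2, 3), (2, 1, 3)]          # the identity and the swap of 1 and 2

right_cosets = {frozenset(compose(s, t) for s in S) for t in G}   # sets St
left_cosets = {frozenset(compose(t, s) for s in S) for t in G}    # sets tS

# Distinct cosets are disjoint, all have |S| elements, and together they fill G.
print(len(right_cosets), {len(c) for c in right_cosets})
print(set().union(*right_cosets) == set(G))

# For this subgroup the right and left cosets are different collections.
print(right_cosets == left_cosets)
```

Because the cosets partition $G$ into pieces of equal size, the number of elements of $S$ divides that of $G$; here $6 = 3 \times 2$.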

There are many instances both in physics and in mathematics in which a
collection of points of a given set represent a single quantity. For example, it is not
simply the set of ratios of integers that comprise the set of rational numbers, but
the set of certain collections of such ratios: The rational number $\tfrac{1}{2}$ represents $\tfrac{1}{2}$, $\tfrac{2}{4}$,
$\tfrac{3}{6}$, etc. Similarly, a given magnetic field represents an infinitude of vector
potentials each differing by a gradient from the others, and a physical state in quantum
mechanics is an infinite number of wave functions differing from one another by
a phase.

With the set of cosets constructed above, it is natural to ask whether they could
be given an algebraic structure. The most natural such structure would clearly be
that of a group: Given $aS$ and $bS$, define their product as $abS$. Would this operation
turn the set of (left) cosets into a group? The following argument shows that it
will, under an important restriction. It is clear that the identity of such a group
would be $S$ itself. It is equally clear that we should have $(b^{-1}S)(bS) = S$, so that
$(b^{-1}Sb)S = S$. It follows from Problem 23.5 that we must have $b^{-1}Sb \subset S$ for
all $b \in G$. Now replace $b$ with $b^{-1}$ and note that $bSb^{-1} \subset S$ as well. Let $s$ be an
arbitrary element of $S$. Then $bsb^{-1} = s'$ for some $s' \in S$, and $s = b^{-1}s'b \in b^{-1}Sb$.
It follows that $S \subset b^{-1}Sb$ for all $b \in G$, and, with the reverse inclusion derived
above, that $S = b^{-1}Sb$. This motivates the following definition.

23.2.13. Definition. A subgroup $N$ of a group $G$ is called normal if $N = g^{-1}Ng$
(equivalently, if $Ng = gN$) for all $g \in G$.
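The defining condition can be tested element by element in a finite group. The sketch below is a hedged illustration on the symmetric group on three symbols; `is_normal` and the other helpers are hypothetical names, and permutations are stored as bottom rows.

```python
from itertools import permutations

def compose(g, h):
    """g∘h for permutations stored as bottom rows (apply h first)."""
    return tuple(g[h[k] - 1] for k in range(len(h)))

def inverse(g):
    inv = [0] * len(g)
    for k, image in enumerate(g, start=1):
        inv[image - 1] = k
    return tuple(inv)

def is_normal(N, G):
    """True if g^{-1} n g stays in N for every g in G and every n in N."""
    N = set(N)
    return all(compose(compose(inverse(g), n), g) in N for g in G for n in N)

S3 = list(permutations((1, 2, 3)))
A3 = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]   # identity and the two cyclic permutations
H = [(1, 2, 3), (2, 1, 3)]               # identity and one transposition

print(is_normal(A3, S3))   # the cyclic permutations form a normal subgroup
print(is_normal(H, S3))    # conjugation moves the transposition out of H
```

Since $g^{-1}Ng$ has the same number of elements as $N$, checking the inclusion $g^{-1}Ng \subset N$ is enough to establish equality in the finite case.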




The preceding argument shows that the set of cosets (no specification is
necessary since the right and left cosets coincide) of a normal subgroup forms a group:

23.2.14. Theorem. If $N$ is a normal subgroup of $G$, then the collection of all
cosets of $N$, denoted by $G/N$, is a group, called the quotient group of $G$ by $N$.

We note that all subgroups of an abelian group are automatically normal, and
that the only subgroup conjugate to a normal subgroup $N$ is $N$ itself (see Example
23.2.2).

23.2.15. Example. Let $G = \mathbb{R}^3$ and let $S$ be a plane through the origin as in Example
23.2.11. Since $G$ is abelian, $S$ is automatically normal, and $G/S$ is the set of planes parallel
to $S$. Let $\hat{\mathbf{e}}_n$ be a unit normal to $S$. Then it is readily seen that

$$G/S = \{r\hat{\mathbf{e}}_n + S \mid r \in \mathbb{R}\}.$$

We have picked the perpendicular distance between a plane and $S$ (with sign included) to
represent that plane. The reader may check that the quotient group $G/S$ is isomorphic to
$\mathbb{R}$. Identifying $S$ with $\mathbb{R}^2$, we can write $\mathbb{R}^3/\mathbb{R}^2 \cong \mathbb{R}$. The cancellation of exponents is quite
accidental here!

Let $G = \mathbb{Z}$ and $S = \mathbb{Z}m$, the set of multiples of the positive integer $m$. Since $\mathbb{Z}$ is
abelian, $\mathbb{Z}m$ is normal, and $\mathbb{Z}/\mathbb{Z}m$ is indeed a group, a typical element of which looks like
$k + m\mathbb{Z}$. By adding (or subtracting) multiples of $m$ to $k$, and using $m\mathbb{Z} + m\mathbb{Z} = m\mathbb{Z}$ (see
Problem 23.5), we can assume that $0 \le k \le m - 1$. It follows that $\mathbb{Z}/\mathbb{Z}m$ is a finite group.
Furthermore,

$$(k_1 + m\mathbb{Z}) + (k_2 + m\mathbb{Z}) = k_1 + k_2 + m\mathbb{Z} = k + m\mathbb{Z},$$

where $k$ is the remainder after enough multiples of $m$ have been subtracted from $k_1 + k_2$.
One writes $k_1 + k_2 \equiv k \pmod{m}$. The coset $k + m\mathbb{Z}$ is sometimes denoted by $\bar{k}$ and the
quotient group $\mathbb{Z}/\mathbb{Z}m$ by $\mathbb{Z}_m$:

$$\mathbb{Z}_m = \{\bar{0}, \bar{1}, \bar{2}, \ldots, \overline{m-1}\}.$$

$\mathbb{Z}_m$ is a prototype of the finite cyclic groups. It can be shown that every cyclic group of
order $m$ is isomorphic to $\mathbb{Z}_m$, a generator of which is $\bar{1}$ (recall that the binary operation is
addition for $\mathbb{Z}_m$). ■
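The arithmetic of the cosets $k + m\mathbb{Z}$ is ordinary modular arithmetic. A minimal sketch, with the value `m = 5` and the function names `coset` and `add` chosen purely for illustration:

```python
m = 5

def coset(k):
    """Label the coset k + mZ by its representative in 0..m-1."""
    return k % m

def add(k1, k2):
    """(k1 + mZ) + (k2 + mZ) = (k1 + k2) + mZ."""
    return coset(k1 + k2)

# The sum does not depend on which representatives of the cosets are used.
well_defined = all(add(k1 + m * s, k2 + m * t) == add(k1, k2)
                   for k1 in range(m) for k2 in range(m)
                   for s in range(-2, 3) for t in range(-2, 3))

# Repeatedly adding the coset of 1 reaches every coset, so the group is cyclic.
generated_by_one = {coset(n * 1) for n in range(m)}

print(add(3, 4))
print(well_defined, generated_by_one == set(range(m)))
```

Python's `%` operator returns a nonnegative remainder for a positive modulus, so negative representatives are handled without extra work.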

23.2.16. Theorem. (first isomorphism theorem) Let $G$ and $H$ be groups and
$f : G \to H$ a homomorphism. Then $\ker f$ is a normal subgroup of $G$, and $G/\ker f$
is isomorphic to $f(G)$.

Proof. We have already seen that $\ker f$ is a subgroup of $G$. To show that it is
normal, let $g \in G$ and $x \in \ker f$. Then

$$f(gxg^{-1}) = f(g)f(x)f(g^{-1}) = f(g)\,e_H\,f(g^{-1}) = f(g)f(g^{-1}) = f(gg^{-1}) = f(e_G) = e_H.$$

It follows that $gxg^{-1} \in \ker f$. Therefore, $\ker f$ is normal. We leave it to the reader
to show that $\phi : G/\ker f \to f(G)$ given by $\phi(g[\ker f]) = \phi([\ker f]g) = f(g)$
is an isomorphism.⁶ □
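The part of the proof left to the reader can at least be observed in a concrete case. The sketch below takes, as an assumed example, the homomorphism $k \mapsto 3k$ of the integers modulo 12 into themselves and checks that the induced map on cosets of the kernel is well defined and bijective onto the image.

```python
n = 12
G = range(n)
f = lambda k: (3 * k) % n          # a homomorphism of Z_12 into itself

kernel = frozenset(k for k in G if f(k) == 0)
image = {f(k) for k in G}
cosets = {frozenset((g + x) % n for x in kernel) for g in G}   # G / ker f

# phi(g + ker f) = f(g) makes sense only if f is constant on each coset.
values = {c: {f(g) for g in c} for c in cosets}
well_defined = all(len(v) == 1 for v in values.values())

# Read off phi and check that it is one-to-one and onto the image of f.
phi = {c: next(iter(v)) for c, v in values.items()}
bijective = (len(set(phi.values())) == len(cosets)
             and set(phi.values()) == image)

print(sorted(kernel), sorted(image), len(cosets))
print(well_defined, bijective)
```

That the induced map also respects the group operation follows from $f$ being a homomorphism; the sketch does not test it separately.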

23.2.17. Example. The special linear group of $V$ is a normal subgroup of the general
linear group of $V$. To see this, note that $\det$ is a homomorphism from $GL(V)$ to the
multiplicative group of nonzero scalars whose kernel is $SL(V)$. ■

23.2.18. Definition. Let $x \in G$. A conjugate of $x$ is an element $y$ of $G$ that can be
written as $y = gxg^{-1}$ with $g \in G$. The set of all elements of $G$ conjugate to one
another is called a conjugacy class. The $i$th conjugacy class is denoted by $K_i$.

One can check that “x is conjugate to y” is an equivalence relation whose 
classes are the conjugacy classes. In particular, two different conjugacy classes are 
disjoint. One can also show that each element of the center of a group constitutes 
a class by itself. In particular, the identity in any group is in a class by itself, and 
each element of an abelian group forms a (different) class. 

Although a normal subgroup N contains the conjugate of each of its elements, 
$N$ is not a class. The class containing any given element of $N$ will be only a proper
subset of N (unless N is trivial). The characteristic feature of a normal subgroup 
is that it contains the conjugacy classes of all its elements. This is not shared by 
other subgroups, which, in general, contain only the trivial class of the identity 
element. 
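The conjugacy classes of a small group can be listed directly. The following sketch does so for the symmetric group on three symbols, with permutations stored as bottom rows; the helper names are our own.

```python
from itertools import permutations

def compose(g, h):
    """g∘h for permutations stored as bottom rows (apply h first)."""
    return tuple(g[h[k] - 1] for k in range(len(h)))

def inverse(g):
    inv = [0] * len(g)
    for k, image in enumerate(g, start=1):
        inv[image - 1] = k
    return tuple(inv)

def conjugacy_classes(G):
    """Set of classes {g x g^{-1} : g in G}, one for each x in G."""
    return {frozenset(compose(compose(g, x), inverse(g)) for g in G) for x in G}

S3 = list(permutations((1, 2, 3)))
e = (1, 2, 3)
classes = conjugacy_classes(S3)
print(sorted(len(c) for c in classes))   # sizes of the classes of S_3
print(frozenset({e}) in classes)         # the identity is a class by itself

# In an abelian group every element is a class by itself.
C3 = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
print(all(len(c) == 1 for c in conjugacy_classes(C3)))
```

The class sizes add up to the order of the group, as they must for the classes of an equivalence relation.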


23.2.19. Example. Consider the group $SO(3)$ of rotations in three dimensions. Let us
denote a rotation by $R_{\hat{\mathbf{e}}}(\theta)$, where $\hat{\mathbf{e}}$ is the direction of the axis of rotation and $\theta$ is the angle
of rotation. A typical member of the conjugacy class of $R_{\hat{\mathbf{e}}}(\theta)$ is $RR_{\hat{\mathbf{e}}}(\theta)R^{-1}$, where $R$ is
some rotation. Let $\hat{\mathbf{e}}' = R\hat{\mathbf{e}}$ be the vector obtained by applying the rotation $R$ on $\hat{\mathbf{e}}$, and note
that

$$RR_{\hat{\mathbf{e}}}(\theta)R^{-1}\hat{\mathbf{e}}' = RR_{\hat{\mathbf{e}}}(\theta)R^{-1}R\hat{\mathbf{e}} = RR_{\hat{\mathbf{e}}}(\theta)\hat{\mathbf{e}} = R\hat{\mathbf{e}} = \hat{\mathbf{e}}',$$

where we used the fact that $R_{\hat{\mathbf{e}}}(\theta)\hat{\mathbf{e}} = \hat{\mathbf{e}}$ because a rotation leaves its axis unchanged. This last
statement, applied to the equation above, also shows that $RR_{\hat{\mathbf{e}}}(\theta)R^{-1}$ is a rotation about $\hat{\mathbf{e}}'$.
Problem 23.18 establishes that the angle of rotation associated with $RR_{\hat{\mathbf{e}}}(\theta)R^{-1}$ is $\theta$. We
can summarize this as $RR_{\hat{\mathbf{e}}}(\theta)R^{-1} = R_{\hat{\mathbf{e}}'}(\theta)$. It follows that all rotations having the same
angle belong to the same conjugacy class of the group of rotations in three dimensions. ■
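The relation $RR_{\hat{\mathbf{e}}}(\theta)R^{-1} = R_{\hat{\mathbf{e}}'}(\theta)$ can be checked numerically. The sketch below builds rotation matrices from the axis and angle with Rodrigues' formula, which is our choice of parametrization and is not taken from the text; the particular axes and angles are arbitrary test values.

```python
import math

def rotation(axis, theta):
    """3x3 matrix of the rotation by theta about the unit vector `axis`."""
    x, y, z = axis
    c, s, v = math.cos(theta), math.sin(theta), 1 - math.cos(theta)
    return [[c + x * x * v,     x * y * v - z * s, x * z * v + y * s],
            [y * x * v + z * s, c + y * y * v,     y * z * v - x * s],
            [z * x * v - y * s, z * y * v + x * s, c + z * z * v]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def apply(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def max_diff(A, B):
    """Largest entrywise difference between two 3x3 matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

e = (0.0, 0.0, 1.0)                  # axis of the rotation being conjugated
theta = 0.7                          # its angle, in radians
s = 1 / math.sqrt(3)
R = rotation((s, s, s), 1.1)         # an arbitrary second rotation
e_prime = apply(R, e)                # e' = R e

# For a rotation matrix the inverse is the transpose.
conjugate = matmul(matmul(R, rotation(e, theta)), transpose(R))
trace = sum(conjugate[i][i] for i in range(3))

print(max_diff(conjugate, rotation(e_prime, theta)) < 1e-12)   # rotation about e'
print(abs(trace - (1 + 2 * math.cos(theta))) < 1e-12)          # same angle
```

The second check uses the fact that the trace of a rotation matrix is $1 + 2\cos\theta$, so conjugation, which preserves the trace, cannot change the angle.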


23.2.1 Direct Products 

The resolution of a vector space into a direct sum of subspaces was a useful tool in
revealing its structure. The same idea can also be helpful in studying groups. Recall
that the only vector common to the subspaces of a direct sum is the zero vector.
Moreover, any vector of the whole space can be written as the sum of vectors taken
from the subspaces of the direct sum. Considering a vector space as an (abelian)
group, with zero as the identity and summation as the group operation, leads to
the notion of direct product.

⁶Compare this theorem with the set-theoretic result obtained in Chapter 0, where the map $X/\!\bowtie\ \to f(X)$ was shown to be
bijective if $\bowtie$ is the equivalence relation induced by $f$.

23.2.20. Definition. A group $G$ is said to be the direct product of two of its
subgroups $H_1$ and $H_2$, and we write $G = H_1 \times H_2$, if

1. all elements of $H_1$ commute with all elements of $H_2$;

2. the group identity is the only element common to both $H_1$ and $H_2$;

3. every $g \in G$ can be written as $g = h_1 h_2$ with $h_1 \in H_1$ and $h_2 \in H_2$.

It follows from this definition that $h_1$ and $h_2$ are unique, and $H_1$ and $H_2$ are
normal. This kind of direct product is sometimes called internal, because the
"factors" $H_1$ and $H_2$ are chosen from inside the group $G$ itself. The external direct
product results when we take two unrelated groups and make a group out of them:

23.2.21. Proposition. Let $G$ and $H$ be groups. The Cartesian product $G \times H$,
called the external direct product of $G$ and $H$, can be given a group structure by

$$(g, h) \star (g', h') = (gg', hh').$$

Furthermore, $G \cong G \times \{e_H\}$, $H \cong \{e_G\} \times H$, $G \times H \cong H \times G$, and to within these
isomorphisms, $G \times H$ is the internal direct product of $G \times \{e_H\}$ and $\{e_G\} \times H$.

The proof is left for the reader. 
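As a small illustration (not a proof), the three conditions of Definition 23.2.20 can be checked by brute force for an external direct product. The choice of $\mathbb{Z}_2$ and $\mathbb{Z}_3$ as the two groups, and all names below, are assumptions made only for this sketch.

```python
from itertools import product

# Two small groups written additively: Z_2 and Z_3 (illustrative choice).
Z2 = [0, 1]
Z3 = [0, 1, 2]

def star(p, q):
    """Componentwise product (g, h) * (g', h') = (g g', h h')."""
    (g, h), (g2, h2) = p, q
    return ((g + g2) % 2, (h + h2) % 3)

GxH = list(product(Z2, Z3))      # the Cartesian product G x H
H1 = [(g, 0) for g in Z2]        # G x {e_H}
H2 = [(0, h) for h in Z3]        # {e_G} x H

# Condition 1: elements of H1 commute with elements of H2.
commute = all(star(a, b) == star(b, a) for a in H1 for b in H2)

# Condition 2: the identity is the only common element.
common = [x for x in H1 if x in H2]

# Condition 3: every element is a product h1 * h2; we also count the ways.
decompositions = {x: [(a, b) for a in H1 for b in H2 if star(a, b) == x]
                  for x in GxH}
unique = all(len(d) == 1 for d in decompositions.values())

print(commute, common, unique)
```

Because both factors here are abelian, the first condition is automatic; a nonabelian factor would make that check less trivial.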


Niels Henrik Abel (1802-1829) was the second of seven children,
son of a Lutheran minister with a small parish of Norwegian
coastal islands. In school he received only average
marks at first, but then his mathematics teacher was replaced
by a man only seven years older than Abel. Abel's alcoholic
father died in 1820, leaving almost no financial support for
his young prodigy, who became responsible for supporting
his mother and family. His teacher, Holmboe, recognizing his
talent for mathematics, raised money from his colleagues to
enable Abel to attend Christiania (modern Oslo) University.

He entered the university in 1821, 10 years after the university
was founded, and soon proved himself worthy of his teacher's accolades. His second paper,
for example, contained the first example of a solution to an integral equation.

Abel then received a two-year government travel grant and journeyed to Berlin, where he
met the prominent mathematician Crelle, who soon launched what was to become the leading
German mathematical journal of the nineteenth century, commonly called Crelle's Journal.
From the start, Abel contributed important papers to Crelle's Journal, including a classic
paper on power series, the scope of which clearly reflects his desire for stringency. His most
important work, also published in that journal, was a lengthy treatment of elliptic functions in
which Abel incorporated their inverse functions to show that they are a natural generalization
of the trigonometric functions. In later research in this area, Abel found himself in stiff
competition with another young mathematician, K. G. J. Jacobi. Abel published some papers
on functional equations and integrals in 1823. In one of them he gives the first solution of an integral
equation. In 1824 he proved the impossibility of solving algebraically the general equation
of the fifth degree and published it at his own expense, hoping to obtain recognition for his
work.

Despite his proven intellectual success, Abel never achieved material success, not even 
a permanent academic position. In December of 1828, while traveling by sled to visit his 
fiancée for Christmas, Abel became seriously ill and died a couple of months later. Ironically,
his death from tuberculosis occurred two days before Crelle wrote with the happy news of 
an appointment for Abel at a scientific institute in Berlin. In Abel’s eulogy in his journal, 
Crelle wrote: 

“He distinguished himself equally by the purity and nobility of his character and by a 
rare modesty which made his person cherished to the same degree as was his genius.” 


23.3 Group Action 


The transformation groups introduced at the beginning of this chapter can be 
described in the language of abstract groups. 

23.3.1. Definition. Let $G$ be a group and $M$ a set. The left action of $G$ on $M$ is
a map $\Phi : G \times M \to M$ such that

1. $\Phi(e, m) = m$ for all $m \in M$;

2. $\Phi(g_1 g_2, m) = \Phi(g_1, \Phi(g_2, m))$.

One usually denotes $\Phi(g, m)$ by $g \cdot m$ or more simply by $gm$. The right action is
defined similarly. A subset $N \subset M$ is called left (right) invariant if $g \cdot m \in N$
($m \cdot g \in N$) for all $g \in G$, whenever $m \in N$.

23.3.2. Example. If we define $f_g : M \to M$ by $f_g(m) \equiv \Phi(g, m) = g \cdot m$, then $f_g$ is
recognized as a transformation of $M$. The collection of such transformations is a subgroup
of the set of all transformations of $M$. Indeed, the identity transformation is simply $f_e$, the
inverse of $f_g$ is $f_{g^{-1}}$, and the (associative) law of composition is $f_{g_1} \circ f_{g_2} = f_{g_1 g_2}$. There
is a general theorem in group theory stating that any group is isomorphic to a subgroup of
the group of transformations of an appropriate set. ■
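The composition law $f_{g_1} \circ f_{g_2} = f_{g_1 g_2}$ can be verified directly for a small group acting on itself by left multiplication. The sketch below assumes $G = S_3$ and $M = G$; the tuple encoding of permutations and the helper names are illustrative choices, not notation from the text.

```python
from itertools import permutations

# Elements of S_3 as tuples p, where p[i-1] is the image of i.
G = list(permutations((1, 2, 3)))
e = (1, 2, 3)

def mul(a, b):
    """Group product a o b: apply b first, then a."""
    return tuple(a[b[i] - 1] for i in range(3))

def f(g):
    """The transformation f_g of M = G, m -> g.m, stored as a dict."""
    return {m: mul(g, m) for m in G}

def compose(F1, F2):
    """Composition of transformations: (F1 o F2)(m) = F1(F2(m))."""
    return {m: F1[F2[m]] for m in G}

# f_{g1} o f_{g2} = f_{g1 g2} for every pair of group elements.
homomorphism = all(compose(f(g1), f(g2)) == f(mul(g1, g2))
                   for g1 in G for g2 in G)

# f_e is the identity transformation, and distinct g give distinct f_g.
identity_ok = f(e) == {m: m for m in G}
injective = len({tuple(sorted(f(g).items())) for g in G}) == len(G)

print(homomorphism, identity_ok, injective)
```

Since the map $g \mapsto f_g$ is injective here, $S_3$ is realized as a group of transformations of a six-element set, in line with the general theorem quoted at the end of the example.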


23.3.3. Definition. Let $G$ act on $M$ and let $m_0 \in M$. The orbit of $m_0$, denoted by
$Gm_0$, is

$$Gm_0 = \{m \in M \mid m = g m_0 \text{ for some } g \in G\}.$$

The action is called transitive if $Gm_0 = M$. The stabilizer of $m_0$ is $G_{m_0} = \{g \in
G \mid g m_0 = m_0\}$. The group action is called effective if $gm = m$ for all $m \in M$
implies that $g = e$.


The reader may verify that the orbit $Gm_0$ is the smallest invariant subset of $M$
containing $m_0$, and that


23.3.4. Box. The stabilizer of $m_0$ is a subgroup of $G$, which is sometimes
called the little group of $G$ at $m_0$.


A transitive action is characterized by the fact that given any two points
$m_1, m_2 \in M$, one can find a $g \in G$ such that $m_2 = g m_1$.

23.3.5. Example. Let $M = \mathbb{R}^2$ and $G = SO(2)$, the planar rotation group. The action is
rotation of a point in the plane about the origin by an angle $\theta$. The orbits are circles centered
at the origin. The action is effective but not transitive. The stabilizer of every point in the
plane is $\{e\}$, except the origin, for which the whole group is the stabilizer.

Let $M = S^1$, the unit circle, and $G = SO(2)$, the rotation group in two dimensions.
The action is displacement of a point on the circle. There is only one orbit, the entire circle.
The action is effective and transitive. The stabilizer of every point on the circle is $\{e\}$.

Let $M = G$, a group, and let a (proper) subgroup $S$ act on $G$ by left multiplication.
The orbits are right cosets $Sg$ of the subgroup. The action is effective but not transitive. The
stabilizer of every point in the group is $\{e\}$.

Let $M = \mathbb{R} \cup \{\infty\}$, the set of real numbers including "the point at infinity." Define an
action of $SL(2, \mathbb{R})$ on $M$ by

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot x = \frac{ax + c}{bx + d}.$$

The reader may check that this is indeed a group action with a law of multiplication identical
to the matrix multiplication, and that the action is transitive, but not effective.

Let $M$ be a set and $H$ the group of transformations of $M$. Suppose that there is a
homomorphism $f : G \to H$ from a group $G$ into $H$. Then there is a natural action of $G$ on
$M$ given by $g \cdot m \equiv [f(g)](m)$. The homomorphism $f$ is sometimes called a realization
of $G$. ■
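The third case of the example (a subgroup acting on the whole group by left multiplication) is easy to explore by brute force. The sketch below assumes $G = S_3$ and the two-element subgroup $S = \{e, (12)\}$; the encoding and the function names are illustrative.

```python
from itertools import permutations

# S_3 as tuples p with p[i-1] the image of i; product a o b applies b first.
G = list(permutations((1, 2, 3)))
e = (1, 2, 3)

def mul(a, b):
    return tuple(a[b[i] - 1] for i in range(3))

S = [e, (2, 1, 3)]               # the subgroup {e, (12)} (illustrative choice)

def orbit(m):
    """Orbit of m under left multiplication by S, i.e. the right coset S m."""
    return frozenset(mul(s, m) for s in S)

def stabilizer(m):
    """Elements of S that leave m fixed."""
    return [s for s in S if mul(s, m) == m]

orbits = {orbit(m) for m in G}
transitive = len(orbits) == 1
trivial_stabilizers = all(stabilizer(m) == [e] for m in G)
# Effective: only e fixes every point of M.
effective = [s for s in S if all(mul(s, m) == m for m in G)] == [e]

print(len(orbits), transitive, trivial_stabilizers, effective)
```

The three orbits found are the three right cosets of $S$ in $S_3$, each with two elements, so the action is effective but not transitive, as stated.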


23.4 The Symmetric Group S n 

Because of its primary importance as the prototypical finite group, and because 
of its significance in quantum statistics, the symmetric (or permutation) group 
is briefly discussed in this section. It is also used extensively in the theory of 
representation of the general linear group and its subgroups. 

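A generic permutation $\pi$ of $n$ numbers is shown as

$$\pi = \begin{pmatrix} 1 & 2 & \cdots & i & \cdots & n \\ \pi(1) & \pi(2) & \cdots & \pi(i) & \cdots & \pi(n) \end{pmatrix}. \tag{23.1}$$

Because the mapping is bijective, no two elements can have the same image, and
$\pi(1), \pi(2), \ldots, \pi(n)$ exhaust all the elements in the set $\{i\}_{i=1}^{n}$.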



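            $e$       $\pi_2$   $\pi_3$   $\pi_4$   $\pi_5$   $\pi_6$
  $e$       $e$       $\pi_2$   $\pi_3$   $\pi_4$   $\pi_5$   $\pi_6$
  $\pi_2$   $\pi_2$   $e$       $\pi_5$   $\pi_6$   $\pi_3$   $\pi_4$
  $\pi_3$   $\pi_3$   $\pi_6$   $e$       $\pi_5$   $\pi_4$   $\pi_2$
  $\pi_4$   $\pi_4$   $\pi_5$   $\pi_6$   $e$       $\pi_2$   $\pi_3$
  $\pi_5$   $\pi_5$   $\pi_4$   $\pi_2$   $\pi_3$   $\pi_6$   $e$
  $\pi_6$   $\pi_6$   $\pi_3$   $\pi_4$   $\pi_2$   $e$       $\pi_5$

Table 23.1 Group multiplication table for $S_3$.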


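We can display the product $\pi_2 \circ \pi_1$ of two permutations using $\pi_2 \circ \pi_1(i) \equiv
\pi_2(\pi_1(i))$. For instance, if

$$\pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 1 & 2 \end{pmatrix} \quad \text{and} \quad \pi_2 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 3 & 1 \end{pmatrix}, \tag{23.2}$$

then the product $\pi_2 \circ \pi_1$ takes 1 to 3, etc., because $\pi_2 \circ \pi_1(1) \equiv \pi_2(\pi_1(1)) =
\pi_2(3) = 3$, etc. We display $\pi_2 \circ \pi_1$ as

$$\pi_2 \circ \pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 2 & 4 \end{pmatrix}.$$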
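The composition rule $\pi_2 \circ \pi_1(i) = \pi_2(\pi_1(i))$ translates into a one-line function if a permutation is stored as the bottom row of its two-row display. This encoding, and the name `compose`, are assumptions of the sketch.

```python
def compose(p2, p1):
    """Bottom row of p2 o p1: the image of i is p2(p1(i))."""
    return tuple(p2[p1[i] - 1] for i in range(len(p1)))

# The two permutations of Equation (23.2), stored as bottom rows.
pi1 = (3, 4, 1, 2)
pi2 = (2, 4, 3, 1)

product = compose(pi2, pi1)      # pi2 o pi1
reverse = compose(pi1, pi2)      # pi1 o pi2, in general a different permutation
print(product, reverse)
```

Comparing `product` with `reverse` shows concretely that the order of the factors matters.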

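23.4.1. Example. Let us construct the multiplication table for $S_3$. Denote the elements
of $S_3$ as follows:

$$\pi_1 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \quad \pi_2 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}, \quad \pi_3 = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix},$$

$$\pi_4 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \quad \pi_5 = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}, \quad \pi_6 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}.$$

We give only one sample evaluation of the entries and leave the straightforward (but
instructive) calculation of the other entries to the reader. Consider $\pi_4 \circ \pi_5$, and note that
$\pi_5(1) = 3$ and $\pi_4(3) = 2$; so $\pi_4 \circ \pi_5(1) = 2$. Similarly, $\pi_4 \circ \pi_5(2) = 1$ and $\pi_4 \circ \pi_5(3) = 3$.
Thus

$$\pi_4 \circ \pi_5 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix} = \pi_2.$$

The entire multiplication table is given in Table 23.1. ■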

Note that both the rows and columns of the group multiplication table include
all elements of the group, and no element is repeated in a row or a column. This
is because left-multiplication of elements of a group by a single fixed element of
the group simply permutes the group elements. Stated differently, the left multiplication
map $L_g : G \to G$, given by $L_g(x) = gx$, is bijective, as the reader may
verify.
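The multiplication table of $S_3$ and the "no repetition in a row or column" property can be generated mechanically. The sketch below labels the elements as in Example 23.4.1 (with `e` for the identity and `p2` to `p6` for the rest); the labels and helper names are illustrative.

```python
def compose(a, b):
    """Bottom row of a o b: the image of i is a(b(i))."""
    return tuple(a[b[i] - 1] for i in range(len(b)))

# Elements of S_3 as bottom rows, labeled as in Example 23.4.1.
names = {(1, 2, 3): "e",  (2, 1, 3): "p2", (3, 2, 1): "p3",
         (1, 3, 2): "p4", (3, 1, 2): "p5", (2, 3, 1): "p6"}
elements = list(names)

# Entry in row a, column b is a o b.
table = {a: [compose(a, b) for b in elements] for a in elements}

# Every row and every column contains each element exactly once.
rows_ok = all(sorted(table[a]) == sorted(elements) for a in elements)
cols_ok = all(sorted(table[a][j] for a in elements) == sorted(elements)
              for j in range(len(elements)))

for a in elements:
    print(names[a], [names[x] for x in table[a]])
print(rows_ok, cols_ok)
```

The row check is exactly the statement that each left multiplication map $L_g$ is bijective; the column check is the analogous statement for right multiplication.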

Because we are dealing with finite numbers, repeated application of a permutation
to an integer in the set $\{i\}_{i=1}^{n}$ eventually produces the initial integer. This
leads to the following definition.












23.4.2. Definition. Let $\pi \in S_n$, $i \in \{1, 2, \ldots, n\}$, and let $r$ be the smallest positive
integer such that $\pi^r(i) = i$. Then the set of $r$ distinct elements $\{\pi^k(i)\}_{k=0}^{r-1}$ is called
a cycle of $\pi$ of length $r$ or an $r$-cycle generated by $i$.

Start with 1 and apply $\pi$ to it repeatedly until you obtain 1 again. The collection
of elements so obtained forms a cycle in which 1 is contained. Then we select a
second number that is not in this cycle and apply $\pi$ to it repeatedly until the
original number is obtained again. Continuing in this way, we produce a set of
disjoint cycles that exhausts all elements of $\{1, 2, \ldots, n\}$.

23.4.3. Proposition. Any permutation can be broken up into disjoint cycles.

It is customary to write elements of each cycle in some specific order within
parentheses starting with the first element, say $i$, on the left, then $\pi(i)$ immediately
to its right, followed by $\pi^2(i)$, and so on. For example, the permutations $\pi_1$ and
$\pi_2$ of Equation (23.2) and their product have the cycle structures $\pi_1 = (13)(24)$,
$\pi_2 = (124)(3)$, and $\pi_2 \circ \pi_1 = (132)(4)$, respectively.
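The procedure just described (start with the smallest unused symbol, apply the permutation until the symbol returns) can be written out as follows. This is a sketch only; the function name `cycles` and the bottom-row encoding of permutations are assumptions.

```python
def cycles(p):
    """Disjoint cycles of the permutation with bottom row p.

    Each cycle starts at the smallest symbol not yet used and lists
    i, p(i), p(p(i)), ... until i would be reached again.
    """
    seen = set()
    result = []
    for start in range(1, len(p) + 1):
        if start in seen:
            continue
        cycle = [start]
        seen.add(start)
        nxt = p[start - 1]
        while nxt != start:
            cycle.append(nxt)
            seen.add(nxt)
            nxt = p[nxt - 1]
        result.append(tuple(cycle))
    return result

# The permutations of Equation (23.2) and their product pi2 o pi1.
print(cycles((3, 4, 1, 2)))
print(cycles((2, 4, 3, 1)))
print(cycles((3, 1, 2, 4)))
```

Fixed points appear as cycles of length one, matching the convention of writing $(124)(3)$ and $(132)(4)$ above.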

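23.4.4. Example. Let $\pi_1, \pi_2 \in S_8$ be given by

$$\pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 3 & 5 & 7 & 1 & 2 & 8 & 4 & 6 \end{pmatrix}, \quad \pi_2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 2 & 5 & 6 & 8 & 1 & 7 & 4 & 3 \end{pmatrix}.$$

The reader may verify that

$$\pi_2 \circ \pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 6 & 1 & 4 & 2 & 5 & 3 & 8 & 7 \end{pmatrix}$$

and that

$$\pi_1 = (1374)(25)(68), \quad \pi_2 = (125)(36748), \quad \pi_2 \circ \pi_1 = (16342)(5)(78).$$

In general, permutations do not commute. The product in reverse order is

$$\pi_1 \circ \pi_2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 5 & 2 & 8 & 6 & 3 & 4 & 1 & 7 \end{pmatrix} = (15387)(2)(46),$$

which differs from $\pi_2 \circ \pi_1$. However, note that it has the same cycle structure as $\pi_2 \circ \pi_1$, in that
cycles of equal length appear in both. This is a general property of all permutations. ■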

23.4.5. Definition. If $\pi \in S_n$ has a cycle of length $r$ and all other cycles of $\pi$
have only one element, then $\pi$ is called a cyclic permutation of length $r$.

It follows that $\pi_2 \in S_4$ as defined earlier is a cyclic permutation of length 3.
Similarly,

$$\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 6 & 2 & 1 & 3 & 5 & 4 \end{pmatrix}$$

is a cyclic permutation of length 4 (verify this).


23.4.6. Definition. A cyclic permutation of length 2 is called a transposition.

A transposition $(ij)$ simply switches $i$ and $j$.

23.4.7. Example. Products of (not necessarily disjoint) cycles may be associated with a
permutation whose action on $i$ is obtained by starting with the first cycle (at the extreme
right), locating the first occurrence of $i$, and keeping track of what each cycle does to it or its
image under the preceding cycle. For example, let $\pi_1 \in S_6$ be given as a product of cycles
by $\pi_1 = (143)(24)(456)$. To find the permutation, we start with 1 and follow the action of
the cycles on it, starting from the right. The first and second cycles leave 1 alone, and the
last cycle takes it to 4. Thus, $\pi_1(1) = 4$. For 2 we note that the first cycle leaves it alone,
the second cycle takes it to 4, and the last cycle takes 4 to 3. Thus, $\pi_1(2) = 3$. Similarly,
$\pi_1(3) = 1$, $\pi_1(4) = 5$, $\pi_1(5) = 6$, and $\pi_1(6) = 2$. Therefore,

$$\pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 4 & 3 & 1 & 5 & 6 & 2 \end{pmatrix}.$$

We note that $\pi_1$ is a cyclic permutation of length 6.

It is left to the reader to show that the permutation $\pi_2 \in S_5$ given by the product
$\pi_2 = (13)(15)(12)(14)$ is cyclic: $\pi_2 = (14253)$. ■
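The bookkeeping of this example (follow each symbol through the cycles from right to left) is mechanical and can be sketched in code. The names `cycle_to_perm` and `product_of_cycles` are hypothetical helpers introduced for the illustration; permutations are returned as bottom rows.

```python
def cycle_to_perm(cycle, n):
    """Bottom row of the permutation of {1, ..., n} defined by one cycle."""
    p = list(range(1, n + 1))
    for k, a in enumerate(cycle):
        p[a - 1] = cycle[(k + 1) % len(cycle)]
    return tuple(p)

def product_of_cycles(cycle_list, n):
    """Product of cycles written left to right; the rightmost cycle acts first."""
    result = tuple(range(1, n + 1))          # start from the identity
    for c in reversed(cycle_list):           # rightmost cycle first
        q = cycle_to_perm(c, n)
        result = tuple(q[result[i] - 1] for i in range(n))
    return result

pi1 = product_of_cycles([(1, 4, 3), (2, 4), (4, 5, 6)], 6)
pi2 = product_of_cycles([(1, 3), (1, 5), (1, 2), (1, 4)], 5)
print(pi1, pi2)
```

The second product should agree with the bottom row of the single cycle $(14253)$, which can be obtained from `cycle_to_perm((1, 4, 2, 5, 3), 5)`.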

The square of any transposition is the identity. Therefore, we can include it in 
any product of permutations without changing anything. 

23.4.8. Proposition. An $r$-cycle $(i_1, i_2, \ldots, i_r)$ can be decomposed into the product
of $r - 1$ transpositions:

$$(i_1, i_2, \ldots, i_r) = (i_1 i_r)(i_1 i_{r-1}) \cdots (i_1 i_3)(i_1 i_2).$$

Proof. The proof involves keeping track of what happens to each symbol when
acted upon by the RHS and the LHS and showing that the two give the same result.
This is left as an exercise for the reader. □

Although the decomposition of Proposition 23.4.8 is not unique, it can be shown
that the parity of the decomposition (whether the number of factors is even or
odd) is unique. For instance, it is easy to verify that

$$(1234) = (14)(13)(12) = (14)(34)(34)(23)(12)(12)(23)(13)(12).$$

That is, (1234) is written as a product of 3 or 9 transpositions, both of which are
odd.
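Both decompositions of $(1234)$ quoted above, and the general formula of Proposition 23.4.8, can be checked by multiplying the transpositions out. The helper names below are hypothetical, and the check covers only the listed cases; it is not a substitute for the proof.

```python
def cycle_to_perm(cycle, n):
    """Bottom row of the permutation of {1, ..., n} defined by one cycle."""
    p = list(range(1, n + 1))
    for k, a in enumerate(cycle):
        p[a - 1] = cycle[(k + 1) % len(cycle)]
    return tuple(p)

def product_of_cycles(cycle_list, n):
    """Product of cycles written left to right; the rightmost cycle acts first."""
    result = tuple(range(1, n + 1))
    for c in reversed(cycle_list):
        q = cycle_to_perm(c, n)
        result = tuple(q[result[i] - 1] for i in range(n))
    return result

def transpositions(cycle):
    """(i1 ... ir) -> [(i1 ir), (i1 i_{r-1}), ..., (i1 i2)], as in Proposition 23.4.8."""
    i1 = cycle[0]
    return [(i1, x) for x in reversed(cycle[1:])]

target = cycle_to_perm((1, 2, 3, 4), 4)
short = transpositions((1, 2, 3, 4))                       # 3 transpositions
long_form = [(1, 4), (3, 4), (3, 4), (2, 3), (1, 2),
             (1, 2), (2, 3), (1, 3), (1, 2)]               # 9 transpositions

print(short)
print(product_of_cycles(short, 4) == target,
      product_of_cycles(long_form, 4) == target)
```

Both factor counts (3 and 9) are odd, in agreement with the uniqueness of parity.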

We have already seen that any permutation can be written as a product of
cycles. In addition, Proposition 23.4.8 says that these cycles can be further broken
down into products of transpositions. This implies the following (see [Rotm 84,
p. 38]):

23.4.9. Proposition. Any permutation can be decomposed as a product of
transpositions. The parity of the decomposition is unique.


23.4.10. Definition. A permutation is even (odd) if it can be expressed as a product
of an even (odd) number of transpositions.

The parity of a permutation can be determined from its cycle structure and
Proposition 23.4.8.

The reader may verify that the mapping from $S_n$ to the multiplicative group of
$\{+1, -1\}$ that assigns $+1$ to even permutations and $-1$ to odd permutations is a
group homomorphism. It follows from the first isomorphism theorem (Theorem
23.2.16) that


23.4.11. Box. The set of even permutations, denoted by $A_n$, forms a normal
subgroup of $S_n$.


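This homomorphism is usually denoted by $\epsilon$. We therefore define

$$\epsilon(\pi) \equiv \epsilon_\pi = \begin{cases} +1 & \text{if } \pi \text{ is even,} \\ -1 & \text{if } \pi \text{ is odd.} \end{cases} \tag{23.3}$$

Sometimes $\delta(\pi)$ or $\delta_\pi$ as well as $\mathrm{sgn}(\pi)$ is also used. The symbol $\epsilon_{i_1 i_2 \ldots i_n}$, used in
the definition of determinants, is closely related to $\epsilon_\pi$. In fact,

$$\epsilon_{\pi(1)\pi(2)\ldots\pi(n)} = \epsilon_\pi.$$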
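The parity can be read off the cycle structure: by Proposition 23.4.8 an $r$-cycle contributes $r - 1$ transpositions, so each cycle of even length flips the sign. The sketch below uses this to compute the sign and then checks the homomorphism property by brute force on $S_4$; the function names and the choice of $S_4$ are illustrative.

```python
from itertools import permutations

def compose(a, b):
    """Bottom row of a o b: the image of i is a(b(i))."""
    return tuple(a[b[i] - 1] for i in range(len(b)))

def sign(p):
    """+1 for even, -1 for odd; an r-cycle contributes (-1)**(r - 1)."""
    seen, s = set(), 1
    for start in range(1, len(p) + 1):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = p[x - 1]
            length += 1
        if length % 2 == 0:      # a cycle of even length is an odd permutation
            s = -s
    return s

S4 = list(permutations(range(1, 5)))
homomorphism = all(sign(compose(a, b)) == sign(a) * sign(b)
                   for a in S4 for b in S4)
A4 = [p for p in S4 if sign(p) == 1]     # the even permutations
print(homomorphism, len(A4))
```

Half of the 24 elements of $S_4$ turn out to be even, as expected for the kernel of a homomorphism onto $\{+1, -1\}$.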


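Suppose $\pi, \sigma \in S_n$, and note that [7] $\sigma(i) \xrightarrow{\sigma^{-1}} i \xrightarrow{\pi} \pi(i) \xrightarrow{\sigma} \sigma \circ \pi(i)$, i.e.,
the composite $\sigma \circ \pi \circ \sigma^{-1}$ of the three permutations takes $\sigma(i)$ to $\sigma \circ \pi(i)$. This
composite can be thought of as the permutation obtained by applying $\sigma$ to the two
rows of $\pi = \begin{pmatrix} 1 & 2 & \cdots & n \\ \pi(1) & \pi(2) & \cdots & \pi(n) \end{pmatrix}$:

$$\sigma \circ \pi \circ \sigma^{-1} = \begin{pmatrix} \sigma(1) & \sigma(2) & \cdots & \sigma(n) \\ \sigma \circ \pi(1) & \sigma \circ \pi(2) & \cdots & \sigma \circ \pi(n) \end{pmatrix}.$$

In particular, the cycles of $\sigma \circ \pi \circ \sigma^{-1}$ are obtained by applying $\sigma$ to the symbols
in the cycles of $\pi$. Since $\sigma$ is bijective, the cycles so obtained will remain disjoint.
It follows that $\sigma \circ \pi \circ \sigma^{-1}$, a conjugate of $\pi$, has the same cycle structure as $\pi$
itself. In fact, we have the following: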

23.4.12. Theorem. Two permutations are conjugate if and only if they have the 
same cycle structure. 


To find the distinct conjugacy classes of $S_n$, one has to construct distinct cycle
structures of $S_n$. This in turn is equivalent to partitioning the numbers from 1 to $n$
into sets of various lengths. Let $\nu_k$ be the number of $k$-cycles in a permutation. The
cycle structure of this permutation is denoted by $(1^{\nu_1}, 2^{\nu_2}, \ldots, n^{\nu_n})$. Since the total
number of symbols is $n$, we must have $\sum_{k=1}^{n} k\nu_k = n$. Defining $\lambda_j \equiv \sum_{k=j}^{n} \nu_k$,
we have

$$\lambda_1 + \lambda_2 + \cdots + \lambda_n = n, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0. \tag{23.4}$$

7 Recall from Chapter 0 that $x \xrightarrow{f} y$ means $y = f(x)$.

The splitting of $n$ into nonnegative integers $(\lambda_1, \lambda_2, \ldots, \lambda_n)$ as in Equation (23.4)
is called a partition of $n$. There is a 1-1 correspondence between partitions of $n$
and the cycle structures of $S_n$. We saw how $\nu_k$'s gave rise to $\lambda$'s. Conversely, given a
partition of $n$, we can construct a cycle structure by $\nu_k = \lambda_k - \lambda_{k+1}$. For example,
the partition (32000) of $S_5$ corresponds to $\nu_1 = 3 - 2 = 1$, $\nu_2 = 2 - 0 = 2$,
i.e., one 1-cycle and two 2-cycles. One usually omits the zeros and writes (32)
instead of (32000). When some of the $\lambda$'s are repeated, the number of occurrences
is indicated by a power of the corresponding $\lambda$; the partition is then written as
$(\mu_1^{n_1}, \mu_2^{n_2}, \ldots, \mu_r^{n_r})$, where it is understood that $\lambda_1$ through $\lambda_{n_1}$ have the common
value $\mu_1$, etc. For example, $(3^2 1)$ corresponds to a partition of 7 with $\lambda_1 = 3$,
$\lambda_2 = 3$, and $\lambda_3 = 1$. The corresponding cycle structure is $\nu_1 = 0$, $\nu_2 = 2$, and
$\nu_3 = 1$, i.e., two 2-cycles and one 3-cycle. The partitions of length 0 are usually
ignored. Since $\sum_i \lambda_i = n$, no confusion will arise as to which symmetric group
the partition belongs to. Thus (32000) and (332000) are written as (32) and $(3^2 2)$,
and it is clear that (32) belongs to $S_5$ and $(3^2 2)$ to $S_8$.

23.4.13. Example. Let us find the different cycle structures of $S_4$. This corresponds to
different partitions of 4. We can take $\lambda_1 = 4$ and the rest of the $\lambda$'s zero. This gives the
partition (4). Next, we let $\lambda_1 = 3$; then $\lambda_2$ must be 1, giving the partition (31). With $\lambda_1 = 2$,
$\lambda_2$ can be either 2 or 1. In the latter case, $\lambda_3$ must be 1 as well, and we obtain two partitions,
$(2^2)$ and $(21^2)$. Finally, if $\lambda_1 = 1$, all other nonzero $\lambda$'s must be 1 as well (remember that
$\lambda_k \ge \lambda_{k+1}$). Therefore, the last partition is of the form $(1^4)$. We see that there are 5 different
partitions of 4. It follows that there are 5 different conjugacy classes in $S_4$. ■
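The enumeration carried out by hand in this example can be automated. The sketch below lists the partitions $\lambda_1 \ge \lambda_2 \ge \cdots > 0$ of $n$ (zeros omitted) and converts a partition to a cycle structure through $\nu_k = \lambda_k - \lambda_{k+1}$; the function names are hypothetical.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def cycle_structure(lam):
    """Map a partition to {k: nu_k} with nu_k = lambda_k - lambda_{k+1} (zeros dropped)."""
    padded = list(lam) + [0]
    nu = {}
    for k in range(1, len(lam) + 1):
        diff = padded[k - 1] - padded[k]
        if diff > 0:
            nu[k] = diff
    return nu

print(list(partitions(4)))
print(cycle_structure((3, 2)), cycle_structure((3, 3, 1)))
```

Counting the output of `partitions(n)` gives the number of conjugacy classes of $S_n$ for any small $n$, by the correspondence described above.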

23.5 Problems 

23.1. Let $S$ be a subset of a group $G$. Show that $S$ is a subgroup if and only if
$ab^{-1} \in S$ whenever $a, b \in S$.

23.2. Show that the intersection of two subgroups is a subgroup.

23.3. Let $X$ be a subset of a group $G$. A word on $X$ is either $e_G$ or an element $w$
of $G$ of the form

$$w = x_1^{e_1} x_2^{e_2} \cdots x_n^{e_n},$$

where $x_i \in X$ and $e_i = \pm 1$. Show that the set of all words on $X$ is a subgroup of
$G$.


23.4. Let $[a, b]$ denote the commutator of $a$ and $b$. Show that

(a) $[a, b]^{-1} = [b, a]$,

(b) $[a, a] = e$ for all $a \in G$, and

(c) $ab = [a, b]ba$. It is interesting to compare these relations with the familiar
commutators of operators.

23.5. Show that if $S$ is a subgroup, then $S^2 \equiv SS = S$, and $tS = S$ if and only if
$t \in S$. More generally, $TS = S$ if and only if $T \subset S$.

23.6. Show that if $S$ is a subgroup, then $Sa = Sb$ if and only if $ba^{-1} \in S$ and
$ab^{-1} \in S$ ($aS = bS$ if and only if $a^{-1}b \in S$ and $b^{-1}a \in S$).

23.7. Let $S$ be a subgroup of $G$. Show that $a \bowtie b$ defined by $ab^{-1} \in S$ is an
equivalence relation.

23.8. Show that $C_G(x)$ is a subgroup of $G$. Let $H$ be a subgroup of $G$ and suppose
$x \in H$. Show that $C_H(x)$ is a subgroup of $C_G(x)$.

23.9. (a) Show that the only element $a$ in a group with the property $a^2 = a$ is
the identity. (b) Now use $e_G \star e_G = e_G$ to show that any homomorphism maps
identity to identity. (c) Show that if $f : G \to H$ is a homomorphism, then
$f(g^{-1}) = [f(g)]^{-1}$.

23.10. Establish a bijection between the set of right cosets and the set of left cosets
of a subgroup. Hint: Define a map that takes $St$ to $t^{-1}S$.

23.11. Let $G$ be a finite group and $S$ one of its subgroups. Convince yourself that
the union of all right cosets of $S$ is $G$. Now use the fact that distinct right cosets are
disjoint and that they have the same cardinality to prove that the order of $S$ divides
the order of $G$. In fact, $|G| = |S|\,|G/S|$, where $|G/S|$ is the number of cosets of
$S$ (also called the index of $S$ in $G$). This is Lagrange's theorem.

23.12. Let $f : G \to H$ be a homomorphism. Show that $\phi : G/\ker f \to f(G)$
given by $\phi(g[\ker f]) \equiv \phi([\ker f]g) = f(g)$ is an isomorphism.

23.13. Let $G'$ denote the commutator subgroup of a group $G$. Show that $G'$ is a
normal subgroup of $G$ and that $G/G'$ is abelian.

23.14. Let $M = \mathbb{R} \cup \{\infty\}$, and define an action of $SL(2, \mathbb{R})$ on $M$ by

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot x = \frac{ax + c}{bx + d}.$$

Show that this is indeed a group action with a law of multiplication identical to
the matrix multiplication, and that the action is transitive, but not effective.

23.15. Show that two conjugacy classes are either disjoint or identical.

23.16. Show that if all conjugacy classes of a group have only one element, the
group must be abelian.


23.17. Consider a map $\phi$ from the conjugacy class of $G$ containing $x \in G$ to the
set of (left) cosets $G/C_G(x)$ given by $\phi(axa^{-1}) = aC_G(x)$. Show that $\phi$ is a
bijection. In particular, show that $|C_G(x)| = |G|/|K_x^G|$ where $K_x^G$ is the class
in $G$ containing $x$ and $|K_x^G|$ its order (see Problem 23.11). Use this result and
Problems 23.8 and 23.11 to show that $|H|/|K_x^H|$ divides $|G|/|K_x^G|$.

23.18. Show that $R R_{\hat{e}}(\theta) R^{-1}$ corresponds to a rotation of angle $\theta$. Hint: Consider
the effect of rotation on the vectors in the plane perpendicular to $\hat{e}$, and note that
the rotated plane is perpendicular to $\hat{e}' \equiv R\hat{e}$.

23.19. Let $G$ act on $M$ and let $m_0 \in M$. Show that $Gm_0$ is the smallest invariant
subset of $M$ containing $m_0$.

23.20. Suppose $G$ is the direct product of $H_1$ and $H_2$ and $g = h_1 h_2$. Show that
the factors $h_1$ and $h_2$ are unique and that $H_1$ and $H_2$ are normal.

23.21. Show that $(g, h), (g', h') \in G \times H$ are conjugate if and only if $g$ is conjugate
to $g'$ and $h$ is conjugate to $h'$. Therefore, conjugacy classes of the direct product
are obtained by pairing one conjugacy class from each factor.

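23.22. Find the products $\pi_1 \circ \pi_2$ and $\pi_2 \circ \pi_1$ of the two permutations

$$\pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 3 & 4 & 6 & 5 & 1 & 2 \end{pmatrix} \quad \text{and} \quad \pi_2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 1 & 3 & 6 & 5 & 4 \end{pmatrix}.$$

23.23. Find the inverses of the permutations

$$\pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 3 & 5 & 7 & 1 & 2 & 8 & 4 & 6 \end{pmatrix}, \quad \pi_2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 2 & 5 & 6 & 8 & 1 & 7 & 4 & 3 \end{pmatrix}$$

and show directly that $(\pi_1 \circ \pi_2)^{-1} = \pi_2^{-1} \circ \pi_1^{-1}$.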

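23.24. Find the inverse of each of the following permutations: $\pi_1 = \ldots$,

$\pi_2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 4 & 2 & 5 & 3 \end{pmatrix}$, $\pi_3 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 6 & 5 & 4 & 3 & 2 & 1 \end{pmatrix}$, and $\pi_4 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 4 & 5 & 1 & 2 \end{pmatrix}$.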

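23.25. Express each of the following products in terms of disjoint cycles. Assume
that all permutations are in $S_7$.

(a) (123)(347)(456)(145). (b) (34)(562)(273). (c) (1345)(134)(13).

23.26. Express the following permutations as products of disjoint cycles, and
determine which are cyclic.

(a) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 3 & 4 & 5 & 6 & 2 \end{pmatrix}$. (b) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 1 & 4 & 5 & 6 & 3 \end{pmatrix}$. (c) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 3 & 5 & 4 & 2 \end{pmatrix}$.

23.27. Express the permutation $\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 2 & 4 & 1 & 3 & 6 & 8 & 7 & 5 \end{pmatrix}$ as a product of
transpositions. Is the permutation even or odd?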




23.28. Express the following permutations as products of transpositions, and
determine whether they are even or odd.



⑻ g n 》 n). 


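(b) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 4 & 1 & 7 & 8 & 3 & 6 & 5 & 2 \end{pmatrix}$.

(c) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 6 & 7 & 2 & 4 & 1 & 5 & 3 \end{pmatrix}$.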


23.29. Show that the product of two even or two odd permutations is always even, 
and the product of an even and an odd permutation is always odd. 

23.30. Show that $\pi$ and $\pi^{-1}$ have the same parity (both even or both odd).

23.31. Find the number of distinct conjugacy classes of S 5 and 灸 . 


Additional Reading 

1. Hamermesh, M. Group Theory and its Application to Physical Problems,
Dover, 1989. The classic textbook on group theory written specifically for
physicists.

2. Rotman, J. An Introduction to the Theory of Groups, 3rd ed., Allyn and
Bacon, 1984. An excellent (formal, but readable) introduction to group theory
with many examples and lots of explanations.

3. Wigner, E. Group Theory and its Application to the Quantum Mechanics
of Atomic Spectra, Academic Press, 1959. Another classic written by the
master of group theory himself.


24

Group Representation Theory


Group action is extremely important in quantum mechanics. Suppose the Hamiltonian
of a quantum system is invariant under a symmetry transformation of its
independent parameters such as position, momentum, and time. This invariance
will show up as certain properties of the solutions of the Schrödinger equation.

Moreover, the very act of labeling quantum-mechanical states often involves 
groups and their actions. For example, labeling atomic states by eigenvalues of 
angular momentum assumes invariance of the Hamiltonian under the action of the 
rotation group (see Chapter 27) on the Hilbert space of the quantum-mechanical 
system under consideration. 


24.1 Definitions and Examples 


In the language of group theory, we have the following situation. Put all the parameters
$x_1, \ldots, x_p$ of the Hamiltonian $H$ together to form a space, say $\mathbb{R}^p$, and write
$H = H(x_1, \ldots, x_p) \equiv H(\mathbf{x})$. A group of symmetry of $H$ is a group $G$ whose action
on $\mathbb{R}^p$ leaves $H$ unchanged, [1] i.e., $H(\mathbf{x} \cdot g) = H(\mathbf{x})$. For example, a one-dimensional
harmonic oscillator, with

$$H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + \frac{1}{2} m\omega^2 x^2,$$

has, among other things, parity $P$ (defined by $Px = -x$) as a symmetry. Thus, the group $G = \{e, P\}$ is a
group of symmetry of $H$.

1 It will become clear shortly that the appropriate direction for the action is from the right.

The Hamiltonian $H$ of a quantum-mechanical system is an operator in a Hilbert
space, such as $\mathcal{L}^2(\mathbb{R}^3)$, the space of square-integrable functions. The important
question is, What is the proper way of transporting the action of $G$ from $\mathbb{R}^p$
to $\mathcal{L}^2(\mathbb{R}^3)$? This is a relevant question because the solutions of the Schrödinger
equation are, in general, functions of the parameters of the Hamiltonian, and as
such will be affected by the symmetry operation on the Hamiltonian. The answer
is provided in the following definition.

24.1.1. Definition. Let $G$ be a group and $\mathcal{H}$ a Hilbert space. A representation of
$G$ on $\mathcal{H}$ is a homomorphism $T : G \to GL(\mathcal{H})$. The representation is faithful if
the homomorphism is 1-1. We often denote $T(g)$ by $T_g$. $\mathcal{H}$ is called the carrier
space of $T$. The trivial homomorphism $T : G \to \{1\}$ is also called the identity
representation. The dimension of $\mathcal{H}$ is called the dimension of the representation
$T$.

We do not want to distinguish between representations that differ only by
isomorphic vector spaces, because otherwise we can generate an infinite set of
representations that are trivially related to one another. A vector space isomorphism
$f : \mathcal{H} \to \mathcal{H}'$ induces a group isomorphism $\phi : GL(\mathcal{H}) \to GL(\mathcal{H}')$ defined by

$$\phi(T) = f \circ T \circ f^{-1} \quad \text{for } T \in GL(\mathcal{H}).$$

This motivates the following definition. 

24.1.2. Definition. Two representations $T : G \to GL(\mathcal{H})$ and $T' : G \to
GL(\mathcal{H}')$ are called equivalent if there exists an isomorphism $f : \mathcal{H} \to \mathcal{H}'$ such
that $T'_g = f \circ T_g \circ f^{-1}$ for all $g \in G$.


24.1.3. Box. Any representation $T : G \to GL(\mathcal{H})$ defines an action of the
group $G$ on the Hilbert space $\mathcal{H}$ by $\Phi(g, |a\rangle) = T_g |a\rangle$.


As we saw in Chapters 2 and 3, the transformation of an operator $A$ under
$T_g$ would have to be defined by $T_g A (T_g)^{-1}$. For a Hamiltonian with a group of
symmetry $G$, this leads to the identity

$$T_g [H(\mathbf{x})] (T_g)^{-1} = H(\mathbf{x} \cdot g).$$

Similarly, the action of the group on a vector (function) in $\mathcal{L}^2(\mathbb{R}^3)$ is defined by

$$(T_g \psi)(\mathbf{x}) \equiv \psi(\mathbf{x} \cdot g), \tag{24.1}$$

where the parentheses around $T_g \psi$ designate it as a new function. One can show
that if $G$ acts on the independent variables of a function on the right as in Equation
(24.1), then the vector space of such functions is the carrier space of a representation
of $G$. In fact,

$$(T_{g_1 g_2} \psi)(\mathbf{x}) \equiv \psi(\mathbf{x} \cdot (g_1 g_2)) = \psi((\mathbf{x} \cdot g_1) \cdot g_2) = (T_{g_2} \psi)(\mathbf{x} \cdot g_1) \equiv \varphi(\mathbf{x} \cdot g_1),$$

where we have defined the new function $\varphi$ by the last equality. Now note that

$$\varphi(\mathbf{x} \cdot g_1) = (T_{g_1} \varphi)(\mathbf{x}) = (T_{g_1}(T_{g_2} \psi))(\mathbf{x}) = (T_{g_1} T_{g_2} \psi)(\mathbf{x}).$$

It follows from the last two equations that

$$T_{g_1 g_2} \psi = T_{g_1} T_{g_2} \psi.$$

Since this holds for arbitrary $\psi$, we must have $T_{g_1 g_2} = T_{g_1} T_{g_2}$, i.e., that $T$ is a
representation. When the action of a group is "naturally" from the left, such as the
action of a matrix on a column vector, we replace $\mathbf{x} \cdot g$ with $g^{-1} \cdot \mathbf{x}$. The reader
can check that $T : G \to GL(\mathcal{H})$, given by $T_g \psi(\mathbf{x}) = \psi(g^{-1} \cdot \mathbf{x})$, is indeed a
representation.

24.1.4. Example. Let the Hamiltonian of the time-independent Schrödinger equation
$H|\psi\rangle = E|\psi\rangle$ be invariant under the action of a group $G$. This means that

$$T_g H T_g^{-1} = H \quad \Rightarrow \quad [H, T_g] = 0,$$

i.e., that $H$ and $T_g$ are simultaneously diagonalizable (Theorem 4.4.15). It follows that we
can choose the energy eigenstates to be eigenstates of $T_g$ as well, and we can label the states
not only by the energy "quantum numbers" (eigenvalues of $H$) but also by the eigenvalues
of $T_g$. For example, if the Hamiltonian is invariant under the action of parity $P$, then we can
choose the states to be even, corresponding to parity eigenvalue of $+1$, or odd, corresponding
to parity eigenvalue of $-1$. Similarly, if $G$ is the rotation group, then the states can be labeled
by the eigenvalues of the rotation operators, which are, as we shall see, equivalent to the
angular momentum operators discussed in Chapter 12.

In crystallography and solid-state physics, the Hamiltonian of an (infinite) lattice is
invariant under translation by an integer multiple of each so-called primitive lattice translation,
the three noncoplanar vectors that define a primitive cell of the crystal. The preceding
argument shows that the energy eigenstates can be taken to be the eigenstates of the translation
operator as well. ■

It is common to choose a basis and represent all $T_g$'s in terms of matrices. Then
one gets a matrix representation of the group $G$.

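24.1.5. Example. Consider the action of the 2D rotation group $SO(2)$ (rotation about
the $z$-axis) on $\mathbb{R}^3$:

$$\mathbf{r}' = R_z(\theta)\mathbf{r} \quad \Rightarrow \quad \begin{aligned} x' &= x\cos\theta - y\sin\theta, \\ y' &= x\sin\theta + y\cos\theta, \\ z' &= z. \end{aligned}$$

For a Hilbert space, also choose $\mathbb{R}^3$. Define the homomorphism $T : G \to GL(\mathcal{H})$ to be
the identity map, so that $T(R_z(\theta)) \equiv T_\theta = R_z(\theta)$. The operator $T_\theta$ transforms the standard
basis vectors of $\mathcal{H}$ as

$$\begin{aligned} T_\theta \hat{e}_1 &= T_\theta(1, 0, 0) = (\cos\theta, \sin\theta, 0) = \cos\theta\,\hat{e}_1 + \sin\theta\,\hat{e}_2 + 0\,\hat{e}_3, \\ T_\theta \hat{e}_2 &= T_\theta(0, 1, 0) = (-\sin\theta, \cos\theta, 0) = -\sin\theta\,\hat{e}_1 + \cos\theta\,\hat{e}_2 + 0\,\hat{e}_3, \\ T_\theta \hat{e}_3 &= T_\theta(0, 0, 1) = (0, 0, 1) = 0\,\hat{e}_1 + 0\,\hat{e}_2 + \hat{e}_3. \end{aligned}$$

It follows that the matrix representation of $SO(2)$ in the standard basis of $\mathcal{H}$ is

$$T_\theta = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Note that $SO(2)$ is an infinite group; its cardinality is determined by the "number" of
$\theta$'s. ■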
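That the matrices $T_\theta$ really multiply like the rotations they represent, $T_{\theta_1} T_{\theta_2} = T_{\theta_1 + \theta_2}$, can be checked numerically. The sketch below uses arbitrary sample angles and a floating-point tolerance; the helper names are illustrative.

```python
import math

def T(theta):
    """Matrix of the rotation by theta about the z-axis in the standard basis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(3) for j in range(3))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Homomorphism property for a pair of sample angles (arbitrary choice).
t1, t2 = 0.3, 1.1
product_ok = close(matmul(T(t1), T(t2)), T(t1 + t2))

# The identity rotation and inverses are represented correctly as well.
identity_ok = close(T(0.0), identity)
inverse_ok = close(matmul(T(t1), T(-t1)), identity)
print(product_ok, identity_ok, inverse_ok)
```

The product rule follows from the addition formulas for sine and cosine; the code merely samples it at one pair of angles.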

24.1.6. Example. Let $S_3$ act on $\mathbb{R}^3$ on the right by shuffling components:

$$(x_1, x_2, x_3)\cdot\pi = (x_{\pi(1)}, x_{\pi(2)}, x_{\pi(3)}), \qquad \pi \in S_3.$$

For the carrier space, choose $\mathbb{R}^3$ as well. Let $T: S_3 \to GL(\mathbb{R}^3)$ be given as follows: $T(\pi)$
is the matrix that takes the column vector $x = \begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}$ to $\begin{pmatrix} x_{\pi(1)}\\ x_{\pi(2)}\\ x_{\pi(3)} \end{pmatrix}$. As a specific illustration,
consider $\pi = \begin{pmatrix} 1 & 2 & 3\\ 3 & 1 & 2 \end{pmatrix}$ and write $T_\pi$ for $T(\pi)$. Then

$$\begin{aligned}
T_\pi(\hat{e}_1) &= T_\pi(1,0,0) = (1,0,0)\cdot\pi = (0,1,0) = \hat{e}_2,\\
T_\pi(\hat{e}_2) &= T_\pi(0,1,0) = (0,1,0)\cdot\pi = (0,0,1) = \hat{e}_3,\\
T_\pi(\hat{e}_3) &= T_\pi(0,0,1) = (0,0,1)\cdot\pi = (1,0,0) = \hat{e}_1,
\end{aligned}$$

which give rise to the matrix

$$T_\pi = \begin{pmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{pmatrix}.$$

The reader may construct the other five matrices of this representation and verify directly
that it is indeed a (faithful) representation: Products and inverses of permutations are mapped
onto products and inverses of the corresponding matrices. ■
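The construction of all six matrices can be sketched in a few lines. The code below is an illustration under stated assumptions: permutations are stored as tuples of 0-based images, and the helper names (`act`, `rep_matrix`, `first_then`) are hypothetical. It checks that the six matrices are distinct (faithfulness) and that $T_p T_q$ is the matrix of the permutation "p first, then q"; which order counts as the group product depends on the convention of Chapter 23, which is not assumed here.

```python
from itertools import permutations

def act(x, p):
    """Right action (x . p)_i = x_{p(i)}; p is a tuple of 0-based images."""
    return tuple(x[p[i]] for i in range(len(p)))

def rep_matrix(p):
    """Matrix of T_p: column j is the image of the basis vector e_j."""
    n = len(p)
    columns = [act(tuple(int(i == j) for i in range(n)), p) for j in range(n)]
    return tuple(tuple(columns[j][i] for j in range(n)) for i in range(n))

def matmul(a, b):
    n = len(a)
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
                 for i in range(n))

def first_then(p, q):
    """The permutation 'p first, then q': i -> q(p(i))."""
    return tuple(q[p[i]] for i in range(len(p)))

group = list(permutations(range(3)))
mats = {p: rep_matrix(p) for p in group}

# Faithful: distinct permutations give distinct matrices.
faithful = len(set(mats.values())) == len(group)

# Products of matrices correspond to composed permutations.
multiplicative = all(matmul(mats[p], mats[q]) == mats[first_then(p, q)]
                     for p in group for q in group)

# The permutation 1->3, 2->1, 3->2 of the example, written 0-based.
example = rep_matrix((2, 0, 1))
print(faithful, multiplicative, example)
```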

The utility of a representation lies in our comfort with the structure of vector
spaces. The climax of such comfort is the spectral decomposition theorems
of (normal) operators on vector spaces of finite (Chapter 4) and infinite (Chapter 16)
dimensions. The operators $T_g$, relevant to our present discussion, are, in
general, neither normal nor simultaneously commuting. Therefore, the complete
diagonalizability of all $T_g$'s is out of the question (unless the group happens to be
abelian).

The best thing next to complete diagonalization is to see whether there are
common invariant subspaces of the vector space $\mathcal{H}$ carrying the representation.
We already know how to construct (minimal) "invariant" subsets of $\mathcal{H}$: these are
precisely the orbits of the action of the group $G$ on $\mathcal{H}$. The linearity of the $T_g$'s
guarantees that the span of each orbit is actually an invariant subspace, and that
such subspaces are the smallest invariant subspaces containing a given vector. Our
aim is to find those minimal invariant subspaces whose orthogonal complements
are also invariant. We encountered the same situation in Chapter 4 for a single
operator.





reducible and 
irreducible 
representations 


24.1.7. Definition. A representation $T: G \to GL(\mathcal{H})$ is called reducible if there
exist subspaces $U$ and $W$ of $\mathcal{H}$ such that $\mathcal{H} = U \oplus W$ and both $U$ and $W$ are
invariant under all $T_g$'s. If no such subspaces exist, $\mathcal{H}$ is said to be irreducible.

In most cases of physical interest, where $\mathcal{H}$ is a Hilbert space, $W = U^\perp$.
Then, in the language of Definition 4.2.1, a representation is reducible if a proper
subspace of $\mathcal{H}$ reduces all $T_g$'s.


24.1.8. Example. Let $S_3$ act on $\mathbb{R}^3$ as in Example 24.1.6. For the carrier space $\mathcal{H}$, choose
the space of functions on $\mathbb{R}^3$, and for $T$, the homomorphism $T: G \to GL(\mathcal{H})$, given by
$T_g\psi(\mathbf{x}) = \psi(\mathbf{x}\cdot g)$, for $\psi \in \mathcal{H}$. Any $\psi$ that is symmetric in $x$, $y$, $z$, such as $xyz$, $x + y + z$,
or $x^2 + y^2 + z^2$, defines a one-dimensional invariant subspace of $\mathcal{H}$. To obtain another
invariant subspace, consider $\psi_1(x, y, z) \equiv xy$ and let $\{\pi_i\}_{i=1}^{6}$ be as given in Example
23.4.1. Then, denoting $T_{\pi_i}$ by $T_i$, the reader may check that

$$\begin{aligned}
[T_1\psi_1](x,y,z) &= \psi_1((x,y,z)\cdot\pi_1) = \psi_1(x,y,z) = xy = \psi_1(x,y,z),\\
[T_2\psi_1](x,y,z) &= \psi_1((x,y,z)\cdot\pi_2) = \psi_1(y,x,z) = yx = \psi_1(x,y,z),\\
[T_3\psi_1](x,y,z) &= \psi_1((x,y,z)\cdot\pi_3) = \psi_1(z,y,x) = zy \equiv \psi_2(x,y,z),\\
[T_4\psi_1](x,y,z) &= \psi_1((x,y,z)\cdot\pi_4) = \psi_1(x,z,y) = xz \equiv \psi_3(x,y,z),\\
[T_5\psi_1](x,y,z) &= \psi_1((x,y,z)\cdot\pi_5) = \psi_1(z,x,y) = zx = \psi_3(x,y,z),\\
[T_6\psi_1](x,y,z) &= \psi_1((x,y,z)\cdot\pi_6) = \psi_1(y,z,x) = yz = \psi_2(x,y,z).
\end{aligned}$$

This is clearly a three-dimensional invariant subspace of $\mathcal{H}$ with $\psi_1$, $\psi_2$, and $\psi_3$ as a
convenient basis, in which the first three permutations are represented by

$$T_1 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}, \qquad
T_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix}, \qquad
T_3 = \begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix}.$$

It is instructive for the reader to verify these relations and to find the three remaining
matrices. ■

24.1.9. Example. Let $S_3$ act on $\mathbb{R}^3$ as in Example 24.1.6. For the carrier space of
representation, choose the subspace $V$ of the $\mathcal{H}$ of Example 24.1.8 spanned by the six functions
$x$, $y$, $z$, $xy$, $xz$, and $yz$. For $T$, choose the same homomorphism as in Example 24.1.8
restricted to $V$. It is clear that the subspaces $U$ and $W$ spanned, respectively, by the first three
and the last three functions are invariant under $S_3$, and that $V = U \oplus W$. It follows that
the representation is reducible. The matrix form of this representation is found to be of the
general form $\begin{pmatrix} A & 0\\ 0 & B \end{pmatrix}$, where $B$ is one of the 6 matrices of Example 24.1.8. The matrix $A$,
corresponding to the three functions $x$, $y$, and $z$, can be found similarly. ■

Let $\mathcal{H}$ be a carrier space, finite- or infinite-dimensional. For any vector $|a\rangle$, the
reader may check that the span of $\{T_g|a\rangle\}_{g\in G}$ is an invariant subspace of $\mathcal{H}$. If $G$
is finite, this subspace is clearly finite-dimensional. The irreducible subspace
containing $|a\rangle$, a subspace of the span of $\{T_g|a\rangle\}_{g\in G}$, will also be finite-dimensional.
Because of the arbitrariness of $|a\rangle$, it follows that every vector of $\mathcal{H}$ lies in an
irreducible subspace, and that






All representations 
are equivalent to 
unitary 
representations. 


24.1.10. Box. All irreducible representations of a finite group are
finite-dimensional.


Due to the importance and convenience of unitary operators (for example, the
fact that they leave the inner product invariant), it is desirable to be able to construct
a unitary representation of groups, or a representation that is equivalent to one.
The following theorem ensures that this desire can be realized for finite groups.

24.1.11. Theorem. Every representation of a finite group $G$ is equivalent to some
unitary representation.

Proof. We present the proof because of its simplicity and elegance. Let $T$ be a
representation of $G$. Consider the positive hermitian operator $\mathbf{T} \equiv \sum_{x\in G} T_x^\dagger T_x$
and note that

$$T_g^\dagger\,\mathbf{T}\,T_g = \sum_{x\in G} [T(g)]^\dagger [T(x)]^\dagger\, T(x)\,T(g)
= \sum_{x\in G} [T(xg)]^\dagger\, T(xg) = \sum_{y\in G} [T(y)]^\dagger\, T(y) = \mathbf{T}, \tag{24.2}$$

where we have used the fact that the sum over $x$ and $y = xg$ sweep through the
entire group. Now let $S = \sqrt{\mathbf{T}}$, and multiply both sides of Equation (24.2), with
$S^2$ replacing $\mathbf{T}$, by $S^{-1}$ on the left and by $T_g^{-1}S^{-1}$ on the right to obtain

$$S^{-1} T_g^\dagger S = S\,T_g^{-1} S^{-1} \quad\Rightarrow\quad (S\,T_g S^{-1})^\dagger = (S\,T_g S^{-1})^{-1} \qquad \forall\, g \in G.$$

This shows that the representation $T'$ defined by $T'_g \equiv S\,T_g S^{-1}$ for all $g \in G$ is
unitary. □
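The key step of the proof, the invariance $T_g^\dagger\,\mathbf{T}\,T_g = \mathbf{T}$ of the averaged operator, can be illustrated on a small case. The sketch below uses a non-unitary real representation of the two-element group, chosen here purely for illustration (the matrix `g` squares to the identity); for real matrices the dagger reduces to the transpose. It does not compute the square root $S$.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]

identity = [[1, 0], [0, 1]]
g = [[1, 1], [0, -1]]          # assumed example: g*g = identity, but g is not orthogonal
group = [identity, g]

# Averaged operator  T = sum_x  T_x^dagger T_x  (dagger = transpose for real matrices)
T = [[0, 0], [0, 0]]
for x in group:
    T = add(T, matmul(transpose(x), x))

# Invariance used in Equation (24.2):  T_g^dagger T T_g = T  for every g
invariant = all(matmul(matmul(transpose(x), T), x) == T for x in group)
print(T, invariant)
```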



There is another convenience afforded by unitary representations: 

24.1.12. Theorem. Let $T: G \to GL(\mathcal{H})$ be a unitary representation and $W$ an
invariant subspace of $\mathcal{H}$. Then $W^\perp$ is also invariant.

Proof. Suppose $|a\rangle \in W^\perp$. We need to show that $T_g|a\rangle \in W^\perp$ for all $g \in G$. To
this end, let $|b\rangle \in W$. Then

$$\langle b|T_g|a\rangle = \left(\langle a|T_g^\dagger|b\rangle\right)^* = \left(\langle a|T_g^{-1}|b\rangle\right)^* = \left(\langle a|T_{g^{-1}}|b\rangle\right)^* = 0,$$

because $T_{g^{-1}}|b\rangle \in W$. It follows from this equality that $T_g|a\rangle \in W^\perp$ for all
$g \in G$. □

The carrier space $\mathcal{H}$ of a unitary representation is either irreducible or has an
invariant subspace $W$, in which case we have $\mathcal{H} = W \oplus W^\perp$, where, by Theorem
24.1.12, $W^\perp$ is also invariant. If $W$ and $W^\perp$ are not irreducible, then they too can







antisymmetric 
representation of a 
permutation group 


adjoint and complex 
conjugate 
representations 


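be written as direct sums of invariant subspaces. Continuing this process, we can
decompose $\mathcal{H}$ into irreducible invariant subspaces $W^{(k)}$ such that

$$\mathcal{H} = W^{(1)} \oplus W^{(2)} \oplus W^{(3)} \oplus \cdots.$$

If the carrier space is finite-dimensional, which we assume from now on and for
which we use the notation $V$, then the above direct sum is finite and we write

$$V = W^{(1)} \oplus W^{(2)} \oplus \cdots \oplus W^{(p)} \equiv \bigoplus_{k=1}^{p} W^{(k)}. \tag{24.3}$$

One can think of $W^{(k)}$ as the carrier space of an (irreducible) representation.
The homomorphism $T^{(k)}: G \to GL(W^{(k)})$ is simply the restriction of $T$ to the
subspace $W^{(k)}$, and we write

$$T_g = T_g^{(1)} \oplus T_g^{(2)} \oplus \cdots \oplus T_g^{(p)} \equiv \bigoplus_{k=1}^{p} T_g^{(k)}.$$

If we identify all equivalent irreducible representations and collect them together,
we may rewrite the last equation as

$$T_g = m_1 T_g^{(1)} \oplus m_2 T_g^{(2)} \oplus \cdots \oplus m_p T_g^{(p)} \equiv \bigoplus_{\alpha=1}^{p} m_\alpha T_g^{(\alpha)}, \tag{24.4}$$

where $p$ is the number of inequivalent irreducible representations and $m_\alpha$ are
positive integers giving the number of times an irreducible representation $T_g^{(\alpha)}$ and
all its equivalents occur in a given representation.

In terms of matrices, $T_g$ will be represented in a block-diagonal form as

$$T_g = \begin{pmatrix} T_g^{(1)} & 0 & \cdots & 0\\ 0 & T_g^{(2)} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & T_g^{(p)} \end{pmatrix},$$

where some of the $T_g^{(k)}$ may be equivalent.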

24.1.13. Example. A one-dimensional (and therefore irreducible) representation,
defined for all groups, is the trivial (symmetric) representation $T: G \to \mathbb{C}$ given by $T(g) = 1$
for all $g \in G$. For the permutation group $S_n$, one can define another one-dimensional (thus
irreducible) representation $T: S_n \to \mathbb{C}$, called the antisymmetric representation, given
by $T(\pi) = +1$ if $\pi$ is even, and $T(\pi) = -1$ if $\pi$ is odd. ■
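That the antisymmetric assignment really is a representation amounts to the multiplicativity of the sign, $T(\pi\sigma) = T(\pi)T(\sigma)$. The sketch below checks this exhaustively for $S_4$; the choice $n = 4$, the inversion-count definition of parity, and the helper names are assumptions made for illustration.

```python
from itertools import permutations

def sign(p):
    """+1 for an even permutation, -1 for an odd one (parity of the inversion count)."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def compose(p, q):
    """Composite permutation i -> p(q(i)), with 0-based images."""
    return tuple(p[q[i]] for i in range(len(q)))

S4 = list(permutations(range(4)))

# One-dimensional representation: sign(p q) = sign(p) sign(q) for all pairs.
is_representation = all(sign(compose(p, q)) == sign(p) * sign(q) for p in S4 for q in S4)
even_count = sum(1 for p in S4 if sign(p) == 1)
print(is_representation, even_count)
```

Because the sign is multiplicative for either order of composition, the check does not depend on the product convention for permutations.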

Given any (matrix) representation $T$ of $G$, one can form the transpose inverse
matrices $(T_g^t)^{-1}$, and complex conjugate matrices $T_g^*$. The reader may check that
each set of these matrices forms a representation of $G$.

24.1.14. Definition. The sets of matrices $(T_g^t)^{-1}$ and $T_g^*$ are called, respectively, the
adjoint representation, denoted by $\bar{T}$, and the complex conjugate representation,
denoted by $T^*$.






24.2 Orthogonality Properties 

Homomorphisms preserve group structures. By studying a group that is more
attuned to concrete manipulations, we gain insight into the structure of groups
that are homomorphic to it. The group of invertible operators on a vector space,
especially in its matrix representation, is particularly suited for such a study
because of our familiarity with matrices and operators. The last section reduced
this study to inequivalent irreducible representations. This section is devoted to a
detailed study of such representations.

24.2.1. Lemma. (Schur's lemma) Let $T: G \to GL(V)$ and $T': G \to GL(V')$
be irreducible representations of $G$. If $A \in \mathcal{L}(V, V')$ is such that

$$A\,T_g = T'_g\,A \qquad \forall\, g \in G, \tag{24.5}$$

then either $A$ is an isomorphism (i.e., $T$ is equivalent to $T'$), or $A = 0$.

Proof. Let $|a\rangle \in \ker A$. Then

$$A\,T_g|a\rangle = T'_g A|a\rangle = 0 \quad\Rightarrow\quad T_g|a\rangle \in \ker A \qquad \forall\, g \in G.$$

It follows that $\ker A$, a subspace of $V$, is invariant under $T$. Irreducibility of $T$
implies that either $\ker A = V$, or $\ker A = 0$. The first case asserts that $A$ is the zero
linear transformation; the second case implies that $A$ is injective.

Similarly, let $|b\rangle \in A(V)$. Then $|b\rangle = A|x\rangle$ for some $|x\rangle \in V$:

$$T'_g|b\rangle = T'_g A|x\rangle = \underbrace{A\,T_g|x\rangle}_{\in A(V)} \quad\Rightarrow\quad T'_g|b\rangle \in A(V) \qquad \forall\, g \in G.$$

It follows that $A(V)$, a subspace of $V'$, is invariant under $T'$. Irreducibility of $T'$
implies that either $A(V) = 0$, or $A(V) = V'$. The first case is consistent with the first
conclusion drawn above: $\ker A = V$. The second case asserts that $A$ is surjective.
Combining the two results, we conclude that $A$ is either the zero operator or an
isomorphism. □


Lemma 24.2.1 becomes extremely useful when we concentrate on a single
irreducible representation, i.e., when $T' = T$.

24.2.2. Lemma. Let $T: G \to GL(V)$ be an irreducible representation of $G$. If
$A \in \mathcal{L}(V)$ is such that $A\,T_g = T_g\,A$ for all $g \in G$, then $A = \lambda\mathbf{1}$.

Proof. Replacing $V'$ with $V$ in Lemma 24.2.1, we conclude that $A = 0$ or $A$ is
an isomorphism of $V$. In the first case, $\lambda = 0$. In the second case, $A$ must have a
nonzero eigenvalue $\lambda$ and at least one eigenvector (see Theorem 4.3.4). It follows
that the operator $A - \lambda\mathbf{1}$ commutes with all $T_g$'s and it is not an isomorphism (why
not?). Therefore, it must be the zero operator. □




We can immediately put this lemma to good use. If $G$ is abelian, all operators
$\{T_x\}_{x\in G}$ commute with one another. Focusing on one of these operators, say $T_g$,
noting that it commutes with all operators of the representation, and using Lemma
24.2.2, we conclude that $T_g = \lambda\mathbf{1}$. It follows that when $T_g$ acts on a vector, it gives
a multiple of that vector. Therefore, it leaves any one-dimensional subspace of the
carrier space invariant. Since this is true for all $g \in G$, we have the following
result.

24.2.3. Theorem. All irreducible representations of an abelian group are
one-dimensional.

This theorem is an immediate consequence of Schur's lemma, and is independent
of the order of $G$. In particular, it holds for infinite groups, if Schur's lemma holds
for those groups. One important class of infinite groups for which Schur's lemma
holds is the Lie groups. Thus, all abelian Lie groups have 1-dimensional irreducible
representations. We shall see later that the converse of Theorem 24.2.3 is also true
for finite groups.


Issai Schur (1875-1945) was one of the most brilliant mathematicians
active in Germany during the first third of the
twentieth century. He attended the Gymnasium in Libau (now
Liepaja, Latvia) and then the University of Berlin, where he
spent most of his scientific career. From 1911 until 1916, when
he returned to Berlin, he was an assistant professor at Bonn.
He became full professor at Berlin in 1919. Schur was forced
to retire by the Nazi authorities in 1935 but was able to
emigrate to Palestine in 1939. He died there of a heart ailment
several years later. Schur had been a member of the Prussian 
Academy of Sciences before the Nazi purges. He married and 
had a son and daughter. 

Schur’s principal field was the representation theory of groups, founded a little before 
1900 by his teacher Frobenius. Schur seems to have completed this field shortly before 
World War I, but he returned to the subject after 1925, when it became important for 
physics. Further developed by his student Richard Brauer, it is in our time experiencing 
an extraordinary growth through the opening of new questions. Schur’s dissertation (1901) 
became fundamental to the representation theory of the general linear group; in fact, English 
mathematicians have named certain of the functions appearing in the work “S-functions” 
in Schur’s honor. In 1905 Schur reestablished the theory of group characters — the keystone 
of representation theory. The most important tool involved is “Schur’s lemma.” Along with 
the representation of groups by integral linear substitutions, Schur was also the first to 
study representation by linear fractional substitutions, treating this more difficult problem 
almost completely in two works (1904, 1907). In 1906 Schur considered the fundamental
problems that appear when an algebraic number field is taken as the domain; a number 
appearing in this connection is now called the Schur index. His works written after 1925 
include a complete description of the rational and of the continuous representations of the 
general linear group; the foundations of this work were in his dissertation. 











A lively interchange with many colleagues led Schur to contribute important memoirs 
to other areas of mathematics. Some of these were published as collaborations with other 
authors, although publications with dual authorship were almost unheard of at that time. 
Here we simply indicate the areas: pure group theory, matrices, algebraic equations, number
theory, divergent series, integral equations, and function theory.


24.2.4. Example. Suppose that the Hamiltonian $H$ of a quantum mechanical system with
Hilbert space $\mathcal{H}$ has a group of symmetry $G$ with a representation $T: G \to GL(\mathcal{H})$. Then
$H\,T_g = T_g H$ for all $g \in G$. It follows that $H = \lambda\mathbf{1}$ if the representation is irreducible.
Therefore,


24.2.5. Box. All vectors of each invariant irreducible subspace are eigenstates of
the Hamiltonian corresponding to the same eigenvalue, i.e., they all have the same
energy. Therefore, the degeneracy of that energy state is at least as large as the
dimension of the carrier space.


It is helpful to arrive at the statement above from a different perspective. Consider a
vector $|x\rangle$ in the eigenspace $\mathcal{M}_i$ corresponding to the energy eigenvalue $E_i$. Since $T_g$ and $H$
commute, $T_g|x\rangle$ is also in $\mathcal{M}_i$. Therefore, an eigenspace of a Hamiltonian with a group of
symmetry is invariant under all $T_g$ for any representation $T$ of that group. If $T$ is one of the
irreducible representations of $G$, say $T^{(\alpha)}$ with dimension $n_\alpha$, then $\dim \mathcal{M}_i \ge n_\alpha$. ■

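Consider two irreducible representations $T^{(\alpha)}$ and $T^{(\beta)}$ of a group $G$ with
carrier spaces $W^{(\alpha)}$ and $W^{(\beta)}$, respectively. Let $X$ be any operator in $\mathcal{L}(W^{(\alpha)}, W^{(\beta)})$,
and define

$$A \equiv \sum_{x\in G} T^{(\alpha)}(x)\,X\,T^{(\beta)}(x^{-1}).$$

Then, we have

$$\begin{aligned}
T_g^{(\alpha)} A &= \sum_{x\in G} T^{(\alpha)}(g)\,T^{(\alpha)}(x)\,X\,T^{(\beta)}(x^{-1})\,T^{(\beta)}(g^{-1})\,T^{(\beta)}(g)\\
&= \underbrace{\left[\sum_{x\in G} T^{(\alpha)}(gx)\,X\,T^{(\beta)}\big((gx)^{-1}\big)\right]}_{=A \text{ because this sum also covers all } G} T^{(\beta)}(g) = A\,T_g^{(\beta)}.
\end{aligned}$$

We are interested in the two cases where $T^{(\alpha)} = T^{(\beta)}$, and where $T^{(\alpha)}$ is not
equivalent to $T^{(\beta)}$. In the first case, Lemma 24.2.2 gives $A = \lambda\mathbf{1}$; in the second
case, Lemma 24.2.1 gives $A = 0$. Combining these two results and labeling the
constant multiplying the unit operator by $X$, we can write

$$\sum_{g\in G} T^{(\alpha)}(g)\,X\,T^{(\beta)}(g^{-1}) = \lambda_X\,\delta_{\alpha\beta}\,\mathbf{1}. \tag{24.6}$$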

geG 











character of a 
representation; 
simple character, 
compound character 


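The presence of the completely arbitrary operator $X$ indicates that Equation (24.6) is
a powerful statement about, and a severe restriction on, the operators $T^{(\alpha)}(g)$.
This becomes more transparent if we select a basis, represent all operators by
matrices, and for the matrix representation of $X$ choose a matrix whose only
nonzero element is 1 and occurs at the $l$th row and $m$th column. Then Equation
(24.6) becomes

$$\sum_{g\in G} T^{(\alpha)}_{il}(g)\,T^{(\beta)}_{mj}(g^{-1}) = \lambda_{lm}\,\delta_{\alpha\beta}\,\delta_{ij},$$

where $\lambda_{lm}$ is a constant that can be evaluated as follows. Set $j = i$, $\alpha = \beta$, and
sum over $i$. The RHS will give $\lambda_{lm}\sum_i \delta_{ii} = \lambda_{lm} n_\alpha$, where $n_\alpha$ is the dimension of
the carrier space of $T^{(\alpha)}$. For the LHS we get

$$\begin{aligned}
\text{LHS} &= \sum_{g\in G}\sum_i T^{(\alpha)}_{il}(g)\,T^{(\alpha)}_{mi}(g^{-1}) = \sum_{g\in G}\left(T^{(\alpha)}(g^{-1})\,T^{(\alpha)}(g)\right)_{ml}\\
&= \sum_{g\in G}\left(T^{(\alpha)}(e)\right)_{ml} = \sum_{g\in G}\delta_{ml} = |G|\,\delta_{ml},
\end{aligned}$$

where $|G|$ is the order of the group. Putting everything together, we obtain

$$\sum_{g\in G} T^{(\alpha)}_{il}(g)\,T^{(\beta)}_{mj}(g^{-1}) = \frac{|G|}{n_\alpha}\,\delta_{ml}\,\delta_{\alpha\beta}\,\delta_{ij}, \tag{24.7}$$

or

$$\sum_{g\in G} T^{(\alpha)}_{il}(g)\,T^{(\beta)*}_{jm}(g) = \frac{|G|}{n_\alpha}\,\delta_{ml}\,\delta_{\alpha\beta}\,\delta_{ij}, \tag{24.8}$$

if the representation is unitary.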
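Equation (24.7) with $\alpha = \beta$ can be checked entry by entry on a small case. The sketch below uses a two-dimensional integer-matrix representation of a group of order 6 generated by the matrices `r` and `s`; these generators are an assumption chosen for illustration (they are not taken from the text), and the representation is not unitary, which is why (24.7) rather than (24.8) is tested.

```python
from itertools import product

I2 = ((1, 0), (0, 1))
r = ((0, -1), (1, -1))   # assumed generator of order 3
s = ((0, 1), (1, 0))     # assumed generator of order 2

def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

# Generate the whole matrix group from the two generators.
group = {I2}
frontier = [I2]
while frontier:
    x = frontier.pop()
    for gen in (r, s):
        y = matmul(x, gen)
        if y not in group:
            group.add(y)
            frontier.append(y)

def inverse(x):
    """Inverse found by search inside the finite group."""
    return next(y for y in group if matmul(x, y) == I2)

order, n = len(group), 2

# Irreducibility check via sum of |trace|^2 = |G| (criterion derived later in the chapter).
irreducible = sum((g[0][0] + g[1][1]) ** 2 for g in group) == order

def lhs(i, l, m, j):
    """Left-hand side of (24.7) for alpha = beta."""
    return sum(g[i][l] * inverse(g)[m][j] for g in group)

# (24.7): lhs = (|G| / n) * delta_ml * delta_ij, written without division.
holds = all(lhs(i, l, m, j) * n == order * int(m == l) * int(i == j)
            for i, l, m, j in product(range(2), repeat=4))
print(order, irreducible, holds)
```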

Equations (24.7) and (24.8) depend on the basis chosen in which to express 
matrices. To eliminate this dependence, we first introduce the important concept 
of character. 

24.2.6. Definition. Let $T: G \to GL(V)$ be a representation of the group $G$. The
character of this representation is the map $\chi: G \to \mathbb{C}$ given by

$$\chi(g) \equiv \operatorname{tr} T_g = \sum_i T_{ii}(g),$$

where $T(g)$ is the matrix representation of $T_g$ in any basis of $V$. If $T$ is irreducible,
the character is called simple; otherwise, it is called compound.

The character of the identity element in any representation can be calculated
immediately. Since a homomorphism maps identity onto identity, $T_e = \mathbf{1}$.
Therefore,

$$\chi(e) = \operatorname{tr}(\mathbf{1}) = \dim V. \tag{24.9}$$





Recall that two elements $x, y \in G$ belong to the same conjugacy class if there
exists $g \in G$ such that $x = g y g^{-1}$. This same relation holds for the operators
representing the elements: $T_x = T_g T_y T_{g^{-1}}$. Taking the trace of both sides, and
noting that $T_{g^{-1}} = T_g^{-1}$, one can show that


24.2.7. Box. All elements of a group belonging to the same conjugacy class
have the same character.


Setting $i = l$ and $j = m$ in (24.7) and summing over $i$ and $j$, we obtain

$$\sum_{g\in G}\chi^{(\alpha)}(g)\,\chi^{(\beta)}(g^{-1}) = \frac{|G|}{n_\alpha}\,\delta_{\alpha\beta}\sum_{i,j}\delta_{ji}\,\delta_{ij} = \frac{|G|}{n_\alpha}\,\delta_{\alpha\beta}\underbrace{\sum_j \delta_{jj}}_{=n_\alpha} = |G|\,\delta_{\alpha\beta}, \tag{24.10}$$

or

$$\sum_{g\in G}\chi^{(\alpha)}(g)\,\chi^{(\beta)*}(g) = |G|\,\delta_{\alpha\beta} \tag{24.11}$$

if the representation is unitary. This equation suggests a useful interpretation:
Characters can be thought of as vectors in a $|G|$-dimensional inner product space.
According to Equation (24.11), the characters of inequivalent irreducible
representations are orthogonal. In particular, since there cannot be more orthogonal vectors
than the dimension of a vector space, we conclude that the number of irreducible
inequivalent representations of a group cannot be more than the cardinality of that
group. Actually, we can do better. Restricting ourselves to unitary representations
and collecting all elements belonging to the same conjugacy class together, we
write

$$\sum_{i=1}^{r} c_i\,\chi_i^{(\alpha)}\chi_i^{(\beta)*} = |G|\,\delta_{\alpha\beta} \quad\Rightarrow\quad \langle\chi^{(\beta)}|\chi^{(\alpha)}\rangle = |G|\,\delta_{\alpha\beta}, \tag{24.12}$$

where $i$ labels conjugacy classes, $c_i$ is the number of elements in the $i$th class,
$r$ is the number of classes in $G$, and $|\chi^{(\alpha)}\rangle \in \mathbb{C}^r$ is an $r$-dimensional vector
with components $\{c_i^{1/2}\chi_i^{(\alpha)}\}_{i=1}^{r}$. Equation (24.12) shows that vectors belonging to
different irreducible representations are orthogonal. Since there cannot be more
orthogonal vectors than the dimension of a vector space, we conclude that the
number of inequivalent irreducible representations of a group cannot be more than
the number of conjugacy classes of the group, i.e., $p \le r$.

The characters of the adjoint representation are obtained from

$$\bar{\chi}(g) = \chi(g^{-1}) \quad\Rightarrow\quad \bar{\chi}_i = \chi_{i'},$$

where $K_{i'}$ is the class consisting of all elements inverse to those of the class $K_i$.
The equations involving characters of inverses of group elements can be written
in terms of the characters of the adjoint representation. For example, Equation
(24.10) becomes

$$\sum_{g\in G}\chi^{(\alpha)}(g)\,\bar{\chi}^{(\beta)}(g) = \sum_{i=1}^{r} c_i\,\chi_i^{(\alpha)}\bar{\chi}_i^{(\beta)} = |G|\,\delta_{\alpha\beta}. \tag{24.13}$$

Other relations can be obtained similarly.

24.3 Analysis of Representations 

We can use the results obtained in the last section to gain insight into a given
representation. Take the trace of both sides of Equation (24.4) and write the result
as

$$\chi(g) = m_1\chi^{(1)}(g) + \cdots + m_p\chi^{(p)}(g) = \sum_{\alpha=1}^{p} m_\alpha\chi^{(\alpha)}(g); \tag{24.14}$$

i.e., a compound character is a linear combination of simple characters with
nonnegative integer coefficients. Furthermore, the orthogonality of simple characters
gives

$$m_\alpha = \frac{1}{|G|}\sum_{g\in G}\chi(g)\,\chi^{(\alpha)*}(g), \tag{24.15}$$

yielding the number of times the irreducible representation $T^{(\alpha)}$ occurs in the
representation $T$.

Another useful relation will be obtained if we multiply Equation (24.14) by its
complex conjugate and sum over $g$; the result is

$$\begin{aligned}
\sum_{g\in G}|\chi(g)|^2 &= \sum_{g\in G}\chi(g)\,\chi^*(g) = \sum_{g\in G}\sum_{\alpha}m_\alpha\chi^{(\alpha)}(g)\sum_{\beta}m_\beta\chi^{(\beta)*}(g)\\
&= \sum_{\alpha,\beta}m_\alpha m_\beta\underbrace{\sum_{g\in G}\chi^{(\alpha)}(g)\,\chi^{(\beta)*}(g)}_{=|G|\delta_{\alpha\beta}} = |G|\sum_{\alpha}m_\alpha^2.
\end{aligned} \tag{24.16}$$

In particular, if $T$ is irreducible, all $m_\alpha$ are zero except for one, which is unity. We
therefore obtain the criterion for irreducibility:

$$\sum_{g\in G}|\chi(g)|^2 = \sum_{i=1}^{r} c_i|\chi_i|^2 = |G| \qquad \text{if } T \text{ is irreducible}. \tag{24.17}$$

For groups of low order and representations of small dimensions, Equation
(24.16) becomes a powerful tool for testing the irreducibility of the representation.




24.3.1. Example. Let $G = S_3$, and consider the representation of Example 24.1.8. The
characters of the first three elements of this representation are easily calculated:

$$\chi_1 = \operatorname{tr} T_1 = 3, \qquad \chi_2 = \operatorname{tr} T_2 = 1, \qquad \chi_3 = 1.$$

Similarly, one can obtain $\chi_4 = 1$, $\chi_5 = 0$, and $\chi_6 = 0$. Substituting this in Equation (24.16)
yields

$$\sum_{g\in G}|\chi(g)|^2 = \sum_{j=1}^{6}|\chi_j|^2 = 3^2 + 1^2 + 1^2 + 1^2 + 0^2 + 0^2 = 12.$$

Comparing this with the RHS of (24.16) with $|G| = 6$ yields $\sum_\alpha m_\alpha^2 = 2$. This restricts
the nonzero $m_\alpha$'s to two, say $\alpha = 1$ and $\alpha = 2$. Moreover, $m_1$ and $m_2$ can be only 1. Thus,
the representation of Example 24.1.8 is reducible, and there are precisely two inequivalent
irreducible representations in it, each occurring once.

We can actually find the invariant subspaces corresponding to the two irreducible
representations revealed above. The first is easy to guess. Just taking the sum of the three functions
$\psi_1$, $\psi_2$, and $\psi_3$ gives a one-dimensional invariant subspace; so, let $\phi_1 = \psi_1 + \psi_2 + \psi_3$,
and note that the space $W^{(1)}$ spanned by $\phi_1$ is invariant. The second is harder to discover.
However, if we assume that $\psi_1$, $\psi_2$, and $\psi_3$ are orthonormal, then using the Gram-Schmidt
process, we can find the other two functions orthogonal to $\phi_1$ (but not orthogonal to each
other!). These are

$$\phi_2 = -\psi_1 + 2\psi_2 - \psi_3, \qquad \phi_3 = -\psi_1 - \psi_2 + 2\psi_3.$$

The reader is urged to convince himself/herself that the subspace $W^{(2)}$ spanned by $\phi_2$ and
$\phi_3$ is the complement of $W^{(1)}$ [i.e., $V = W^{(1)} \oplus W^{(2)}$] and that it is invariant under all
$T_g$'s. ■
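The character computation of this example, together with the multiplicities from Equation (24.15), can be reproduced with a short script. The following is a sketch under assumptions: permutations are 0-based tuples, the monomials $xy$, $xz$, $yz$ are encoded as index pairs, and the simple characters of $S_3$ (trivial, antisymmetric, and two-dimensional) are supplied by hand as a standard fact rather than derived from the text.

```python
from itertools import permutations, combinations

group = list(permutations(range(3)))
# Monomials x_a x_b encoded as unordered index pairs: xy, xz, yz.
pairs = [frozenset(c) for c in combinations(range(3), 2)]

def character(g):
    """Trace of T_g on span{xy, xz, yz}: number of monomials mapped to themselves."""
    return sum(1 for pr in pairs if frozenset(g[a] for a in pr) == pr)

def fixed_points(g):
    """Identifies the class: 3 -> identity, 1 -> transposition, 0 -> 3-cycle."""
    return sum(1 for i in range(3) if g[i] == i)

chi = {g: character(g) for g in group}
norm = sum(c * c for c in chi.values())          # left-hand side of (24.16)

# Simple characters of S_3 keyed by number of fixed points (assumed known).
simple = {
    "trivial": {3: 1, 1: 1, 0: 1},
    "sign":    {3: 1, 1: -1, 0: 1},
    "two_dim": {3: 2, 1: 0, 0: -1},
}

# Multiplicities from (24.15); the characters are real, so no conjugation is needed.
mult = {name: sum(chi[g] * tab[fixed_points(g)] for g in group) // len(group)
        for name, tab in simple.items()}
print(sorted(chi.values()), norm, mult)
```

Under these assumptions the script finds $\sum_g|\chi(g)|^2 = 12$ and one copy each of the trivial and the two-dimensional representation, in agreement with the example.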


regular 

representation 


A very useful representation can be constructed as follows. Let $G = \{g_j\}_{j=1}^{m}$,
and recall that left multiplication of elements of $G$ by a fixed element $g_i$ is a
permutation of $(g_1, g_2, \ldots, g_m)$. Denote this permutation by $\pi_i$. Now define a
representation $R: G \to GL(\mathbb{R}^m)$, called the regular representation, by

$$R_{g_i}(x_1, x_2, \ldots, x_m) = (x_{\pi_i(1)}, x_{\pi_i(2)}, \ldots, x_{\pi_i(m)}).$$

That this is indeed a representation is left as a problem for the reader. One can
obtain a matrix representation of $R$ by choosing the standard basis $\{\hat{e}_j\}_{j=1}^{m}$ of $\mathbb{R}^m$
and noting that $R_{g_i}\hat{e}_j = \hat{e}_{\pi_i^{-1}(j)}$. From such a matrix representation it follows that
all characters $\chi^R$ of the regular representation are zero except for the identity,
whose character is $\chi^R(e) = m$ [see Equation (24.9)]. Now use Equation (24.14)
for $g = e$ and for the regular representation to obtain $m = \sum_{\alpha=1}^{p} m_\alpha n_\alpha$, where $n_\alpha$
is the dimension of the $\alpha$th irreducible representation. We can find $m_\alpha$ by using
Equation (24.15) and noting that only $g = e$ contributes to the sum:

$$m_\alpha = \frac{1}{|G|}\sum_{g\in G}\chi^R(g)\,\chi^{(\alpha)*}(g) = \frac{1}{m}\,\chi^R(e)\,\chi^{(\alpha)*}(e) = n_\alpha.$$





In words, 


24.3.2. Box. The number of times an irreducible representation occurs in
the regular representation is equal to the dimension of that irreducible
representation.
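For $S_3$ this statement can be verified directly from the definition of the regular representation: its character at $g$ counts the group elements fixed by left multiplication by $g$. The sketch below assumes 0-based tuple permutations, composition $i \mapsto p(q(i))$, and the simple characters of $S_3$ entered by hand as a standard fact.

```python
from itertools import permutations

group = list(permutations(range(3)))

def compose(p, q):
    """Composite permutation i -> p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

def regular_character(g):
    """Trace of the permutation matrix of x -> g x: number of x with g x = x."""
    return sum(1 for x in group if compose(g, x) == x)

def fixed_points(g):
    return sum(1 for i in range(3) if g[i] == i)

chi_R = {g: regular_character(g) for g in group}

# Simple characters of S_3 keyed by number of fixed points (assumed known).
simple = {
    "trivial": {3: 1, 1: 1, 0: 1},
    "sign":    {3: 1, 1: -1, 0: 1},
    "two_dim": {3: 2, 1: 0, 0: -1},
}

# Multiplicity of each irreducible representation in the regular one, via (24.15).
mult = {name: sum(chi_R[g] * tab[fixed_points(g)] for g in group) // len(group)
        for name, tab in simple.items()}
dims = {name: tab[3] for name, tab in simple.items()}   # chi(e) = dimension
print(chi_R[(0, 1, 2)], mult)
```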


We therefore obtain the important relations

$$\chi_i^R = |G|\,\delta_{i1} = \sum_{\alpha=1}^{p} n_\alpha\chi_i^{(\alpha)} \qquad\text{and}\qquad |G| = \sum_{\alpha=1}^{p} n_\alpha^2, \tag{24.18}$$

where we have assumed that the first conjugacy class is that of the identity. For
finite groups of small order, the second equation can be very useful in obtaining
the dimensions of irreducible representations.

24.3.3. Example. A group of order 2 or 3 has only one-dimensional inequivalent
irreducible representations, because the only way that Equation (24.18) can be satisfied for
$|G| = 2$ or 3 is for all $n_\alpha$ to be 1. A group of order 4 can have either 4 one-dimensional
or one 2-dimensional inequivalent irreducible representations. The symmetric group $S_3$,
being of order 6, can have 6 one-dimensional, or 2 one-dimensional and one 2-dimensional
inequivalent irreducible representations. We shall see later that if all inequivalent irreducible
representations of a group are one-dimensional, then the group must be abelian. Thus, the
first possibility for $S_3$ must be excluded. ■
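The case analysis in this example is just a search for ways to write $|G|$ as a sum of squares of positive integers. A minimal sketch of that search follows; the function name `dimension_sets` is hypothetical, and the search uses only the second relation in (24.18), so it lists arithmetic possibilities without deciding which ones a given group realizes.

```python
from math import isqrt

def dimension_sets(order, largest=None):
    """All non-increasing lists of positive integers whose squares sum to `order`."""
    if order == 0:
        return [[]]
    if largest is None:
        largest = order
    results = []
    for n in range(min(largest, isqrt(order)), 0, -1):
        for rest in dimension_sets(order - n * n, n):
            results.append([n] + rest)
    return results

for order in (2, 3, 4, 6):
    print(order, dimension_sets(order))
```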


24.4 Group Algebra 

Think of group elements as (linearly independent) vectors. In fact, given any set,
one can generate a vector space by taking linear combinations of the elements of
the set assumed to form a basis. In the case of groups one gets a bonus: The product
already defined on the basis (group elements) can be extended by linearity to all
elements of the vector space to turn it into an algebra called the group algebra.
For $G = \{g_j\}_{j=1}^{m}$, a typical element of the group algebra is $a = \sum_{i=1}^{m} a_i g_i$. One
can add two vectors as usual. But the product of two vectors is also defined:

$$ab = \left(\sum_{i=1}^{m} a_i g_i\right)\left(\sum_{j=1}^{m} b_j g_j\right) = \sum_{i=1}^{m}\sum_{j=1}^{m} a_i b_j\,g_i g_j \equiv \sum_{k=1}^{m} c_k g_k,$$

where $c_k$ is a sum involving $a_i$ and $b_j$. The best way to learn this is to see an
example.





idempotent element 
of algebras 


24.4.1. Example. Let $G = S_3$, and consider $a = 2\pi_1 - 3\pi_3 + \pi_5$ and $b = \pi_2 - 2\pi_4 + 3\pi_6$.
Then, using Table 23.1, we obtain

$$\begin{aligned}
ab &= (2\pi_1 - 3\pi_3 + \pi_5)(\pi_2 - 2\pi_4 + 3\pi_6)\\
&= 2\pi_1\pi_2 - 4\pi_1\pi_4 + 6\pi_1\pi_6 - 3\pi_3\pi_2 + 6\pi_3\pi_4 - 9\pi_3\pi_6 + \pi_5\pi_2 - 2\pi_5\pi_4 + 3\pi_5\pi_6\\
&= 2\pi_2 - 4\pi_4 + 6\pi_6 - 3\pi_6 + 6\pi_5 - 9\pi_2 + \pi_4 - 2\pi_3 + 3\pi_1\\
&= 3\pi_1 - 7\pi_2 - 2\pi_3 - 3\pi_4 + 6\pi_5 + 3\pi_6. \quad ■
\end{aligned}$$
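The bilinear extension of the group product can be sketched as a dictionary-based multiplication. The code below rests on two assumptions that are not stated in this section: the labels $\pi_1,\ldots,\pi_6$ are taken to be the image lists read off from the action in Example 24.1.8, and the product $\pi_i\pi_j$ is taken to be the composite $k \mapsto \pi_i(\pi_j(k))$. With these choices the six nontrivial products used above are reproduced; Table 23.1 itself is not consulted.

```python
from collections import defaultdict

# Image lists (pi(1), pi(2), pi(3)); labeling assumed from Example 24.1.8.
PI = {1: (1, 2, 3), 2: (2, 1, 3), 3: (3, 2, 1),
      4: (1, 3, 2), 5: (3, 1, 2), 6: (2, 3, 1)}
LABEL = {images: label for label, images in PI.items()}

def times(i, j):
    """Label of pi_i pi_j under the assumed convention (pi_i pi_j)(k) = pi_i(pi_j(k))."""
    p, q = PI[i], PI[j]
    return LABEL[tuple(p[q[k] - 1] for k in range(3))]

def algebra_product(a, b):
    """Product in the group algebra; elements are dicts {label: coefficient}."""
    c = defaultdict(int)
    for i, a_i in a.items():
        for j, b_j in b.items():
            c[times(i, j)] += a_i * b_j
    return dict(c)

a = {1: 2, 3: -3, 5: 1}    # a = 2 pi_1 - 3 pi_3 + pi_5
b = {2: 1, 4: -2, 6: 3}    # b = pi_2 - 2 pi_4 + 3 pi_6
ab = algebra_product(a, b)
print(sorted(ab.items()))
```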

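Let $\mathcal{A}$ be any algebra. As a finite-dimensional vector space, we can always find
two proper subspaces $\mathcal{L}_1$ and $\mathcal{L}_2$ such that $\mathcal{A}$ is a direct sum of $\mathcal{L}_1$ and $\mathcal{L}_2$. We
write this as $\mathcal{A} = \mathcal{L}_1 + \mathcal{L}_2$.² If we demand that this sum be invariant under left
(right) algebra multiplication, then it is clear that $\mathcal{L}_1$ and $\mathcal{L}_2$ must be left (right)
ideals, in which case we write

$$\mathcal{A} = \mathcal{L}_1 \oplus \mathcal{L}_2.$$

Now assume that $\mathcal{A}$ has an identity $\mathbf{1}$, which as a vector in $\mathcal{A}$ can be decomposed
uniquely as

$$\mathbf{1} = \mathbf{1}_1 + \mathbf{1}_2 \tag{24.19}$$

with $\mathbf{1}_1 \in \mathcal{L}_1$ and $\mathbf{1}_2 \in \mathcal{L}_2$. Similarly,

$$a = a_1 + a_2 \qquad \forall\, a \in \mathcal{A}.$$

Multiplying these two equations, we obtain

$$(a_1 + a_2)\mathbf{1} = a_1 + a_2 = a(\mathbf{1}_1 + \mathbf{1}_2) = a\mathbf{1}_1 + a\mathbf{1}_2.$$

The uniqueness of the decomposition of $a$ now implies that

$$a_1 = a\mathbf{1}_1, \qquad a_2 = a\mathbf{1}_2.$$

So, if $a \in \mathcal{L}_1$, then $a_2 = 0$ and

$$a = a\mathbf{1}_1, \qquad a\mathbf{1}_2 = 0,$$

with a similar result for $a \in \mathcal{L}_2$. Since $\mathbf{1}_1 \in \mathcal{L}_1$ and $\mathbf{1}_2 \in \mathcal{L}_2$, we have

$$\mathbf{1}_1^2 = \mathbf{1}_1, \quad \mathbf{1}_1\mathbf{1}_2 = 0; \qquad \mathbf{1}_2^2 = \mathbf{1}_2, \quad \mathbf{1}_2\mathbf{1}_1 = 0. \tag{24.20}$$

An element $a \in \mathcal{A}$ that satisfies $a^2 = a$ is called an idempotent.³ Thus, $\mathbf{1}_1$
and $\mathbf{1}_2$ are idempotents. Furthermore, they generate $\mathcal{L}_1$ and $\mathcal{L}_2$, respectively; i.e.,
$\mathcal{L}_1 = \mathcal{A}\mathbf{1}_1$ and $\mathcal{L}_2 = \mathcal{A}\mathbf{1}_2$ (see the last section of Chapter 1).


²The reason for changing to this new notation is to reserve $\oplus$ for the following.
³In the algebra of operators, we called such an element a projection operator.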





24.4.1 Group Algebra and Representations 

Group algebra is very useful for the construction and analysis of representations of
groups. In fact, we have already used a similar approach in the construction of the
regular representation. Instead of $\mathbb{R}^m$ used before, use the $m$-dimensional vector
space $\mathcal{A}$, the group algebra. Then left-multiplication by a group element $g$ can be
identified with $T_g$, the operators of the regular representation, and the invariant
subspaces of $\mathcal{A}$ become the left ideals of $\mathcal{A}$, and we can write

$$\mathcal{A} = \mathcal{L}_1 \oplus \mathcal{L}_2 \oplus \cdots \oplus \mathcal{L}_r.$$

Moreover, since the identity element of the group is the identity element of the
algebra as well, the argument at the end of the last subsection gives the resolution⁴
of the identity

$$e = e_1 + \cdots + e_r, \qquad e_i e_j = 0 \ \text{ for } i \ne j. \tag{24.21}$$

It is clear that if $a^2 = \alpha a$, then $a/\alpha$ will be idempotent. So, we can essentially
ignore the constant $\alpha$, which is why $a$ is called essentially idempotent. Now
consider the element of the group algebra

$$P = \sum_{x\in G} x \tag{24.22}$$

and note that $gP = \sum_{x\in G} gx = P$. It follows that

$$P^2 = \sum_{g\in G} g\sum_{x\in G} x = \sum_{g\in G}\sum_{x\in G} gx = \sum_{g\in G} P = |G|\,P.$$

So, $P$ is essentially idempotent. Furthermore, the reader may verify that the ideal
generated by $P$ is one-dimensional.

An idempotent that cannot be resolved into other idempotents satisfying
Equation (24.21) is called a primitive idempotent. The reader may check that the
following holds.

24.4.2. Proposition. A left ideal is minimal if and only if it is generated by a
primitive idempotent.

Let us now apply the notion of the group algebra to derive further relations
among characters. Denote the elements of the $i$th class $K_i$ of $G$ by $\{x_l^{(i)}\}_{l=1}^{c_i}$ and
construct the element of the group algebra $\kappa_i \equiv \sum_{l=1}^{c_i} x_l^{(i)}$. If in the product of two
such quantities

$$\kappa_i\kappa_j = \sum_{l=1}^{c_i}\sum_{m=1}^{c_j} x_l^{(i)} x_m^{(j)}, \tag{24.23}$$

$x_l^{(i)} x_m^{(j)} \equiv y \in G$ is in a certain conjugacy class, then the rest of that class can be
obtained by taking all conjugates of $y$, i.e., elements of $G$ that can be written as

$$g y g^{-1} = g\,x_l^{(i)} x_m^{(j)}\,g^{-1} = \underbrace{g\,x_l^{(i)} g^{-1}}_{\in K_i}\ \underbrace{g\,x_m^{(j)} g^{-1}}_{\in K_j}.$$

It follows that if one member of a class appears in the double sum of Equation
(24.23), all members will appear there. The reader may check that if $y$ occurs $k$
times in the double sum, then all members of the class of $y$ occur $k$ times as well.
Collecting all such members together, we can write

$$\kappa_i\kappa_j = \sum_{l=1}^{r} c_{ijl}\,\kappa_l, \tag{24.24}$$

where $c_{ijl}$ are nonnegative integers.
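The class constants $c_{ijl}$ of Equation (24.24) can be computed by brute force for a small group. The sketch below does this for $S_3$; the 0-based tuple encoding, the composition convention $i \mapsto p(q(i))$, and the ordering of classes by size (identity, 3-cycles, transpositions) are assumptions of the illustration. The inner assertion checks the claim that every member of a class occurs the same number of times in the double sum.

```python
from itertools import permutations
from collections import Counter

group = list(permutations(range(3)))

def compose(p, q):
    """Composite permutation i -> p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0] * 3
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugacy_class(x):
    return frozenset(compose(compose(g, x), inverse(g)) for g in group)

# Classes sorted by size: index 0 = identity, 1 = 3-cycles, 2 = transpositions.
classes = sorted({conjugacy_class(x) for x in group}, key=len)

def class_constants(i, j):
    """Nonzero c_ijl in kappa_i kappa_j = sum_l c_ijl kappa_l, as {l: c_ijl}."""
    counts = Counter(compose(x, y) for x in classes[i] for y in classes[j])
    result = {}
    for l, K in enumerate(classes):
        values = {counts[z] for z in K}
        assert len(values) == 1   # all members of a class occur equally often
        c = values.pop()
        if c:
            result[l] = c
    return result

print([len(K) for K in classes])
print(class_constants(2, 2), class_constants(1, 1), class_constants(2, 1))
```

In this example the product of the transposition class with itself contains no transpositions, which shows that some of the $c_{ijl}$ can vanish.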

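Now consider the $\alpha$th irreducible representation, and add all operators
corresponding to a given class:

$$T_i^{(\alpha)} \equiv \sum_{g\in K_i} T_g^{(\alpha)} \quad\Rightarrow\quad T_i^{(\alpha)} T_j^{(\alpha)} = \sum_{l=1}^{r} c_{ijl}\,T_l^{(\alpha)}, \tag{24.25}$$

where the second equation follows from the same sort of argument used above
to establish Equation (24.24). One can show that $T_i^{(\alpha)}$ commutes with all $T_g^{(\alpha)}$.
Therefore, by Schur's lemma, $T_i^{(\alpha)} = \lambda_i^{(\alpha)}\mathbf{1}$, and the second equation in (24.25)
becomes

$$\lambda_i^{(\alpha)}\lambda_j^{(\alpha)} = \sum_{l=1}^{r} c_{ijl}\,\lambda_l^{(\alpha)}. \tag{24.26}$$

Taking the characters of both sides of $T_i^{(\alpha)} = \lambda_i^{(\alpha)}\mathbf{1}$ and using the first equation in
(24.25), noting that all elements of a class have the same character, we get

$$c_i\,\chi_i^{(\alpha)} = \lambda_i^{(\alpha)} n_\alpha \quad\Rightarrow\quad \lambda_i^{(\alpha)} = \frac{c_i\,\chi_i^{(\alpha)}}{n_\alpha}.$$

Substituting this in Equation (24.26), we obtain

$$c_i c_j\,\chi_i^{(\alpha)}\chi_j^{(\alpha)} = n_\alpha\sum_{l=1}^{r} c_{ijl}\,c_l\,\chi_l^{(\alpha)}. \tag{24.27}$$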

This is another equation that is useful for computing characters. Note that this 
equation connects the purely group properties (c/ *s and Ciji's) with the properties 



character table of a 
finite group 


24.4 GROUP ALGEBRA 691 


of the representation (x/ a) ’s and n a ). Summing Equation (24.27) over a and using 
the first equation in (24.18), we get 

cicj xl a) Xj a) = Yl ci J ici 12 HaX i a)= 卬 1 i G i 

Df=l 1=1 Of=l 

' - V - ^ 

=|G|5/i by (24.18) 

because ci = 1 (there is only one element in the class of the identity). Problem 
24.12 shows that Cij\ = c^j where K ( t is the class consisting of inverses of 
elements of Ki ， It then follows that 

= 迎 V/. (24.28) 

C J 

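For a unitary representation, $\chi_{i'}^{(\alpha)} = \chi_i^{(\alpha)*}$, and Equation (24.28) becomes

$$\sum_{\alpha=1}^{p} \chi_i^{(\alpha)} \chi_j^{(\alpha)*} = \frac{|G|}{c_j}\, \delta_{ij} \;\Rightarrow\; \langle \chi_j | \chi_i \rangle = \frac{|G|}{c_j}\, \delta_{ij}, \tag{24.29}$$

where $|\chi_i\rangle \in \mathbb{C}^p$ is a $p$-dimensional vector with components $\{\chi_i^{(\alpha)}\}_{\alpha=1}^{p}$. This
equation can also be written in terms of group elements rather than classes. Since
$\chi_i^{(\alpha)} = \chi^{(\alpha)}(x)$ for any $x \in K_i$, we have

$$\sum_{\alpha=1}^{p} \chi^{(\alpha)}(x)\, \chi^{(\alpha)*}(y) = \frac{|G|}{|K_x|}\, \delta(K_x, K_y), \tag{24.30}$$

where $K_x$ is the conjugacy class of $G$ containing $x$, $|K_x|$ is the number of its
elements, and

$$\delta(K_x, K_y) = \begin{cases} 1 & \text{if } K_x = K_y, \\ 0 & \text{otherwise.} \end{cases}$$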

Equation (24.29) shows that the $r$ $p$-dimensional vectors $|\chi_i\rangle$ are mutually
orthogonal; therefore, $r \le p$. The statement after Equation (24.12) was that $r \ge p$.
We thus have the following:

24.4.3. Theorem. The number of inequivalent irreducible representations of a 
finite group is equal to the number of conjugacy classes in the group. 
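The theorem can be checked by direct enumeration for small groups. The sketch below counts the conjugacy classes of $S_4$ and of a cyclic group of order 4, both realized as permutations. The list `s4_dims` of irreducible dimensions of $S_4$ is the standard result and is taken here as an assumption; it is not derived in this section.

```python
from itertools import permutations

def compose(p, q):
    """Product pq of two permutations: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def class_sizes(group):
    """Sorted list of the sizes of the conjugacy classes of a permutation group."""
    seen, sizes = set(), []
    for x in group:
        if x not in seen:
            cls = {compose(compose(g, x), inverse(g)) for g in group}
            sizes.append(len(cls))
            seen |= cls
    return sorted(sizes)

# Symmetric group S4: five classes, hence five inequivalent irreducible representations.
S4 = list(permutations(range(4)))
s4_sizes = class_sizes(S4)
s4_dims = [1, 1, 2, 3, 3]          # assumed dimensions n_alpha of the S4 irreps

# Cyclic group of order 4 generated by the shift (0 1 2 3): abelian, so every class is a singleton.
shift = (1, 2, 3, 0)
C4 = [tuple(range(4))]
while len(C4) < 4:
    C4.append(compose(shift, C4[-1]))
c4_sizes = class_sizes(C4)

print(s4_sizes, len(s4_sizes) == len(s4_dims), sum(n * n for n in s4_dims) == len(S4))
print(c4_sizes)
```

For the cyclic group every class has one element, so there are as many irreducible representations as group elements, each necessarily one-dimensional.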

It is convenient to summarize our result in a square table with rows labeled by
the irreducible representation and columns labeled by the conjugacy classes of $G$.
Then on the $\alpha$th row and $i$th column we list $\chi_i^{(\alpha)}$, and we get Table 24.1, called the
character table of $G$. Note that $c_i$, the order of $K_i$, is written as a left superscript.





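$$\begin{array}{c|cccccc}
 & {}^{c_1}K_1 & {}^{c_2}K_2 & \cdots & {}^{c_i}K_i & \cdots & {}^{c_r}K_r \\ \hline
T^{(1)} & \chi_1^{(1)} & \chi_2^{(1)} & \cdots & \chi_i^{(1)} & \cdots & \chi_r^{(1)} \\
T^{(2)} & \chi_1^{(2)} & \chi_2^{(2)} & \cdots & \chi_i^{(2)} & \cdots & \chi_r^{(2)} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
T^{(\alpha)} & \chi_1^{(\alpha)} & \chi_2^{(\alpha)} & \cdots & \chi_i^{(\alpha)} & \cdots & \chi_r^{(\alpha)} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
T^{(r)} & \chi_1^{(r)} & \chi_2^{(r)} & \cdots & \chi_i^{(r)} & \cdots & \chi_r^{(r)}
\end{array}$$

Table 24.1 A typical character table.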


Character tables have the property that any two of their rows are orthogonal in 
the sense of Equation (24.12), and any two of their columns are orthogonal in the 
sense of Equation (24.29). 
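Both orthogonality properties are easy to confirm numerically. The sketch below does so for the character table of $S_3$ (class sizes 1, 3, 2). It assumes that the row relation of Equation (24.12) has the class-weighted form $\sum_i c_i\, \chi_i^{(\alpha)*} \chi_i^{(\beta)} = |G|\, \delta_{\alpha\beta}$; the column relation is Equation (24.29). The characters of $S_3$ are real, so complex conjugation is omitted.

```python
class_sizes = [1, 3, 2]                       # c_i for S3
table = [[1, 1, 1], [1, -1, 1], [2, 0, -1]]   # rows: irreps, columns: classes
order = sum(class_sizes)                      # |G| = 6

def row_inner(a, b):
    """Class-weighted inner product of rows a and b."""
    return sum(c * table[a][i] * table[b][i] for i, c in enumerate(class_sizes))

def col_inner(i, j):
    """Unweighted inner product of columns i and j, as in Equation (24.29)."""
    return sum(row[i] * row[j] for row in table)

rows_ok = all(row_inner(a, b) == (order if a == b else 0)
              for a in range(3) for b in range(3))
# |G| / c_j * delta_ij, written without division: c_j * <col_i|col_j> = |G| delta_ij.
cols_ok = all(class_sizes[j] * col_inner(i, j) == (order if i == j else 0)
              for i in range(3) for j in range(3))
print(rows_ok, cols_ok)
```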

If all inequivalent irreducible representations of a group $G$ have dimension
one, then there will be $|G|$ of them [by Equation (24.18)]. Hence, there will be
$|G|$ conjugacy classes; i.e., each class consists of a single element. By Problem
23.16, the group must be abelian. Combining this with Theorem 24.2.3, we have
the following theorem.

24.4.4. Theorem. A finite group is abelian if and only if all its inequivalent irreducible
representations are one-dimensional.

24.5 Relationship of Characters to Those of a Subgroup

Let $H$ be a subgroup of $G$. Denote by $K_h^H$ and $K_g^G$ the $H$-class containing $h \in H$
and the $G$-class containing $g$, respectively. Let $d_j$ and $c_i$ be the number of elements
in the $j$th $H$-class and $i$th $G$-class, respectively. Any representation of $G$ defines
a representation of $H$ by restriction. An irreducible representation of $G$ may be
reducible as a representation of $H$. This is because although the subspace $W^{(\alpha)}$ of
the carrier space that is irreducible under $G$ is the smallest such subspace containing
a given vector, it is possible to generate a smaller subspace by applying a subset
of the operators corresponding to those $g$'s that belong to $H$. It follows that

$$T^{(\alpha)}(h) = \bigoplus_{\sigma} m_{\alpha\sigma}\, t^{(\sigma)}(h), \qquad h \in H, \tag{24.31}$$

where $m_{\alpha\sigma}$ are nonnegative integers as in Equation (24.14) and $t^{(\sigma)}$ are irreducible
representations of $H$. If $\chi^{(\alpha)}$ and $\xi^{(\sigma)}$ denote the characters of irreducible representations
of $G$ and $H$, respectively, then the equivalent equation for the characters
is

$$\chi^{(\alpha)}(h) = \sum_{\sigma} m_{\alpha\sigma}\, \xi^{(\sigma)}(h), \qquad h \in H. \tag{24.32}$$

Multiply both sides by $\xi^{(\kappa)*}(h)$, sum over $h \in H$, and take the complex conjugate
at the end. Then by the orthogonality relation (24.11), applied to $H$, we obtain

$$\sum_{h \in H} \chi^{(\alpha)*}(h)\, \xi^{(\kappa)}(h) = |H|\, m_{\alpha\kappa}. \tag{24.33}$$

Now multiply both sides of Equation (24.33) by $\chi^{(\alpha)}(g)$, sum over $\alpha$, and use
Equation (24.30) to obtain

$$\sum_{\alpha} m_{\alpha\kappa}\, \chi^{(\alpha)}(g) = \frac{|G|}{|H|\, |K_g^G|} \sum_{h \in H} \delta(K_h^G, K_g^G)\, \xi^{(\kappa)}(h). \tag{24.34}$$

The sum on the right can be transformed into a sum over conjugacy classes of $H$.
Then Equation (24.34) becomes

$$\sum_{\alpha} m_{\alpha\kappa}\, \chi_i^{(\alpha)} = \frac{|G|}{|H|\, c_i} \sum_{j} d_j\, \xi_j^{(\kappa)}, \qquad i = 1, 2, \ldots, r, \tag{24.35}$$

where the sum on the LHS is over irreducible representations of $G$, and on the RHS
it is over those $H$-classes $j$ that lie in the $i$th $G$-class. Note that the coefficients
$|G|\, d_j / (|H|\, c_i)$ are integers by Problem 23.17.

Equations (24.34) and (24.35) are useful for obtaining characters of $G$ when
those of a subgroup $H$ are known. The general procedure is to note that the RHS
of these equations are completely determined by the structure of the group $G$ and
the characters of $H$. Varying $i$, the RHS of (24.35) determines the $r$ components
of a (compound) character $|\psi_\kappa\rangle$, which, by the LHS, can be written as a linear
combination of characters of $G$:

$$|\psi\rangle = \sum_{\alpha=1}^{r} m_\alpha\, |\chi^{(\alpha)}\rangle, \tag{24.36}$$

where we have suppressed the irrelevant subscript $\kappa$. If we know some of the
$|\chi^{(\alpha)}\rangle$'s, we may be able to determine the rest by taking successive inner products
to find the integers $m_\alpha$, and subtracting each irreducible factor of the sum from the
LHS. We illustrate this procedure for $S_n$ in the following example.

24.5.1. Example. Let $K_1 = (1^2)$ and $K_2 = (2)$ for $S_2$ (see Section 23.4 for notation).
Example 24.1.13 showed that we can construct two irreducible representations for any $S_n$,
the symmetric and the antisymmetric representations. The reader may verify that these two
representations are inequivalent. Since the number of inequivalent irreducible representations
is equal to the number of classes in a group, we have all the information needed to
construct the character table for $S_2$. Table 24.2 shows this character table.

$$\begin{array}{c|cc}
 & {}^{1}K_1 & {}^{1}K_2 \\ \hline
T^{(1)} & 1 & 1 \\
T^{(2)} & 1 & -1
\end{array}$$

Table 24.2 Character table for $S_2$.

We want to use the $S_2$ character table to construct the character table for $S_3$. With our
knowledge of the symmetric and the antisymmetric representations, we can partially fill in
the $S_3$ character table. Let $K_1 = (1^3)$, $K_2 = (2, 1)$, and $K_3 = (3)$ and note that $c_1 = 1$,
$c_2 = 3$, and $c_3 = 2$. Then we obtain Table 24.3.

$$\begin{array}{c|ccc}
 & {}^{1}K_1 & {}^{3}K_2 & {}^{2}K_3 \\ \hline
T^{(1)} & 1 & 1 & 1 \\
T^{(2)} & 1 & -1 & 1 \\
T^{(3)} & ? & ? & ?
\end{array}$$

Table 24.3 Partially filled character table for $S_3$.

To complete the table, we start with $\kappa = 1$, and write the RHS of Equation (24.35) as

$$\psi_i = \frac{6}{2 c_i} \sum_j d_j\, \xi_j^{(1)} = \frac{3}{c_i} \sum_j \xi_j^{(1)},$$

because $d_j = 1$ for the two classes of $S_2$. The sum on the RHS is over $S_2$-classes that are
inside the $i$th $S_3$-class. For $i = 1$, only the first $S_2$-class contributes. Noting that $\xi_j^{(1)}$
are the entries of the first row of Table 24.2, we get

$$\psi_1 = \frac{3}{c_1}\, \xi_1^{(1)} = 3 \cdot 1 = 3.$$

Similarly,

$$\psi_2 = \frac{3}{c_2}\, \xi_2^{(1)} = \frac{3}{3} \cdot 1 = 1 \quad\text{and}\quad \psi_3 = 0.$$

The second equation follows from the fact that there are no classes of $S_2$ inside the third
class of $S_3$. Equation (24.36) now gives

$$|\psi\rangle = \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix} = \sum_{\alpha=1}^{3} m_\alpha\, |\chi^{(\alpha)}\rangle.$$

We can find the number of times $|\chi^{(1)}\rangle$ occurs in this compound character by taking the
inner product:

$$\langle \chi^{(1)} | \psi \rangle = \sum_{\alpha=1}^{3} m_\alpha \langle \chi^{(1)} | \chi^{(\alpha)} \rangle = 6 m_1.$$

But

$$\langle \chi^{(1)} | \psi \rangle = \sum_{i=1}^{r} c_i\, \chi_i^{(1)*} \psi_i = 1 \cdot 1 \cdot 3 + 3 \cdot 1 \cdot 1 + 2 \cdot 1 \cdot 0 = 6.$$

These two equations show that $m_1 = 1$. So,

$$\begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + m_2\, |\chi^{(2)}\rangle + m_3\, |\chi^{(3)}\rangle.$$

Subtracting the column vectors, we get a new character:

$$|\psi'\rangle \equiv \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix} = m_2\, |\chi^{(2)}\rangle + m_3\, |\chi^{(3)}\rangle.$$

Taking the inner product with $|\chi^{(2)}\rangle$ yields $m_2 = 0$. It follows that $|\psi'\rangle$ is a simple character.
In fact,

$$\langle \psi' | \psi' \rangle = 1 \cdot 2^2 + 3 \cdot 0^2 + 2 \cdot (-1)^2 = 6,$$

and the criterion of irreducibility, Equation (24.17), is satisfied.

We can now finish up Table 24.3 to obtain Table 24.4, which is the complete character
table for $S_3$. ■

$$\begin{array}{c|ccc}
 & {}^{1}K_1 & {}^{3}K_2 & {}^{2}K_3 \\ \hline
T^{(1)} & 1 & 1 & 1 \\
T^{(2)} & 1 & -1 & 1 \\
T^{(3)} & 2 & 0 & -1
\end{array}$$

Table 24.4 Complete character table for $S_3$.
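The procedure of this example can be retraced step by step in a few lines. The sketch below builds the compound character from the RHS of Equation (24.35), peels off the known simple characters by class-weighted inner products, and tests what remains for irreducibility. The variable names and the list `contained`, which records which $S_2$-classes lie in each $S_3$-class, are illustrative choices, not notation from the text.

```python
from fractions import Fraction

c = [1, 3, 2]                    # class sizes of S3
d = [1, 1]                       # class sizes of S2
xi = [[1, 1], [1, -1]]           # character table of S2
contained = [[0], [1], []]       # S2-classes lying inside each S3-class
order_G, order_H = 6, 2

def compound(kappa):
    """Components psi_i given by the RHS of Equation (24.35)."""
    return [Fraction(order_G, order_H * c[i]) * sum(d[j] * xi[kappa][j] for j in contained[i])
            for i in range(len(c))]

def inner(a, b):
    """Class-weighted inner product (all characters here are real)."""
    return sum(ci * x * y for ci, x, y in zip(c, a, b))

chi1, chi2 = [1, 1, 1], [1, -1, 1]           # the two known simple characters of S3

psi = compound(0)                             # kappa = 1 in the text's numbering
m1 = inner(chi1, psi) / order_G               # number of times chi^(1) occurs
psi_prime = [p - m1 * x for p, x in zip(psi, chi1)]
m2 = inner(chi2, psi_prime) / order_G         # number of times chi^(2) occurs
is_simple = inner(psi_prime, psi_prime) == order_G

print(psi, m1, psi_prime, m2, is_simple)
```

Starting instead from the antisymmetric character of $S_2$ (`compound(1)`) gives the compound character with components 3, -1, 0, which contains $\chi^{(2)}$ once and leaves the same remainder.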


24.6 Irreducible Basis Functions 

We have studied the operators $T_g$ and their characters representing group elements
in rather extensive detail. Let us now turn our attention to the carrier space itself.
In particular, we want to concentrate on the basis functions of the irreducible
representations. We choose “functions,” rather than vectors, because of their use
in quantum mechanics as discussed at the beginning of this chapter.

Let $\{|\psi_i^{(\alpha)}\rangle\}_{i=1}^{n_\alpha}$ be a set of basis functions for $W^{(\alpha)}$, the $\alpha$th invariant irreducible
subspace. Invariance of $W^{(\alpha)}$ implies that

$$T_g\, |\psi_i^{(\alpha)}\rangle = \sum_{j=1}^{n_\alpha} T_{ji}^{(\alpha)}(g)\, |\psi_j^{(\alpha)}\rangle,$$

where $T_{ji}^{(\alpha)}(g)$ are elements of the matrix $T^{(\alpha)}(g)$ representing $g \in G$.

24.6.1. Definition. A function (or vector) $|\phi_i^{(\alpha)}\rangle$ is said to belong to the $i$th row
of the $\alpha$th irreducible representation (or to transform according to the $i$th row
of the $\alpha$th irreducible representation) if there exists a basis $\{|\psi_j^{(\alpha)}\rangle\}_{j=1}^{n_\alpha}$ of the $\alpha$th
irreducible representation of $G$ with matrices $(T_{ij}^{(\alpha)}(g))$ and $n_\alpha - 1$ other functions
$\{|\phi_j^{(\alpha)}\rangle\}$ such that

$$T_g\, |\phi_i^{(\alpha)}\rangle = \sum_{j=1}^{n_\alpha} T_{ji}^{(\alpha)}(g)\, |\phi_j^{(\alpha)}\rangle. \tag{24.37}$$

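Functions that belong to rows of irreducible representations have some remarkable
properties. Let $|\psi_i^{(\alpha)}\rangle$ and $|\phi_j^{(\beta)}\rangle$ transform according to the $i$th and $j$th
rows of the $\alpha$th and $\beta$th irreducible representations, respectively. Choose an inner
product for the carrier space such that all representations are unitary. Then we have

$$\langle \psi_i^{(\alpha)} | \phi_j^{(\beta)} \rangle = \langle \psi_i^{(\alpha)} | T_g^\dagger T_g | \phi_j^{(\beta)} \rangle = \sum_{l=1}^{n_\alpha} \sum_{m=1}^{n_\beta} T_{li}^{(\alpha)*}(g)\, T_{mj}^{(\beta)}(g)\, \langle \psi_l^{(\alpha)} | \phi_m^{(\beta)} \rangle.$$

Summing this equation over $g$ yields $|G|\, \langle \psi_i^{(\alpha)} | \phi_j^{(\beta)} \rangle$ for the LHS, while

$$\text{RHS} = \sum_{l=1}^{n_\alpha} \sum_{m=1}^{n_\beta} \underbrace{\sum_{g \in G} T_{li}^{(\alpha)*}(g)\, T_{mj}^{(\beta)}(g)}_{=(|G|/n_\alpha)\, \delta_{\alpha\beta}\, \delta_{lm}\, \delta_{ij}} \langle \psi_l^{(\alpha)} | \phi_m^{(\beta)} \rangle = \frac{|G|}{n_\alpha}\, \delta_{\alpha\beta}\, \delta_{ij} \sum_{l=1}^{n_\alpha} \langle \psi_l^{(\alpha)} | \phi_l^{(\beta)} \rangle,$$

where we have made use of Equation (24.8). Therefore,

$$\langle \psi_i^{(\alpha)} | \phi_j^{(\beta)} \rangle = \frac{1}{n_\alpha}\, \delta_{\alpha\beta}\, \delta_{ij} \sum_{l=1}^{n_\alpha} \langle \psi_l^{(\alpha)} | \phi_l^{(\beta)} \rangle. \tag{24.38}$$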

This shows that functions belonging to different irreducible representations are
orthogonal. We should expect this, because in our construction of invariant irreducible
subspaces, we kept dividing the whole space into orthogonal complements.
What is surprising is that functions transforming according to different rows of an
irreducible representation are orthogonal. We had no control over this property!
It is a consequence of Equation (24.37). Another surprise is the independence of
the inner product from $i$: If we let $i = j$ and $\alpha = \beta$ on both sides of (24.38), we
obtain

$$\langle \psi_i^{(\alpha)} | \phi_i^{(\alpha)} \rangle = \frac{1}{n_\alpha} \sum_{l=1}^{n_\alpha} \langle \psi_l^{(\alpha)} | \phi_l^{(\alpha)} \rangle, \tag{24.39}$$

which indicates that the inner product on the LHS is independent of $i$.

24.6.2. Example. The quantum-mechanical perturbation theory starts with a known
Hamiltonian $\mathbf{H}_0$ with eigenvalues $E_i$ and the corresponding eigenstates $|E_i\rangle$. Subsequently,
a (small) perturbing “potential” $\mathbf{V}$ is added to the Hamiltonian, and the eigenvalues and
eigenstates of the new Hamiltonian $\mathbf{H} = \mathbf{H}_0 + \mathbf{V}$ are sought. One can draw important
conclusions about the eigenvalues and eigenstates of the total Hamiltonian by symmetry
arguments.

Suppose the symmetry group of $\mathbf{H}_0$ is $G$, and that of $\mathbf{H}$ is $H$, which has to be a subgroup
of $G$. In most cases, the eigenspaces of $\mathbf{H}_0$ are irreducible carrier spaces of $G$, i.e., their
basis vectors transform according to the rows of irreducible representations of $G$. If $H$ is a
proper subgroup of $G$, then the eigenspaces of $\mathbf{H}_0$ will split according to Equation (24.31).
We say that some of the degeneracy is lifted because of the perturbation $\mathbf{V}$. The nature of
the split, i.e., the number and the dimensionality of the vector spaces into which a given
eigenspace splits, can be obtained by the characters of $G$ and $H$ and Equation (24.32). The
original eigenspaces are represented on an energy diagram with a line corresponding to
each eigenspace. The split of the eigenspace into $k$ new subspaces is then indicated by the
branching of the old line into $k$ new lines.

To the lowest approximation (first-order perturbation theory), the magnitude of the
split, i.e., the difference between the eigenvalues of $\mathbf{H}_0$ and those of $\mathbf{H}$, is given by [see
Equation (21.57)] the expectation value $\langle \phi_i^{(\alpha)} | \mathbf{V} | \phi_j^{(\alpha)} \rangle$, where $|\phi_i^{(\alpha)}\rangle$ belongs to the $i$th
row of the $\alpha$th irreducible representation, and $|\phi_j^{(\alpha)}\rangle$ to its $j$th row ($i \ne j$). Only if this
expectation value is nonzero will a split occur. This, in turn, depends on the symmetry of
$\mathbf{V}$: If $\mathbf{V}$ is at least as symmetric as $\mathbf{H}_0$ (corresponding to $G = H$), then $\langle \phi_i^{(\alpha)} | \mathbf{V} | \phi_j^{(\alpha)} \rangle = 0$,
and no splitting occurs (Problem 24.17). If, on the other hand, $\mathbf{V}$ is less symmetric than
$\mathbf{H}_0$ (corresponding to $H \subset G$), then $\mathbf{V} |\phi_j^{(\alpha)}\rangle$ will not belong to the $j$th row of the $\alpha$th
irreducible representation, and in general, $\langle \phi_i^{(\alpha)} | \mathbf{V} | \phi_j^{(\alpha)} \rangle \ne 0$. ■
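The splitting pattern described in this example can be computed from characters alone. As a minimal sketch, take $G = S_3$ and the subgroup $H = S_2 = \{e, (12)\}$, and evaluate the multiplicities $m_{\alpha\kappa}$ of Equation (24.33). The choice of groups and the names `branching` and `g_class_of_h` are assumptions made for illustration; the characters are real, so the complex conjugation in (24.33) is omitted.

```python
from fractions import Fraction

# S3 characters; columns: K1 (identity), K2 (transpositions), K3 (3-cycles).
chi_G = [[1, 1, 1], [1, -1, 1], [2, 0, -1]]
# S2 characters; columns: identity, transposition.
xi_H = [[1, 1], [1, -1]]
# Index of the S3-class containing each element of H = {e, (12)}.
g_class_of_h = [0, 1]
order_H = 2

def branching(alpha, kappa):
    """m_{alpha kappa} = (1/|H|) * sum over h in H of chi^(alpha)(h) * xi^(kappa)(h)."""
    total = sum(chi_G[alpha][g_class_of_h[h]] * xi_H[kappa][h] for h in range(order_H))
    return Fraction(total, order_H)

m = [[branching(alpha, kappa) for kappa in range(2)] for alpha in range(3)]
for alpha, row in enumerate(m, start=1):
    print(f"T^({alpha}) restricted to S2 contains the S2 irreps with multiplicities {row}")
```

Under these assumptions the two-dimensional representation of $S_3$ contains each one-dimensional representation of $S_2$ once, so a doubly degenerate level would branch into two nondegenerate lines, while the one-dimensional levels do not split.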

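We had decomposed the carrier space $V$ of a representation into invariant
irreducible subspaces $W^{(\alpha)}$. The argument above shows that each $W^{(\alpha)}$ has a basis
consisting of the “rows” of the irreducible representations. Corresponding to such
a basis, there is a set of projection operators $P_i^{(\alpha)}$ with the property $\sum_{i,\alpha} P_i^{(\alpha)} = \mathbf{1}$
(Chapter 4). Our aim is to find an expression for these operators, which have the
defining property $P_i^{(\alpha)} |\phi\rangle = |\phi_i^{(\alpha)}\rangle$. We start with Equation (24.37), multiply
both sides of it by $T_{lm}^{(\beta)*}(g)$, sum over $g \in G$, and use Equation (24.8) to obtain

$$\sum_{g \in G} T_{lm}^{(\beta)*}(g)\, T_g\, |\phi_i^{(\alpha)}\rangle = \sum_{j=1}^{n_\alpha} \sum_{g \in G} T_{lm}^{(\beta)*}(g)\, T_{ji}^{(\alpha)}(g)\, |\phi_j^{(\alpha)}\rangle = \frac{|G|}{n_\alpha}\, \delta_{\alpha\beta}\, \delta_{mi} \sum_{j=1}^{n_\alpha} \delta_{lj}\, |\phi_j^{(\alpha)}\rangle = \frac{|G|}{n_\alpha}\, \delta_{\alpha\beta}\, \delta_{mi}\, |\phi_l^{(\alpha)}\rangle.$$

Let $\beta = \alpha$, $m = l = i$, and multiply both sides by $n_\alpha / |G|$. Then this equation
becomes

$$\frac{n_\alpha}{|G|} \sum_{g \in G} T_{ii}^{(\alpha)*}(g)\, T_g\, |\phi_i^{(\alpha)}\rangle = |\phi_i^{(\alpha)}\rangle,$$

which suggests the identification

$$P_i^{(\alpha)} = \frac{n_\alpha}{|G|} \sum_{g \in G} T_{ii}^{(\alpha)*}(g)\, T_g \tag{24.40}$$

with the properties

$$P_i^{(\alpha)} |\phi_j^{(\beta)}\rangle = |\phi_i^{(\alpha)}\rangle\, \delta_{ij}\, \delta_{\alpha\beta}, \qquad P_i^{(\alpha)} |\phi\rangle = |\phi_i^{(\alpha)}\rangle, \tag{24.41}$$

where $|\phi_i^{(\alpha)}\rangle$ is the projection of $|\phi\rangle$ along the $i$th row of the $\alpha$th irreducible
representation.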

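We are also interested in the projection operator that projects onto the irreducible
subspace $W^{(\alpha)}$. Such an operator is obtained by summing $P_i^{(\alpha)}$ over $i$. We
thus obtain

$$P^{(\alpha)} = \sum_{i=1}^{n_\alpha} P_i^{(\alpha)} = \frac{n_\alpha}{|G|} \sum_{g \in G} \underbrace{\sum_{i=1}^{n_\alpha} T_{ii}^{(\alpha)*}(g)}_{=\chi^{(\alpha)*}(g)}\, T_g = \frac{n_\alpha}{|G|} \sum_{g \in G} \chi^{(\alpha)*}(g)\, T_g \tag{24.42}$$

and

$$P^{(\alpha)} |\phi_i^{(\beta)}\rangle = |\phi_i^{(\alpha)}\rangle\, \delta_{\alpha\beta}, \qquad P^{(\alpha)} |\phi\rangle = |\phi^{(\alpha)}\rangle, \tag{24.43}$$

where $|\phi^{(\alpha)}\rangle$ is the projection of $|\phi\rangle$ onto the $\alpha$th irreducible invariant subspace.
These formulas are extremely useful in identifying the irreducible subspaces of a
given carrier space: Start with a basis $\{|a_i\rangle\}$ of the carrier space, apply $P^{(\alpha)}$ to all
basis vectors, and collect all the linearly independent vectors of the form $P^{(\alpha)} |a_i\rangle$.
These vectors form a basis of the $\alpha$th irreducible representation. The following
example illustrates this point.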

24.6.3. Example. Consider the representation of $S_3$ given in Example 24.1.8, where the
carrier space is the span of the three functions $|\psi_1\rangle = xy$, $|\psi_2\rangle = yz$, and $|\psi_3\rangle = xz$.
We refer to the character table for $S_3$ (Table 24.4) and use Equation (24.42) to obtain

$$P^{(1)} = \tfrac{1}{6}(T_1 + T_2 + T_3 + T_4 + T_5 + T_6),$$
$$P^{(2)} = \tfrac{1}{6}(T_1 - T_2 - T_3 - T_4 + T_5 + T_6),$$
$$P^{(3)} = \tfrac{2}{6}(2T_1 - T_5 - T_6),$$

where, as in Example 24.1.8, we have used the notation $T_j$ for $T_{\pi_j}$, and the result $n_1 = n_2 = 1$
and $n_3 = 2$ obtained from Equation (24.18), Theorem 24.4.4, and the fact that $S_3$ is
nonabelian.

To get the first irreducible subspace of this representation, we apply $P^{(1)}$ to $|\psi_1\rangle$. Since
this subspace is one-dimensional, the procedure will give a basis for it if the vector so
obtained is nonzero:

$$P^{(1)} |\psi_1\rangle = \tfrac{1}{6}(T_1 + T_2 + T_3 + T_4 + T_5 + T_6)\, |\psi_1\rangle = \tfrac{1}{6}\big(|\psi_1\rangle + |\psi_1\rangle + |\psi_2\rangle + |\psi_3\rangle + |\psi_3\rangle + |\psi_2\rangle\big) = \tfrac{1}{3}\big(|\psi_1\rangle + |\psi_2\rangle + |\psi_3\rangle\big).$$

This is a basis for the carrier space of the irreducible identity representation.

For the second irreducible representation, we get

$$P^{(2)} |\psi_1\rangle = \tfrac{1}{6}\big(|\psi_1\rangle - |\psi_1\rangle - |\psi_2\rangle - |\psi_3\rangle + |\psi_3\rangle + |\psi_2\rangle\big) = 0.$$

Similarly, $P^{(2)} |\psi_2\rangle = 0$ and $P^{(2)} |\psi_3\rangle = 0$. This means that $T^{(2)}$ is not included in the
representation we are working with. We should have expected this, because if this one-dimensional
irreducible representation were included, it would force the last irreducible
representation to be one-dimensional as well [see Equation (24.18)], and, by Theorem
24.4.4, the group $S_3$ to be abelian!

The last irreducible representation is obtained similarly. We have

$$P^{(3)} |\psi_1\rangle = \tfrac{2}{6}(2T_1 - T_5 - T_6)\, |\psi_1\rangle = \tfrac{1}{3}\big(2|\psi_1\rangle - |\psi_3\rangle - |\psi_2\rangle\big),$$
$$P^{(3)} |\psi_2\rangle = \tfrac{2}{6}(2T_1 - T_5 - T_6)\, |\psi_2\rangle = \tfrac{1}{3}\big(2|\psi_2\rangle - |\psi_1\rangle - |\psi_3\rangle\big).$$

These two vectors are linearly independent. Therefore, they form a basis for the last irreducible
representation. The reader may check that $P^{(3)} |\psi_3\rangle$ is a linear combination of
$P^{(3)} |\psi_1\rangle$ and $P^{(3)} |\psi_2\rangle$. ■
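The projectors of this example can be built mechanically from Equation (24.42). The sketch below uses the natural permutation representation of $S_3$ on three basis vectors, which is assumed here to be equivalent to the representation on $xy$, $yz$, $xz$; the labelling of the six operators differs from the $T_j$ of Example 24.1.8, which is not reproduced in this section. Exact rational arithmetic is used so that the matrices can be compared entry by entry.

```python
from fractions import Fraction
from itertools import permutations

def perm_matrix(p):
    """Matrix T with T e_i = e_{p[i]} (column i has a 1 in row p[i])."""
    n = len(p)
    return [[1 if p[col] == row else 0 for col in range(n)] for row in range(n)]

def sign(p):
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def character(alpha, p):
    """Characters of S3: identity (alpha=1), antisymmetric (2), two-dimensional (3)."""
    if alpha == 1:
        return 1
    if alpha == 2:
        return sign(p)
    fixed = sum(1 for i, image in enumerate(p) if image == i)
    return fixed - 1          # 2, 0, -1 on identity, transpositions, 3-cycles

DIMS = {1: 1, 2: 1, 3: 2}     # n_alpha
GROUP = list(permutations(range(3)))

def projector(alpha):
    """P^(alpha) = (n_alpha/|G|) sum_g chi^(alpha)(g) T_g (characters are real here)."""
    n = 3
    P = [[Fraction(0)] * n for _ in range(n)]
    for p in GROUP:
        T = perm_matrix(p)
        weight = Fraction(DIMS[alpha] * character(alpha, p), len(GROUP))
        for r in range(n):
            for col in range(n):
                P[r][col] += weight * T[r][col]
    return P

P1, P2, P3 = projector(1), projector(2), projector(3)
total = [[P1[r][col] + P2[r][col] + P3[r][col] for col in range(3)] for r in range(3)]

for name, P in (("P1", P1), ("P2", P2), ("P3", P3)):
    print(name, [[str(x) for x in row] for row in P])
```

The columns of each matrix are the images of the basis vectors, so the first column of `P3` reproduces the coefficients $\tfrac13(2, -1, -1)$ found above, `P2` vanishes identically, and the three projectors add up to the identity.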

24.7 Tensor Product of Representations 

A simple quantum mechanical system possessing a group of symmetry is described
by vectors that transform irreducibly (or according to a row of an irreducible
representation). For example, a rotationally invariant system can be described by
an eigenstate of angular momentum, the generator of rotation.⁵ These eigenstates
transform as rows of irreducible representations of the rotation group. At a more
fundamental level, the very concept of a particle or field is thought of as states
that transform irreducibly under the fundamental group of spacetime, the Poincaré
group.

Often these irreducible states are “combined” to form new states. For example,
the state of two (noninteracting) particles is described by a two-particle state,
labeled by the combined eigenvalues of the two sets of operators that describe
each particle separately. In the case of angular momentum, the single-particle
states may be labeled as $|l_i, m_i\rangle$ for $i = 1, 2$. Then the combined state will be
labeled as $|l_1, m_1; l_2, m_2\rangle$, and one can define an action of the rotation group on
the vector space spanned by these combined states to construct a representation.
We now describe the way in which this is done.

⁵ Chapter 27 will make explicit the connection between groups and their generators.

Suppose that $T : G \to GL(V)$ and $S : G \to GL(W)$ are two representations
of a group $G$. Let $V \otimes W$ be the tensor product of $V$ and $W$ (see Example 1.3.19).
Now define an action of the group $G$ on $V \otimes W$ via the representation $T \otimes S :
G \to GL(V \otimes W)$ given by

$$T \otimes S(g)(|v\rangle, |w\rangle) = (T(g)\, |v\rangle,\; S(g)\, |w\rangle).$$

We note that

$$\begin{aligned}
T \otimes S(g_1 g_2)(|v\rangle, |w\rangle) &= (T(g_1 g_2)\, |v\rangle,\; S(g_1 g_2)\, |w\rangle) = (T(g_1) T(g_2)\, |v\rangle,\; S(g_1) S(g_2)\, |w\rangle) \\
&= T \otimes S(g_1)\big(T(g_2)\, |v\rangle,\; S(g_2)\, |w\rangle\big) = T \otimes S(g_1)\; T \otimes S(g_2)(|v\rangle, |w\rangle).
\end{aligned}$$

It follows that $T \otimes S$ is indeed a representation, called the tensor product or direct
product or Kronecker product representation. It is common, especially in the
physics literature, to write $|v, w\rangle$, or simply $|vw\rangle$ for $(|v\rangle, |w\rangle)$, and $TS$ for $T \otimes S$.
If we choose the orthonormal bases $\{|v_i\rangle\}$ for $V$ and $\{|w_a\rangle\}$ for $W$, and define an
inner product on $V \otimes W$ by

$$\langle v, w | v', w' \rangle = \langle v | v' \rangle \langle w | w' \rangle,$$

we obtain a matrix representation of the group with matrix elements given by

$$(T \otimes S)_{ia,jb}(g) = \langle v_i, w_a |\, T \otimes S(g)\, | v_j, w_b \rangle = \langle v_i | T(g) | v_j \rangle \langle w_a | S(g) | w_b \rangle = T_{ij}(g)\, S_{ab}(g).$$

Note that the rows and columns of this matrix are distinguished by double indices.
If the matrix $T$ is $m \times m$ and $S$ is $n \times n$, then the matrix $T \otimes S$ is $(mn) \times (mn)$.
The character of the tensor product representation is

$$\chi^{T \otimes S}(g) = \sum_{i,a} (T \otimes S)_{ia,ia}(g) = \sum_{i,a} T_{ii}(g)\, S_{aa}(g) = \sum_{i} T_{ii}(g) \sum_{a} S_{aa}(g) = \chi^{T}(g)\, \chi^{S}(g). \tag{24.44}$$

So the character of the tensor product is the product of the individual characters. 
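The matrix elements $T_{ij}(g) S_{ab}(g)$ are exactly the entries of the Kronecker product of the two matrices, so both Equation (24.44) and the homomorphism property can be checked numerically. The sketch below uses small integer matrices chosen arbitrarily for illustration (they are not taken from any particular representation), with the double index $(i, a)$ flattened to $i \cdot n + a$.

```python
def kron(A, B):
    """Kronecker product: entry at row (i, a), column (j, b) is A[i][j] * B[a][b]."""
    return [[A[i][j] * B[a][b] for j in range(len(A[0])) for b in range(len(B[0]))]
            for i in range(len(A)) for a in range(len(B))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

# Two arbitrary "representation matrices" for each factor (illustrative values only).
T1, T2 = [[1, 2], [3, 4]], [[0, 1], [1, 1]]
S1, S2 = [[2, 0, 1], [1, 3, 0], [0, 1, 1]], [[1, 1, 0], [0, 1, 0], [2, 0, 1]]

# Character of the product equals the product of the characters, Equation (24.44).
character_ok = trace(kron(T1, S1)) == trace(T1) * trace(S1)

# (T x S)(g1 g2) = (T x S)(g1) (T x S)(g2) follows from the mixed-product rule.
homomorphism_ok = kron(matmul(T1, T2), matmul(S1, S2)) == matmul(kron(T1, S1), kron(T2, S2))

print(len(kron(T1, S1)), character_ok, homomorphism_ok)
```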

An important special case is the tensor product of a representation with itself.
For such a representation, the matrix elements satisfy the symmetry relation
$(T \otimes T)_{ia,jb}(g) = (T \otimes T)_{ai,bj}(g)$. This symmetry can be used to decompose the
tensor product space into two subspaces that are separately invariant under the
action of the group. To do this, take the span of all the symmetric vectors of the
form $(|v_i w_j\rangle + |v_j w_i\rangle) \in V \otimes V$ and denote it by $(V \otimes V)_s$. Similarly, take the
span of all the antisymmetric vectors of the form $(|v_i w_j\rangle - |v_j w_i\rangle)$ and
denote it by $(V \otimes V)_a$. Next note that

$$|v_i w_j\rangle = \tfrac{1}{2}\big(|v_i w_j\rangle + |v_j w_i\rangle\big) + \tfrac{1}{2}\big(|v_i w_j\rangle - |v_j w_i\rangle\big).$$

It follows that every vector of the product space can be written as the sum of a
symmetric and an antisymmetric vector. Furthermore, the only vector that is both
symmetric and antisymmetric is the zero vector. Therefore,

$$V \otimes V = (V \otimes V)_s \oplus (V \otimes V)_a.$$

Now consider the action of the group on each of these subspaces separately.
From the relation

$$\begin{aligned}
T \otimes T(g)\, |v_i w_j\rangle &= T \otimes T(g)(|v_i\rangle, |w_j\rangle) = \Big(\sum_{k} T_{ki}(g)\, |v_k\rangle,\; \sum_{l} T_{lj}(g)\, |w_l\rangle\Big) \\
&= \sum_{k,l} T_{ki}(g)\, T_{lj}(g)\, (|v_k\rangle, |w_l\rangle) = \sum_{k,l} (T \otimes T)_{kl,ij}(g)\, |v_k w_l\rangle
\end{aligned}$$

we obtain

$$T \otimes T(g)\big(|v_i w_j\rangle \pm |v_j w_i\rangle\big) = \sum_{k,l} \big[(T \otimes T)_{kl,ij}(g) \pm (T \otimes T)_{kl,ji}(g)\big]\, |v_k w_l\rangle. \tag{24.45}$$

Problem 24.21 shows that the RHS can be written as a sum over the symmetric
(for the plus sign) or antisymmetric (for the minus sign) vectors alone. It follows
that


24.7.1. Box. The Kronecker product of a representation with itself is always
reducible into two representations, the symmetrized product and the
antisymmetrized product representations.
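One way to see why the symmetric and antisymmetric subspaces are separately invariant is to note that the exchange operator $|v_i w_j\rangle \mapsto |v_j w_i\rangle$ commutes with $T \otimes T(g)$ for every $g$. This restatement is not made in the text; the sketch below only illustrates it numerically for an arbitrary $2 \times 2$ matrix, and the name `swap_operator` is hypothetical.

```python
def kron(A, B):
    """Kronecker product with the double index (i, a) flattened to i * len(B) + a."""
    return [[A[i][j] * B[a][b] for j in range(len(A[0])) for b in range(len(B[0]))]
            for i in range(len(A)) for a in range(len(B))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def swap_operator(n):
    """Matrix of the exchange |v_i w_j> -> |v_j w_i> on the n*n-dimensional product space."""
    P = [[0] * (n * n) for _ in range(n * n)]
    for i in range(n):
        for j in range(n):
            P[j * n + i][i * n + j] = 1
    return P

T = [[1, 2], [3, 4]]              # any square matrix works; it need not be unitary
n = len(T)
TT = kron(T, T)
P = swap_operator(n)

# The exchange commutes with T x T, so its +1 and -1 eigenspaces are invariant.
commutes = matmul(P, TT) == matmul(TT, P)

# Dimensions of the two subspaces are the traces of the projectors (1 + P)/2 and (1 - P)/2.
trace_P = sum(P[k][k] for k in range(n * n))
dim_sym = (n * n + trace_P) // 2
dim_antisym = (n * n - trace_P) // 2

print(commutes, dim_sym, dim_antisym)
```

For an $n$-dimensional representation the same count gives $n(n+1)/2$ symmetric and $n(n-1)/2$ antisymmetric vectors.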


24.7.1 Clebsch-Gordan Decomposition 

A common situation in quantum mechanics is to combine two simple systems
into a composite system and see which properties of the original simple systems
the composite system retains. For example, combining the angular momenta of
two particles gives a new total angular momentum operator. The question of what
single-particle angular momentum states are included in the states of the total angular
momentum operator is the content of selection rules and is of great physical
interest: A quark and an antiquark (two fermions) with spin $\tfrac12$ always combine
to form a meson (a boson), because the resulting composite state has no projection
onto the subspace spanned by half-integer-spin particles. In this section, we
study the mathematical foundation of this situation. The tensor product of two
irreducible representations $T^{(\alpha)}$ and $T^{(\beta)}$ of $G$ is denoted by $T^{(\alpha \times \beta)}$, and it is,
in general, a reducible representation. The characters, generally compound, are
denoted by $\chi^{(\alpha \times \beta)}$. Equation (24.14), combined with Equation (24.44), tells us
what irreducible representations are present in the tensor product, and therefore
onto which irreducible representations the product representation has nonzero projection:

$$\chi_i^{(\alpha \times \beta)} = \chi_i^{(\alpha)} \chi_i^{(\beta)} = \sum_{\sigma=1}^{r} m_\sigma^{\alpha\beta}\, \chi_i^{(\sigma)},$$

where $m_\sigma^{\alpha\beta}$ are nonnegative integers. We rewrite this more conveniently in terms
of vectors as

$$|\chi^{(\alpha \times \beta)}\rangle = \sum_{\sigma=1}^{r} m_\sigma^{\alpha\beta}\, |\chi^{(\sigma)}\rangle, \qquad m_\sigma^{\alpha\beta} = \frac{1}{|G|} \langle \chi^{(\sigma)} | \chi^{(\alpha \times \beta)} \rangle = \frac{1}{|G|} \sum_{i=1}^{r} c_i\, \chi_i^{(\sigma)*} \chi_i^{(\alpha)} \chi_i^{(\beta)}. \tag{24.46}$$

A group for which $m_\sigma^{\alpha\beta} = 0, 1$ is called simply reducible.


Rudolph Friedrich Alfred Clebsch (1833-1872) studied mathematics
in the shadow of Jacobi at the University of Königsberg,
two of his teachers having been students of Jacobi. After graduation
he held a number of positions in Germany, including positions
at the universities of Berlin, Giessen, and finally Göttingen, where
he remained until his death. He and Carl Neumann, son of one of
the aforementioned Jacobian teachers, founded the Mathematische
Annalen.

Clebsch began his career in mathematical physics, producing a
doctoral thesis on hydrodynamics and a book on elasticity in which
he treated the elastic vibrations of rods and plates. These works were
primarily mathematical, however, and he soon turned his attention more to pure mathematics.
His links to Jacobi gave rise to his first work in that vein, concerning problems in
variational calculus and partial differential equations, in which he surpassed the results of
Jacobi’s work.

Clebsch first achieved significant recognition for his work in projective invariants and
algebraic geometry. He was intrigued by the interplay between algebra and geometry, and,
since many results in the theory of invariants have geometric interpretations, the two fields
seemed natural choices.



24.7.2. Example. Referring to Table 24.8 of Problem 24.15, and using Equation (24.44),
we can construct the compound character $|\chi^{(4 \times 5)}\rangle$ with components $9, -1, 1, 0, -1$. Then,
we have

$$|\chi^{(4 \times 5)}\rangle = \begin{pmatrix} 9 \\ -1 \\ 1 \\ 0 \\ -1 \end{pmatrix} = \sum_{\sigma=1}^{5} m_\sigma^{45}\, |\chi^{(\sigma)}\rangle, \qquad m_\sigma^{45} = \frac{1}{24} \langle \chi^{(\sigma)} | \chi^{(4 \times 5)} \rangle.$$

For the first irreducible representation, we get

$$m_1^{45} = \tfrac{1}{24} \langle \chi^{(1)} | \chi^{(4 \times 5)} \rangle = \tfrac{1}{24}\big[1 \cdot 1 \cdot 9 + 6 \cdot 1 \cdot (-1) + 3 \cdot 1 \cdot 1 + 8 \cdot 1 \cdot 0 + 6 \cdot 1 \cdot (-1)\big] = 0.$$

For the second irreducible representation, we get

$$m_2^{45} = \tfrac{1}{24} \langle \chi^{(2)} | \chi^{(4 \times 5)} \rangle = \tfrac{1}{24}\big[1 \cdot 1 \cdot 9 + 6 \cdot (-1) \cdot (-1) + 3 \cdot 1 \cdot 1 + 8 \cdot 1 \cdot 0 + 6 \cdot (-1) \cdot (-1)\big] = 1.$$

Similarly, $m_3^{45} = 1$, $m_4^{45} = 1$, and $m_5^{45} = 1$. We thus see that the identity representation is
not included in the direct product of irreducible representations 4 and 5; all other irreducible
representations of $S_4$ occur once in $T^{(4 \times 5)}$. ■
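The multiplicities of this example follow from a direct evaluation of Equation (24.46). Table 24.8 is not reproduced in this section, so the sketch below assumes the standard character table of $S_4$ with classes ordered by sizes 1, 6, 3, 8, 6 and with the two three-dimensional representations labeled 4 and 5; that labeling is an assumption, chosen so that the product character matches the components $9, -1, 1, 0, -1$ quoted above.

```python
from fractions import Fraction

class_sizes = [1, 6, 3, 8, 6]        # identity, transpositions, double transpositions, 3-cycles, 4-cycles
chi = {                              # assumed character table of S4 (all characters real)
    1: [1, 1, 1, 1, 1],
    2: [1, -1, 1, 1, -1],
    3: [2, 0, 2, -1, 0],
    4: [3, 1, -1, 0, -1],
    5: [3, -1, -1, 0, 1],
}
order = sum(class_sizes)             # |G| = 24

def multiplicity(sigma, alpha, beta):
    """m_sigma^{alpha beta} = (1/|G|) sum_i c_i chi_i^(sigma) chi_i^(alpha) chi_i^(beta)."""
    total = sum(c * chi[sigma][i] * chi[alpha][i] * chi[beta][i]
                for i, c in enumerate(class_sizes))
    return Fraction(total, order)

product_character = [a * b for a, b in zip(chi[4], chi[5])]
series = {sigma: multiplicity(sigma, 4, 5) for sigma in chi}

print(product_character)
print(series)
```

As a consistency check, the dimensions of the representations that occur add up to the dimension of the product: $1 + 2 + 3 + 3 = 9$.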

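In terms of representations themselves, we have the so-called Clebsch-Gordan
series

$$T^{(\alpha)} \otimes T^{(\beta)} = \bigoplus_{\sigma=1}^{r} m_\sigma^{\alpha\beta}\, T^{(\sigma)}, \qquad m_\sigma^{\alpha\beta} = \frac{1}{|G|} \sum_{i=1}^{r} c_i\, \chi_i^{(\sigma)*} \chi_i^{(\alpha)} \chi_i^{(\beta)}, \tag{24.47}$$

where we have used Equation (24.13).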

24.7.3. Example. The one-dimensional identity representation plays a special role in the
application of group theory to physics because any vector (function) in its carrier space is
invariant under the action of the group, and invariant vectors often describe special states of
the quantum mechanical systems. For example, the ground state of an atomic system with
rotational invariance has zero orbital angular momentum, corresponding to a spherically
symmetric state.

Another example comes from particle physics. Quarks are usually placed in the states
of an irreducible representation of a group [$SU(n)$, where $n$ is the number of “flavors”
such as up, down, charm], and antiquarks in its adjoint. A question of great importance
is what combination of quarks and antiquarks leads to particles, called singlets, that are
an invariant of the group. For the case of quark-antiquark combination, the answer comes
in the analysis of the tensor product of one irreducible representation, say $T^{(\alpha)}$, and one
adjoint representation, say $\bar{T}^{(\beta)}$. In fact, using Equation (24.47), we have

$$m_1^{\alpha\bar\beta} = \frac{1}{|G|} \sum_{i} c_i\, \chi_i^{(1)*} \chi_i^{(\alpha)} \bar\chi_i^{(\beta)} = \frac{1}{|G|} \sum_{i} c_i\, \chi_i^{(\alpha)} \chi_i^{(\beta)*} = \delta_{\alpha\beta},$$

where we used Equation (24.13) and the fact that all characters of the identity representation
are unity. Thus to construct an invariant state, we need to combine a representation with its
adjoint, in which case we obtain the identity representation only once. ■




Paul Albert Gordan (1837-1912), the son of David Gordan,
a merchant, attended gymnasium and business school, then
worked for several years in banks. His early interest in mathematics
was encouraged by the private tutoring he received
from a professor at the Friedrich Wilhelm Gymnasium. He attended
Ernst Kummer’s lectures in number theory at the University
of Berlin in 1855, then studied at the universities of Breslau,
Königsberg, and Berlin. At Königsberg he came under the
influence of Karl Jacobi’s school, and at Berlin his interest in algebraic
equations was aroused. His dissertation (1862), which
concerned geodesics on spheroids, received a prize offered by the philosophy faculty of the
University of Breslau. The techniques that Gordan employed in it were those of Lagrange
and Jacobi.

Gordan’s interest in function theory led him to visit G. F. B. Riemann in Göttingen
in 1862, but Riemann was ailing, and their association was brief. The following year,
Gordan was invited to Giessen by Clebsch, thus beginning the fruitful collaboration most
physicists recognize. Together they produced work on the theory of Abelian functions,
based on Riemann's fundamental paper on that topic, and several of Clebsch’s papers are
considered important steps toward establishing for Riemann's theories a firm foundation
in terms of pure algebraic geometry. Of course, the Clebsch-Gordan collaboration also
produced the famous coefficients that bear their names, so indispensable to the theory of
angular momentum coupling found in almost every area of modern physics. In 1874 Gordan
became a professor at Erlangen, where he remained until his retirement in 1910. He married
Sophie Deuer, the daughter of a Giessen professor of Roman law, in 1869. In 1868 Clebsch
introduced Gordan to the theory of invariants, which originated in an observation of George
Boole’s in 1841 and was further developed by Arthur Cayley in 1846. Following the work of
these two Englishmen, a German branch of the theory was developed by S. H. Aronhold and
Clebsch, the latter elaborating the former’s symbolic methods of characterizing algebraic
forms and their invariants. Invariant theory was Gordan’s main interest for the rest of his
mathematical career; he became known as the greatest expert in the field, developing many
techniques for representing and generating forms and their invariants.

Gordan made important contributions to algebra and solutions of algebraic equations,
and gave simplified proofs of the transcendence of $e$ and $\pi$. The overall style of Gordan's
mathematical work was algorithmic. He shied away from presenting his ideas in informal
literary forms. He derived his results computationally, working directly toward the desired
goal without offering explanations of the concepts that motivated his work.

Gordan’s only doctoral student, Emmy Noether, was one of the first women to receive
a doctorate in Germany. She carried on his work in invariant theory for a while, but under
the stimulus of Hilbert’s school at Göttingen her interests shifted and she became one of the
primary contributors to modern algebra.












So far, we have concentrated on the reduction of the operators and carrier
spaces into irreducible components. Let us now direct our attention to the vectors
themselves. Given two irreducible representations $T^{(\alpha)}$ and $T^{(\beta)}$ with carrier
spaces spanned by vectors $\{|\phi_i^{(\alpha)}\rangle\}_{i=1}^{n_\alpha}$ and $\{|\psi_j^{(\beta)}\rangle\}_{j=1}^{n_\beta}$, we form the direct product
representation $T^{(\alpha \times \beta)}$ with the carrier space spanned by vectors $\{|\phi_i^{(\alpha)} \psi_j^{(\beta)}\rangle\}$. We
know that $T^{(\alpha \times \beta)}$ is reducible, and Equation (24.47) tells us how many times each
irreducible factor occurs in $T^{(\alpha \times \beta)}$. This means that the span of $\{|\phi_i^{(\alpha)} \psi_j^{(\beta)}\rangle\}$ can
be decomposed into invariant irreducible subspaces; i.e., there must exist a basis of
the carrier of the product space the vectors of which belong to irreducible representations
of $G$. More specifically, we should be able to form the linear combinations

$$|\Psi_k^{(\sigma), q}\rangle = \sum_{i,j} C(\alpha\beta; \sigma, q | ij; k)\, |\phi_i^{(\alpha)} \psi_j^{(\beta)}\rangle, \tag{24.48}$$

which transform according to the rows of the $\sigma$th irreducible representation. Here
the subscript $k$ refers to the row of the $\sigma$th representation, and $q$ distinguishes
among functions that have the same $\sigma$ and $k$, corresponding to the case where
$m_\sigma^{\alpha\beta} \ge 2$. For simply reducible groups, the label $q$ is unnecessary. The coefficients
$C(\alpha\beta; \sigma, q | ij; k)$ are called the Clebsch-Gordan coefficients for $G$. These coefficients
are normalized such that

$$\sum_{i,j} C^*(\alpha\beta; \sigma, q | ij; k)\, C(\alpha\beta; \sigma', q' | ij; k') = \delta_{\sigma\sigma'}\, \delta_{qq'}\, \delta_{kk'},$$

$$\sum_{\sigma, q, k} C^*(\alpha\beta; \sigma, q | ij; k)\, C(\alpha\beta; \sigma, q | i'j'; k) = \delta_{ii'}\, \delta_{jj'}.$$

This will guarantee that $|\Psi_k^{(\sigma), q}\rangle$ are orthonormal if the product vectors form an
orthonormal set. Using these relations, we can write the inverse of Equation (24.48)
as

$$|\phi_i^{(\alpha)} \psi_j^{(\beta)}\rangle = \sum_{\sigma, q, k} C^*(\alpha\beta; \sigma, q | ij; k)\, |\Psi_k^{(\sigma), q}\rangle. \tag{24.49}$$


24.7.2 Irreducible Tensor Operators 

An operator $A$ acting in the carrier space of the representation of a group $G$ is
transformed into another operator, $A \mapsto T_g A T_g^{-1}$, by the action of the group. Just as
in the case of vector spaces, one can thus construct a set of operators that transform
among themselves by such action and lump these operators in irreducible sets.

24.7.4. Definition. An operator $A_i^{(\alpha)}$ is said to transform according to the $i$th
row of the $\alpha$th irreducible representation if there exist $n_\alpha - 1$ other operators
$\{A_j^{(\alpha)}\}$ and a basis $\{|\psi_j^{(\alpha)}\rangle\}$ such that

$$T_g A_i^{(\alpha)} T_g^{-1} = \sum_{j=1}^{n_\alpha} T_{ji}^{(\alpha)}(g)\, A_j^{(\alpha)}, \tag{24.50}$$

where $(T_{ji}^{(\alpha)}(g))$ is the matrix representation of $g$. The set of such operators is
called an irreducible set of operators (or irreducible tensorial set).

In particular, if $T_{ji}^{(\alpha)}(g) = \delta_{ji}$, i.e., if the representation is the identity representation,
then $A = T_g A T_g^{-1}$, and $A$ is called a scalar operator. The term “scalar”
refers to the fact that $A$ has only one “component,” in contrast to the other operators
of Equation (24.50), which may possess several components.

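Consider the set of vectors (functions) defined by $|\psi_{ij}^{(\alpha\beta)}\rangle \equiv A_i^{(\alpha)} |\phi_j^{(\beta)}\rangle$, where
$|\phi_j^{(\beta)}\rangle$ transform according to the $\beta$th irreducible representation. These vectors
transform according to

$$\begin{aligned}
T_g\, |\psi_{ij}^{(\alpha\beta)}\rangle &= T_g A_i^{(\alpha)} T_g^{-1}\, T_g\, |\phi_j^{(\beta)}\rangle = \sum_{k=1}^{n_\alpha} \sum_{l=1}^{n_\beta} T_{ki}^{(\alpha)}(g)\, A_k^{(\alpha)}\, T_{lj}^{(\beta)}(g)\, |\phi_l^{(\beta)}\rangle \\
&= \sum_{k,l} T_{ki}^{(\alpha)}(g)\, T_{lj}^{(\beta)}(g)\, A_k^{(\alpha)} |\phi_l^{(\beta)}\rangle = \sum_{k,l} T_{kl,ij}^{(\alpha \times \beta)}(g)\, |\psi_{kl}^{(\alpha\beta)}\rangle,
\end{aligned} \tag{24.51}$$

i.e., according to the representation $T^{(\alpha)} \otimes T^{(\beta)}$. This means that the vectors $|\psi_{ij}^{(\alpha\beta)}\rangle$
have the same transformation properties as the tensor product vectors $|\phi_i^{(\alpha)} \psi_j^{(\beta)}\rangle$.
Therefore, using Equation (24.49), we can write

$$A_i^{(\alpha)} |\phi_j^{(\beta)}\rangle = \sum_{\sigma, q, k} C^*(\alpha\beta; \sigma, q | ij; k)\, |\Psi_k^{(\sigma), q}\rangle,$$

and more importantly,

$$\langle \phi_m^{(\gamma)} | A_i^{(\alpha)} | \phi_j^{(\beta)} \rangle = \sum_{\sigma, q, k} C^*(\alpha\beta; \sigma, q | ij; k) \underbrace{\langle \phi_m^{(\gamma)} | \Psi_k^{(\sigma), q} \rangle}_{\text{use Eq. (24.38) here}} = \sum_{q} C^*(\alpha\beta; \gamma, q | ij; m)\, \langle \phi_m^{(\gamma)} | \Psi_m^{(\gamma), q} \rangle. \tag{24.52}$$

It follows that the matrix element of the operator $A_i^{(\alpha)}$ will vanish unless the
irreducible representation $T^{(\gamma)}$ occurs in the reduction of the tensor product
$T^{(\alpha)} \otimes T^{(\beta)}$, and this can be decided from the character tables and the Clebsch-Gordan
series, Equation (24.47).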

There is another remarkable property of Equation (24.52) that has significant
physical consequences. Notice how the dependence on $i$ and $j$ is contained entirely
in the Clebsch-Gordan coefficients. From Equation (24.39) it follows that
$\langle\phi_m^{(\gamma)}|\psi_m^{(\gamma),q}\rangle$ is independent of $m$. Therefore, this dependence must also be contained
entirely in the Clebsch-Gordan coefficients. One therefore writes (24.52) as

$$\langle\phi_m^{(\gamma)}|A_i^{(\alpha)}|\phi_j^{(\beta)}\rangle \equiv \sum_q C^*(\alpha\beta;\gamma,q|ij;m)\,\langle\phi^{(\gamma)}\|A^{(\alpha)}\|\phi^{(\beta)}\rangle_q. \tag{24.53}$$

This equation is known as the Wigner-Eckart theorem, and the numbers
$\langle\phi^{(\gamma)}\|A^{(\alpha)}\|\phi^{(\beta)}\rangle_q$ multiplying the Clebsch-Gordan coefficients are known as the
reduced matrix elements.

From the point of view of physics, Equation (24.53) can be very useful in
calculating matrix elements (expectation values and transitions between states),
once we know the transformation properties of the physical operator. For example,
for a scalar operator $S$, which, by definition, transforms according to the identity
representation, (24.53) becomes

$$\langle\phi_m^{(\gamma)}|S|\phi_j^{(\beta)}\rangle = \langle\phi^{(\gamma)}\|S\|\phi^{(\beta)}\rangle\,\delta_{\gamma\beta}\,\delta_{mj};$$

i.e., scalar operators have no matrix elements between different irreducible
representations of a group, and within an irreducible representation, they are multiples
of the identity matrix. This result is also a consequence of Schur's lemma.
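
The statement that a scalar operator is a multiple of the identity within an irreducible representation can be checked numerically. The following is a minimal sketch, not part of the text: it assumes a particular integer-matrix realization of the two-dimensional irreducible representation of $S_3$ (the generators `S_GEN` and `R_GEN` are illustrative choices), averages an arbitrary matrix over the group to manufacture an operator satisfying $A = T_g A T_g^{-1}$, and confirms that the result is proportional to the unit matrix.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two 2x2 matrices stored as tuples of tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inverse(m):
    """Exact inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return (
        (Fraction(m[1][1]) / det, Fraction(-m[0][1]) / det),
        (Fraction(-m[1][0]) / det, Fraction(m[0][0]) / det),
    )

def generate_group(generators):
    """Close a set of generating matrices under multiplication."""
    identity = ((1, 0), (0, 1))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = matmul(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def group_average(a, group):
    """(1/|G|) * sum_g T_g A T_g^{-1}, which commutes with every T_g."""
    total = [[Fraction(0), Fraction(0)], [Fraction(0), Fraction(0)]]
    for g in group:
        conj = matmul(matmul(g, a), inverse(g))
        for i in range(2):
            for j in range(2):
                total[i][j] += conj[i][j]
    return [[x / len(group) for x in row] for row in total]

# Assumed realization of the 2-dimensional irreducible representation of S_3:
S_GEN = ((0, 1), (1, 0))    # a transposition (order 2)
R_GEN = ((0, -1), (1, -1))  # a 3-cycle (order 3)

group = generate_group([S_GEN, R_GEN])
scalar_op = group_average(((1, 2), (3, 4)), group)
print(len(group), scalar_op)
```

The averaged matrix keeps the trace of the original one, so the multiple of the identity is $\operatorname{tr}A / 2$ in this two-dimensional case.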


24.8 Representations of the Symmetric Group 

The symmetric (permutation) group is an important prototype of finite groups. In
fact, Cayley's theorem (see [Rotm 84, p. 46] for a proof) states that any finite
group of order $n$ is isomorphic to a subgroup of $S_n$. Moreover, the representation
of $S_n$ leads directly to the representation of many of the Lie groups encountered
in physical applications. It is, therefore, worthwhile to devote some time to the
analysis of the representations of $S_n$.


24.8.1 Analytic Construction 

The starting point of the construction of representations of the symmetric group is 
Equation (24.35), which is valid for any finite group. There is one simple character 
that every group has, namely, the character of the one-dimensional symmetric 
representation in which all elements of the group are mapped to $1 \in \mathbb{R}$. Setting
^ = 1 in (24.35), and noting that Ylj = 必 ， we obtain 




\G\dj 


(24.54) 


where {irf 1 } are the components of a compound character of G. 

Frobenius has shown that by a clever choice of $H$, one can completely solve the
problem of the construction of the irreducible representations of $S_n$. He proceeded
as follows. Consider a partition $(\lambda) = (\lambda_1, \dots, \lambda_n)$ of $n$. The symmetric groups
$\{S_{\lambda_i}\}$ are subgroups of $S_n$ and have no elements in common; therefore they all
commute with one another. The direct product of these subgroups is a subgroup
of $S_n$, which we denote by $S_{(\lambda)}$:

$$S_{(\lambda)} \equiv S_{\lambda_1} \times \cdots \times S_{\lambda_n}.$$

If we denote the compound character of Equation (24.54) by $\psi^{(\lambda)}$ in this case,
calculate $|H|$, $c_i$, and $d_i$, and substitute the results in (24.54), all of which can be
done in closed form, we obtain an explicit formula for the components of $\psi^{(\lambda)}$.
This formula is messy, and we shall not derive it here. The interested reader may
refer to [Hame 89, pp. 189-192] for details.

We are really interested in the simple characters of $S_n$, and Frobenius came
up with a powerful method of calculating them. Since there is a one-to-one
correspondence between the irreducible representations and conjugacy classes, and
another one between conjugacy classes of $S_n$ and partitions of $n$, we shall label the
simple characters of $S_n$ by partitions of $n$. Thus, instead of our common notation
$\chi_i^{(\alpha)}$ we use $\chi_{(l)}^{(\lambda)}$, where $(\lambda)$ denotes a partition of $n$, and $(l)$ a cycle structure of
$S_n$.

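Suppose we want to find the irreducible characters corresponding to the cycle
structure $(l) = (1^\alpha, 2^\beta, 3^\gamma, \dots)$. These form a column under the class $(l)$ in a
character table. To calculate the irreducible characters, form two polynomials in
$(x_1, x_2, \dots, x_n)$ as follows. The first one, which is completely symmetric in all
variables, is

$$s_{(l)} = \Bigl(\sum_{i=1}^n x_i\Bigr)^{\alpha}\Bigl(\sum_{i=1}^n x_i^2\Bigr)^{\beta}\Bigl(\sum_{i=1}^n x_i^3\Bigr)^{\gamma}\cdots \tag{24.55}$$

The second one is completely antisymmetric, and can be written as

$$D(x_1,\dots,x_n) = \det\begin{pmatrix}
x_1^{n-1} & x_1^{n-2} & \cdots & x_1 & 1\\
x_2^{n-1} & x_2^{n-2} & \cdots & x_2 & 1\\
\vdots & \vdots & & \vdots & \vdots\\
x_n^{n-1} & x_n^{n-2} & \cdots & x_n & 1
\end{pmatrix} = \prod_{i<j}(x_i - x_j). \tag{24.56}$$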


It can be shown that the simple characters of $S_n$ are coefficients of certain terms
of the product of these polynomials. To be exact, we have

$$s_{(l)}\,D(x_1,\dots,x_n) = \sum_{(\lambda)}\chi_{(l)}^{(\lambda)}\sum_{\pi}\epsilon_\pi\,
x_{\pi(1)}^{\lambda_1+n-1}\,x_{\pi(2)}^{\lambda_2+n-2}\cdots x_{\pi(n-1)}^{\lambda_{n-1}+1}\,x_{\pi(n)}^{\lambda_n}. \tag{24.57}$$

The outer sum goes over all partitions of $n$, the inner sum over all permutations
of $S_n$. The procedure for finding the simple characters of $S_n$ should now be clear
from (24.57):


24.8.1. Box. To find the simple character $\chi_{(l)}^{(\lambda)}$, construct the corresponding
symmetric and antisymmetric polynomials, multiply them together, and
collect all terms of the form

$$x_{\pi(1)}^{\lambda_1+n-1}\,x_{\pi(2)}^{\lambda_2+n-2}\cdots x_{\pi(n-1)}^{\lambda_{n-1}+1}\,x_{\pi(n)}^{\lambda_n}$$

for all permutations $\pi \in S_n$; the coefficient of such a term will be the desired
character.

Since the coefficients of $x_{\pi(1)}^{\lambda_1+n-1}\cdots x_{\pi(n-1)}^{\lambda_{n-1}+1}x_{\pi(n)}^{\lambda_n}$ for various $\pi$'s differ
by a sign, in practice, the coefficient of just $x_1^{\lambda_1+n-1}\cdots x_{n-1}^{\lambda_{n-1}+1}x_n^{\lambda_n}$ is precisely
the simple character we are looking for.

24.8.2. Example. The best way to understand the procedure described above is to go
through an example in detail. We calculate the characters of $S_3$ using the above method.
Label the rows of the character table with the partitions of 3. These are (3), (2,1), and
(1,1,1). Similarly, label the columns with the conjugacy classes, or cycle structures: $(1^3)$,
(1,2), and (3). The first cycle structure has $\alpha = 3$, $\beta = 0 = \gamma$. Therefore,

$$s_{(1^3)} = (x_1 + x_2 + x_3)^3 = x_1^3 + x_2^3 + x_3^3 + 3x_1^2x_2 + 3x_1^2x_3 + 3x_2^2x_1 + 3x_2^2x_3 + 3x_3^2x_1 + 3x_3^2x_2 + 6x_1x_2x_3 \tag{24.58}$$

and

$$D(x_1, x_2, x_3) = (x_1 - x_2)(x_1 - x_3)(x_2 - x_3),$$

so that

$$s_{(1^3)}D(x_1, x_2, x_3) = s_{(1^3)}\,(x_1^2x_2 - x_1^2x_3 - x_2^2x_1 + x_2^2x_3 - x_3^2x_2 + x_3^2x_1). \tag{24.59}$$

Now we note that for $(\lambda) = (3)$, $\lambda_1 = 3$, $\lambda_2 = 0$, and $\lambda_3 = 0$. Therefore, the
coefficient of $x_1^{\lambda_1+n-1}x_2^{\lambda_2+n-2}x_3^{\lambda_3} = x_1^5x_2$ gives $\chi_{(1^3)}^{(3)}$. Similarly, for $(\lambda) = (2,1,0)$,
$\lambda_1 = 2$, $\lambda_2 = 1$, and $\lambda_3 = 0$, and the coefficient of $x_1^{\lambda_1+n-1}x_2^{\lambda_2+n-2}x_3^{\lambda_3} = x_1^4x_2^2$
gives $\chi_{(1^3)}^{(2,1)}$. Finally, for $(\lambda) = (1,1,1)$, $\lambda_1 = \lambda_2 = \lambda_3 = 1$, and the coefficient of
$x_1^{\lambda_1+n-1}x_2^{\lambda_2+n-2}x_3^{\lambda_3} = x_1^3x_2^2x_3$ gives $\chi_{(1^3)}^{(1,1,1)}$. These coefficients can be read off by
scanning through Equation (24.58) while multiplying its terms by those of Equation (24.59)
and keeping track of the coefficients of the products of the relevant powers of $x_1$, $x_2$, and
$x_3$. The reader may verify that there is only one term of the form $x_1^5x_2$, whose coefficient is



                          $(1^3)$   $(1,2)$   $(3)$
  $\chi^{(3)}$               1         1        1
  $\chi^{(2,1)}$             2         0       -1
  $\chi^{(1,1,1)}$           1        -1        1

Table 24.5 The character table for $S_3$. Each column corresponds to a conjugacy class,
each row to a partition of 3. The last two rows have been switched compared to Table 24.4.


1, giving $\chi_{(1^3)}^{(3)} = 1$; there are two terms of the form $x_1^4x_2^2$, whose coefficients are $-1$ and
$3$, giving $\chi_{(1^3)}^{(2,1)} = 2$; and there are four terms of the form $x_1^3x_2^2x_3$, whose coefficients are
$+1$, $-3$, $-3$, and $+6$, giving $\chi_{(1^3)}^{(1,1,1)} = 1$. Therefore, the first column of the character table
of $S_3$ is $(1, 2, 1)^t$. To obtain the second column, we consider the second conjugacy class, (1,2),
with $\alpha = 1 = \beta$ and $\gamma = 0$. The corresponding symmetric polynomial is

$$s_{(1,2)} = (x_1 + x_2 + x_3)(x_1^2 + x_2^2 + x_3^2)
= x_1^3 + x_2^3 + x_3^3 + x_1^2x_2 + x_1^2x_3 + x_2^2x_1 + x_2^2x_3 + x_3^2x_1 + x_3^2x_2. \tag{24.60}$$

$D(x_1, x_2, x_3)$ is the same as before. Multiplying and keeping track of the coefficients of
$x_1^5x_2$, $x_1^4x_2^2$, and $x_1^3x_2^2x_3$, we obtain $\chi_{(1,2)}^{(3)} = 1$, $\chi_{(1,2)}^{(2,1)} = 0$, $\chi_{(1,2)}^{(1,1,1)} = -1$. It follows
that the second column of the character table of $S_3$ is $(1, 0, -1)^t$.

The last column is obtained similarly. We note that $\alpha = 0 = \beta$, and $\gamma = 1$. Therefore,
the symmetric polynomial is

$$s_{(3)} = x_1^3 + x_2^3 + x_3^3,$$

and the antisymmetric polynomial is the same as before. Multiplying these two polynomials
and extracting the coefficients as before, we get $\chi_{(3)}^{(3)} = 1$, $\chi_{(3)}^{(2,1)} = -1$, and $\chi_{(3)}^{(1,1,1)} = 1$.
It follows that the third column of the character table of $S_3$ is $(1, -1, 1)^t$.

Collecting all the data obtained above, we can reconstruct the character table of $S_3$. This
is shown in Table 24.5. The irreducible representations are labeled by the three possible
partitions of 3, and the conjugacy classes by the three cycle structures. ■
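
The coefficient extraction of Box 24.8.1 is mechanical enough to automate. The sketch below is an illustration rather than anything taken from the text: polynomials are stored as dictionaries mapping exponent tuples to integer coefficients, and the helper names (`poly_mul`, `power_sum`, `vandermonde`, `frobenius_character`) are hypothetical. It multiplies $D(x_1,\dots,x_n)$ by one power sum per cycle and reads off the coefficient of $x_1^{\lambda_1+n-1}\cdots x_n^{\lambda_n}$.

```python
def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for ea, ca in p.items():
        for eb, cb in q.items():
            e = tuple(x + y for x, y in zip(ea, eb))
            out[e] = out.get(e, 0) + ca * cb
    return {e: c for e, c in out.items() if c != 0}

def power_sum(n, r):
    """The symmetric polynomial x_1^r + ... + x_n^r."""
    return {tuple(r if i == j else 0 for i in range(n)): 1 for j in range(n)}

def vandermonde(n):
    """The antisymmetric polynomial prod_{i<j} (x_i - x_j) of Eq. (24.56)."""
    d = {(0,) * n: 1}
    for i in range(n):
        for j in range(i + 1, n):
            xi = tuple(1 if k == i else 0 for k in range(n))
            xj = tuple(1 if k == j else 0 for k in range(n))
            d = poly_mul(d, {xi: 1, xj: -1})
    return d

def frobenius_character(lam, cycles):
    """Character of the partition lam on the class with the given cycle lengths."""
    n = sum(cycles)
    lam = tuple(lam) + (0,) * (n - len(lam))  # pad the partition with zeros
    poly = vandermonde(n)
    for r in cycles:                          # builds s_(l) * D one factor at a time
        poly = poly_mul(poly, power_sum(n, r))
    target = tuple(lam[i] + n - 1 - i for i in range(n))
    return poly.get(target, 0)

partitions = [(3,), (2, 1), (1, 1, 1)]
classes = [(1, 1, 1), (2, 1), (3,)]
table = [[frobenius_character(lam, cls) for cls in classes] for lam in partitions]
print(table)
```

The resulting rows can be compared entry by entry with Table 24.5. The approach is only practical for small $n$, since the number of monomials grows quickly.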


24.8.2 Graphical Construction 

The analytic construction of the previous subsection can be handled using graphical
techniques that are considerably simpler. To begin with, let us find the character of
the identity element of $S_n$. The cycle structure is $(1^n)$, i.e., all cycles consist of a
single element. Thus, $\alpha = n$, and $\beta$, $\gamma$, etc. are all zero. It follows that the LHS of
Equation (24.57) is $(x_1 + \cdots + x_n)^n D(x_1,\dots,x_n)$. We calculate this product one power of
$(x_1 + \cdots + x_n)$ at a time. For the same reason as in the example above, $\chi_{(1^n)}^{(\lambda)}$ will be the coefficient
of

$$x_1^{\lambda_1+n-1}\,x_2^{\lambda_2+n-2}\cdots x_{n-1}^{\lambda_{n-1}+1}\,x_n^{\lambda_n}.$$


Ferdinand Georg Frobenius (1849-1917), the son of a parson,
was born in Berlin and began his mathematical studies at
Göttingen in 1867. He received his doctorate in Berlin three years
later. Four years later, on the basis of his mathematical research,
he was appointed assistant professor at the University of Berlin.

He achieved the rank of full professor at the Eidgenössische
Polytechnikum Zürich before returning to Berlin as a professor
of mathematics in 1892. During the early years of Frobenius's
career, modern group theory was in its infancy. He combined
its three main branches of study, namely the theory of solutions to
algebraic equations (permutation groups and the work of Galois),
geometry (transformation and Lie groups), and number theory, to produce the concept of
the abstract group. He collaborated with Issai Schur in representation and character theory
of groups.

His paper Über die Gruppencharaktere is of fundamental importance. It was presented
to the Berlin Academy on 16 July 1896 and it contains work that Frobenius had done in the
preceding few months. In a series of letters to Dedekind, the first on 12 April 1896, his ideas
on group characters quickly develop, and Frobenius is able to construct a complete set of
representations by complex numbers. In a letter to Dedekind on 26 April 1896 Frobenius
finds the irreducible characters for the alternating group, and the symmetric groups.

In 1897 Frobenius reformulated the work of Molien (the Latvian student of Klein who,
in his thesis, classified the semisimple algebras using the method of group rings) in terms of
matrices and then showed that his characters are the traces of the irreducible representations.
Frobenius's character theory found important applications in quantum mechanics and was
used with great effect by Burnside, who wrote it up in the 1911 edition of his Theory of
Groups of Finite Order.

Frobenius is also remembered as the originator of a series method for solving ordinary
differential equations. Despite the clearly greater importance of his work in group theory,
this method of Frobenius serves admirably to perpetuate his name.


If we multiply $D(x_j)$ by $\sum_i x_i$ one $x_i$ at a time, we increase the power of one
of the $x_j$'s by one. If at any stage two of the exponents become equal, the term
must vanish, due to the antisymmetry of $(\sum_i x_i)D(x_j)$. Therefore, as we raise the
degree of the polynomial by one at each stage, the power of $x_1$ must be raised at
least as fast as $x_2$, and the power of $x_2$ must be raised at least as fast as $x_3$, etc. Our
goal is to raise the power of $x_1$ by $\lambda_1$, that of $x_2$ by $\lambda_2$, and, in general, the power
of $x_i$ by $\lambda_i$, making sure that at each stage, the number of multiplications by $x_1$ is
greater than or equal to the number of multiplications by $x_2$, etc. The total number
of ways by which we can reach this goal will be $\chi_{(1^n)}^{(\lambda)}$, which is also the dimension
of the irreducible representation $(\lambda)$ by Equation (24.9).

To see the argument more clearly, suppose that we are interested in the dimension
of the irreducible representation of $S_4$ corresponding to (3,1). Then we must
raise the power of $x_1$ by 3 and the power of $x_2$ by 1; $x_3$ and $x_4$ will remain intact,
and therefore will not enter in the following discussion. It follows that $D(x_j)$ is
to be multiplied by $x_1^3x_2$, one $x$-factor at a time, the number of $x_1$-factors never
falling below the number of $x_2$-factors. The possible ways of doing this are

$$x_1^3x_2, \qquad x_1^2x_2x_1, \qquad x_1x_2x_1^2. \tag{24.61}$$

Note that as we count the factors from left to right, the number of $x_1$'s is always
greater than or equal to the number of $x_2$'s. Thus $x_2x_1^3$ is absent because $x_2$
occurs without $x_1$ occurring first. It follows that the dimension of the irreducible
representation (3,1) is 3.
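
The orderings in (24.61) can be enumerated by brute force. The following sketch is an illustration under stated assumptions, not part of the text: it writes each factor $x_i$ as the integer $i$, and the function name `lattice_words` is hypothetical. It keeps exactly those orderings in which, at every stage of a left-to-right scan, the number of $x_i$'s is at least the number of $x_{i+1}$'s.

```python
from itertools import permutations

def lattice_words(lam):
    """Orderings of lam[0] ones, lam[1] twos, ... obeying the counting rule.

    A word is kept if, after every prefix, the count of letter i is at
    least the count of letter i + 1 for every i.
    """
    letters = [i + 1 for i, count in enumerate(lam) for _ in range(count)]
    words = []
    for word in set(permutations(letters)):
        counts = [0] * len(lam)
        valid = True
        for letter in word:
            counts[letter - 1] += 1
            if any(counts[i] < counts[i + 1] for i in range(len(lam) - 1)):
                valid = False
                break
        if valid:
            words.append(word)
    return sorted(words)

words_31 = lattice_words((3, 1))
print(words_31)  # each tuple lists the factors x_1, x_2 in order of multiplication
```

For $(3,1)$ the words correspond, in order, to $x_1^3x_2$, $x_1^2x_2x_1$, and $x_1x_2x_1^2$. Brute-force enumeration over all orderings is adequate only for small partitions.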

A graphical way to arrive at the same result is to draw $\lambda_1 = 3$ boxes on top
and $\lambda_2 = 1$ box below it:

    □ □ □
    □

The next step is to fill in the boxes with numbers corresponding to the position of
$x_1$ factors (filling up the first row) and $x_2$ factors (filling up the second row) in Equation
(24.61). Since in the first term of (24.61), the $x_1$'s occupy the first, second, and
third positions, we enter 1, 2, and 3 in the first row, and 4 in the second row
corresponding to the last position occupied by $x_2$. Similarly, in the second term of
(24.61), the $x_1$'s occupy the first, second, and fourth positions; therefore, we enter
1, 2, and 4 in the first row, and 3 in the second row corresponding to the position
occupied by $x_2$. Finally, in the last term of (24.61), the $x_1$'s occupy the first, third,
and fourth positions; therefore, we enter 1, 3, and 4 in the first row, and 2 in the
second row corresponding to the position occupied by $x_2$. The result is the graph
shown below:

    1 2 3        1 2 4        1 3 4
    4            3            2
24.8.3. Definition. Let $(\lambda) = (\lambda_1, \lambda_2, \dots, \lambda_n)$ be a partition of $n$. The Young
frame (or the Young pattern) associated with $(\lambda)$ is a collection of rows of boxes
(squares) aligned at the left such that the first row has $\lambda_1$ boxes, the second row
$\lambda_2$ boxes, etc. Since $\lambda_i \ge \lambda_{i+1}$, the length of the rows does not increase as one goes to the
bottom of the frame.

The Young frame associated with $(\lambda)$ represents $x_1^{\lambda_1}\cdots x_n^{\lambda_n}$, which multiplies
the antisymmetric polynomial $D(x_j)$. To find the dimension of the irreducible
representation $T^{(\lambda)}$, we have to count the number of ways in which the $x$-factors
can be permuted among themselves such that as we scan the product, the number
of $x_i$'s is never less than the number of $x_j$'s if $j > i$. This leads to


24.8.4. Definition. A standard Young tableau (or diagram, or graph) is a Young
frame filled with numbers 1 through $n$ such that

1. the numbers are placed consecutively left to right on the rows starting with
1 in the far-left box of the first row;

2. no box of any row is to be filled unless all boxes to its left are already filled;

3. at each stage, the number of boxes filled in any row is never less than the
number of boxes filled in any row below it.

Tableaux satisfying the last condition are called regular graphs.

It follows that in a Young tableau, the number 1 is always in the upper left-hand
box, and that going down in a column, the numbers must increase.

24.8.5. Theorem. Let $(\lambda)$ be a partition of $n$. Then the dimension of the irreducible
representation $T^{(\lambda)}$ is equal to the number of standard Young tableaux associated
with $(\lambda)$.

24.8.6. Example. We wish to calculate the dimension of each irreducible representation
of $S_4$. The partitions are (4), (3,1), (2,2), (2,1,1), and (1,1,1,1), whose associated Young
frames are shown below:

    (4)         (3,1)       (2,2)       (2,1,1)     (1,1,1,1)
    □□□□        □□□         □□          □□          □
                □           □□          □           □
                                        □           □
                                                    □

The number of standard Young tableaux associated with $(\lambda) = (4)$ is 1, because there is
only one way to place the numbers 1 through 4 in the four boxes. Thus, the dimension of
$T^{(4)}$ is 1. For $(\lambda) = (3,1)$, we can place 2 either to the right of 1 or below it. The first
choice gives rise to two possibilities for the placement of 3: either to the right of 2 or below
1. The second choice gives rise to only one possibility for 3, namely to the right of 1. With
1, 2, and 3 in place, the position of 4 is predetermined. Thus, we have 3 possibilities for
$(\lambda) = (3,1)$, and the dimension of $T^{(3,1)}$ is 3. For $(\lambda) = (2,2)$, we can place 2 either to
the right of 1 or below it. Both choices give rise to only one possibility for 3: in the first
case, 3 can only go under 1; in the second case to its right. With 1, 2, and 3 in place, the
position of 4 is again predetermined. Thus, we have 2 possibilities for $(\lambda) = (2,2)$, and
the dimension of $T^{(2,2)}$ is 2. The reader may check that the dimension of $T^{(2,1,1)}$ is 3, and
that of $T^{(1,1,1,1)}$ is 1. Figure 24.1 summarizes these findings. We note that the dimensions
satisfy $1^2 + 3^2 + 2^2 + 3^2 + 1^2 = 24$, the second equation of (24.18). ■
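
Theorem 24.8.5 turns the computation of dimensions into a counting problem, which a short recursion can carry out. The sketch below is an illustration, not the book's procedure: it relies on the observation that in a standard tableau the largest number must occupy a box at the end of a row with no box beneath it, so removing that box leaves a smaller standard tableau. The function name `count_standard_tableaux` is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_standard_tableaux(lam):
    """Number of standard Young tableaux on the frame of the partition lam."""
    lam = tuple(part for part in lam if part > 0)  # drop empty rows
    if not lam:
        return 1  # the empty frame has exactly one (empty) filling
    total = 0
    for i, row in enumerate(lam):
        # The last box of row i can hold the largest number only if the
        # row below is strictly shorter (or there is no row below).
        if i + 1 == len(lam) or lam[i + 1] < row:
            total += count_standard_tableaux(lam[:i] + (row - 1,) + lam[i + 1:])
    return total

partitions_of_4 = [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
dimensions = [count_standard_tableaux(lam) for lam in partitions_of_4]
print(dimensions, sum(d * d for d in dimensions))
```

The sum of the squared dimensions equals the order of $S_4$, in agreement with the check at the end of Example 24.8.6.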



[Figure: the standard Young tableaux for the partitions (4), (3,1), (2,2), (2,1,1), and (1,1,1,1) of 4, with dimensions $n_{(4)} = 1$, $n_{(3,1)} = 3$, $n_{(2,2)} = 2$, $n_{(2,1,1)} = 3$, $n_{(1,1,1,1)} = 1$.]

Figure 24.1 The standard Young tableaux, and the dimensions of irreducible
representations of $S_4$.

24.8.3 Graphical Construction of Characters

The product of the symmetric polynomial $s_{(l)}$ and the antisymmetric polynomial
$D(x_j)$ contains all the information regarding the representations of $S_n$. We can
extract the simple characters by looking at the coefficients of appropriate products
of the $x$-factors. This can also be done graphically. Without going into the
combinatorics of the derivation of the results, we state the rules for calculating the simple
characters, and examine one particular case in detail to elucidate the procedure,
whose statement can be very confusing.

As before, we label the irreducible representations with the partitions of $n$.
However, we separate out the common factors in a cycle structure, labeling the
cycles by $l_1$, $l_2$, etc. For example, $(2, 1^2)$ has $l_1 = 2$, $l_2 = 1$, and $l_3 = 1$. So, we
write $(l)$ as $(l_1, l_2, \dots, l_m)$.

24.8.7. Definition. A regular application of $r$ identical symbols to a Young frame
is the placement of those symbols in the boxes of the frame as follows. Add the
symbols to any given row, starting with the first (farthest to the left) unoccupied
cell, until the symbols are all used^6 or the number exceeds that of the preceding
line by one. In the latter case, go to the preceding line and repeat the procedure,
making sure that the final result of adding all $r$ symbols will be a regular graph.
If in this process the $r$ symbols are divided among an even number of rows, we
speak of negative application. If the number of rows is odd, we have a positive
application.

^6 It is understood that if you cannot place all symbols on the first row, then you should start at the second row.

As an illustration, consider the regular application of five 2’s to the blank Young 
frame shown below. 



We cannot start on the first row because it does not have enough boxes for the 
five 2’s. We can start on the second row and put one 2 in the first box. This brings 
the number of 2 ’s in the second row to one more than in the first row; therefore, 
we should now go to the first row and put the rest of the symbols there. We could 
start at the third row, put one 2 in the first box, put a second 2 in the first box of
the second row, and the rest in the first row. Altogether we will have 3 regular
applications of the five 2's. These are shown in the diagram below.


[Diagram: the three regular applications of the five 2's to the Young frame.]












Of these the first and the last tableaux are negative applications, and the middle 
one is positive. 

24.8.8. Theorem. The character of the irreducible representation $T^{(\lambda)}$ of the class
$(l) = (l_1, \dots, l_m)$ is obtained by successive regular applications of $l_1$ identical
symbols (usually taken to be 1's), then $l_2$ identical symbols of a different kind
(usually taken to be 2's), etc. The character $\chi_{(l)}^{(\lambda)}$ is then equal to the number
of ways of building positive applications minus the number of ways of building
negative applications.

The order in which the $l_i$'s are applied is irrelevant. However, it is usually
convenient to start with the largest cycle.

The best way to understand the procedure is to construct a character table. Let
us do this for $S_4$. As usual, the rows are labeled by the various partitions $(\lambda)$ of 4. We
choose the order (4), (3,1), $(2,2) = (2^2)$, $(2,1,1) = (2,1^2)$, $(1,1,1,1) = (1^4)$.
The columns are labeled by classes $(l)$ in the following order: $(1^4)$, $(2,1^2)$, $(2^2)$,
(3,1), (4), where, for example, $(2,1^2)$ means that $l_1 = 2$, $l_2 = 1$, and $l_3 = 1$.
Example 24.8.6 gives us the first column of the character table. Similarly, the first
row has 1 in all places, because it is the trivial representation. Our task is therefore
to fill in the rest of the table one row at a time. The second row, with $(\lambda) = (3,1)$,
has a Young frame that looks like

    □ □ □
    □

and for each class (column) labeled $(l_1, \dots, l_m)$, we need to fill this in with $l_1$
identical symbols (1's), $l_2$ identical symbols of a different kind (2's), etc.

The second column has $l_1 = 2$, $l_2 = 1 = l_3$. So we have two 1's, one 2, and
one 3. If we start with the first row, the two 1's can be placed in its first two boxes.
If we start with the second row, the two 1's must be placed vertically on top of
each other. In the first case, we have two choices for the 2: either on the first row
next to the two 1's, or on the second line. In the second case, we have only one
choice for the 2: in the first row next to 1. With 1's and 2 in place, the position of
3 is determined. The three possibilities are shown below:

    1 1 2        1 1 3        1 2 3
    3            2            1

The first two are positive applications, the third is negative because the 1's occupy
an even number of rows. We therefore have

$$\chi_{(2,1^2)}^{(3,1)} = +1 + 1 - 1 = +1.$$

The third column has $l_1 = 2 = l_2$. So we have two 1's and two 2's. We place
the 1's as before. When the two 1's are placed vertically, we can put the 2's on
the first row and we are done. When the 1's are initially placed in the first row,
we have no way of placing the 2's by regular application. We cannot start on the
first row because there is only one spot available (remember, we cannot go down
once we start at a row). We cannot start on the second row because once we place
the first 2, we are blocked, and the number of symbols in the second row does not
exceed that of the first row by one. So, there is only one possibility:

    1 1 □        1 2 2
    □            1

    Not allowed  Allowed

The only allowed diagram is obtained by a negative application of 1's. Therefore,
$\chi_{(2^2)}^{(3,1)} = -1$.

The fourth column has $l_1 = 3$ and $l_2 = 1$. So we have three 1's and one 2.
There are two ways to place the 1's: all on the first row, or starting on the second
row and working our way up until all boxes are filled except the last box of the
first row. The placement of 2 will then be predetermined. The result is the two
diagrams shown below:

    1 1 1        1 1 2
    2            1

The first diagram is obtained by a positive application of 1's, the second by a
negative application. Therefore,

$$\chi_{(3,1)}^{(3,1)} = +1 - 1 = 0.$$

Finally, for the last column, $l_1 = 4$. There is only one way to put all the 1's in
the frame, and that is a negative application. Thus, $\chi_{(4)}^{(3,1)} = -1$.

Rather than going through the rest of the table in the same gory detail, we shall
point out some of the trickier calculations, and leave the rest of the table for the
reader to fill in. One confusion may arise in the calculation of $\chi_{(2^2)}^{(2^2)}$. The frame
looks like this,

    □ □
    □ □

and we need to fill this with two 1's and two 2's. The 1's can go into the first row
or the first column. The 2's then can be placed in the second row or the second
column. The result is

    1 1        1 2
    2 2        1 2

The first diagram has no negative application. The second has two negative
applications, one for the 1's, and one for the 2's. Therefore, the overall sign for the
second diagram is positive. It follows that $\chi_{(2^2)}^{(2^2)} = +1 + 1 = +2$.

The calculation of $\chi_{(4)}^{(2^2)}$ may also be confusing. We need to place four 1's in
the frame. If we start on the first row, we are stuck, because there is room for only
two 1's. If we start in the second row, then we can only go up: putting the first
1 in the second row causes that row to have one extra 1 in comparison with the
preceding row. However, once we go up, we have room for only two 1's (we cannot
go back down). So, there is no way we can place the four 1's in the $(2^2)$ frame,
and $\chi_{(4)}^{(2^2)} = 0$.

The character table for $S_4$ is shown in Table 24.6 (see Problem 24.15 as well).
The reader is urged to verify all entries not calculated above. The character table
for $S_5$ can also be calculated with only minor tedium. We quote the result here in
Table 24.7 and let the reader check the entries of the table.
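
The graphical rule can be turned into a short recursive program. The sketch below is an illustration under stated assumptions rather than the book's own algorithm: instead of adding symbols to an empty frame it removes one cycle at a time from the full frame, and it encodes the frame by the shifted row lengths $\lambda_i + k - i$ (for $k$ rows), exactly the exponents that appear in (24.57). Lowering one shifted length by $r$ without colliding with another corresponds to a regular application of $r$ symbols, and the sign is $-1$ raised to the number of shifted lengths that are jumped over (one less than the number of rows occupied). The names `character` and `_chi` are hypothetical.

```python
from functools import lru_cache

def character(lam, cycles):
    """Simple character of S_n for the partition lam on the class `cycles`."""
    if sum(lam) != sum(cycles):
        raise ValueError("partition and cycle structure must have the same size")
    k = len(lam)
    # Shifted row lengths; they are distinct because lam is non-increasing.
    beta = frozenset(lam[i] + (k - 1 - i) for i in range(k))
    return _chi(beta, tuple(sorted(cycles, reverse=True)))

@lru_cache(maxsize=None)
def _chi(beta, cycles):
    if not cycles:
        return 1  # every box has been removed; the empty frame contributes +1
    r, rest = cycles[0], cycles[1:]
    total = 0
    for b in beta:
        c = b - r
        if c < 0 or c in beta:
            continue  # removal is not a regular application
        jumped = sum(1 for x in beta if c < x < b)
        sign = -1 if jumped % 2 else 1  # even number of rows <=> negative
        total += sign * _chi((beta - {b}) | {c}, rest)
    return total

partitions = [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
classes = [(1, 1, 1, 1), (2, 1, 1), (2, 2), (3, 1), (4,)]
table_s4 = [[character(lam, cls) for cls in classes] for lam in partitions]
for row in table_s4:
    print(row)
```

The printed rows can be compared with Table 24.6; the same function applied to the partitions and classes of 5 reproduces the entries quoted in Table 24.7.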


24.8.4 Young Operators 

The group algebra techniques of Section 24.4, which we used in our discussion of
representation theory in a very limited way, provide a powerful and elegant tool


                     $(1^4)$   $(2,1^2)$   $(2^2)$   $(3,1)$   $(4)$
  $T^{(4)}$             1         1          1         1        1
  $T^{(3,1)}$           3         1         -1         0       -1
  $T^{(2^2)}$           2         0          2        -1        0
  $T^{(2,1^2)}$         3        -1         -1         0        1
  $T^{(1^4)}$           1        -1          1         1       -1

Table 24.6 Character table for $S_4$. The rows and columns are labeled by the partitions of
4 and cycle structures, respectively.


for unraveling the representations of finite groups. These techniques have been 
particularly useful in the analysis of the representations of the symmetric group. 
Our emphasis on the symmetric group is not merely due to the importance of S n as 
a paradigm of all finite groups. It has also to do with the unexpected usefulness of 
the representations of S n in studying the representations of GL(V), the paradigm 
of all (classical) continuous groups. We shall come back to this observation later 
when we discuss representations of Lie groups in Chapter 27. 

To begin with, consider the element $P$ of the $S_n$ group algebra as defined in
Equation (24.22). Since multiplying $P$ (on the left) by a group element does not
change $P$, the ideal generated by $P$ is not only one-dimensional, but all elements
of $S_n$ are represented by the number 1. Therefore, the ideal $\mathcal{A}P$ corresponds to the
(irreducible) identity representation.

For $S_n$, there is another group algebra element that has similar properties. This
is

$$Q = \sum_{i=1}^{n!}\epsilon_{\pi_i}\pi_i, \qquad \pi_i \in S_n. \tag{24.62}$$

The reader may check that

$$\pi_j Q = \epsilon_{\pi_j} Q \qquad\text{and}\qquad Q^2 = n!\,Q.$$

As in the case of $P$, $Q$ generates a one-dimensional ideal, but a left multiplication
may introduce a minus sign (when the permutation is odd). Thus, the ideal generated
by $Q$ must correspond to the antisymmetric (or alternating) representation.

All the irreducible representations, including the special one-dimensional
cases above, can be obtained using this group-algebraic method. We shall not
give the proofs here, and we refer the reader to the classic book [Boer 63,
pp. 102-125]. The starting point is the Young frame corresponding to the
partition $(\lambda) = (\lambda_1, \dots, \lambda_n)$. One puts the numbers 1 through $n$ in the frame in any
order, consistent with tableau construction, so that the end product is a Young
tableau. Let $p$ be any permutation of a Young tableau that does not change the row

                     $(1^5)$   $(2,1^3)$   $(2^2,1)$   $(3,2)$   $(3,1^2)$   $(4,1)$   $(5)$
  $T^{(5)}$             1         1           1          1          1          1        1
  $T^{(4,1)}$           4         2           0         -1          1          0       -1
  $T^{(3,2)}$           5         1           1          1         -1         -1        0
  $T^{(3,1^2)}$         6         0          -2          0          0          0        1
  $T^{(2^2,1)}$         5        -1           1         -1         -1          1        0
  $T^{(2,1^3)}$         4        -2           0          1          1          0       -1
  $T^{(1^5)}$           1        -1           1         -1          1         -1        1

Table 24.7 Character table for $S_5$. The rows and columns are labeled by the partitions of
5 and cycle structures, respectively.


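of the numbers it acts on. Such a $p$ is called a horizontal permutation. Similarly,
let $q$ be a vertical permutation of the Young tableau, i.e., a permutation that does
not change the column of the numbers it acts on.

24.8.9. Definition. Consider the $k$th Young tableau corresponding to the partition
$(\lambda)$. Let the Young symmetrizer $P_k^{(\lambda)}$ and antisymmetrizer $Q_k^{(\lambda)}$ be the elements
of the group algebra of $S_n$ defined as

$$P_k^{(\lambda)} = \sum_p p, \qquad Q_k^{(\lambda)} = \sum_q \epsilon_q\, q,$$

where the sums run over the horizontal and the vertical permutations of the tableau, respectively.
Then, the Young operator $Y_k^{(\lambda)}$ of this tableau, another element of the group algebra,
is given by $Y_k^{(\lambda)} = Q_k^{(\lambda)} P_k^{(\lambda)}$.

It can be shown that the following holds.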

24.8.10. Theorem. The Young operator $Y_k^{(\lambda)}$ is essentially idempotent, and
generates a minimal left ideal, hence an irreducible representation for $S_n$.
Representations thus obtained from different frames are inequivalent. Different tableaux
with the same frame give equivalent irreducible representations.

In practice, one usually chooses the standard Young tableaux and applies the 
foregoing procedure to them to obtain the entire collection of inequivalent irre¬ 
ducible representations of S n . We have already seen how to calculate characters 
of S n employing both analytical and graphical methods. Theorem 24.8.10 gives 
yet another approach to analyzing representations of S n . For low values of n this 
technique may actually be used to determine the characters, but as n grows, it 
becomes unwieldy, and the graphical method becomes more manageable. 


24.8.11. Example. Let us apply this method to $S_3$. The partitions are (3), (2,1), and
$(1^3)$. There is only one standard Young tableau associated with (3) and $(1^3)$. Thus,

$$Y^{(3)} = P^{(3)} = \frac{1}{6}\sum_{j=1}^{6}\pi_j = \frac{1}{6}(e + \pi_2 + \pi_3 + \pi_4 + \pi_5 + \pi_6),$$

$$Y^{(1^3)} = Q^{(1^3)} = \frac{1}{6}\sum_{j=1}^{6}\epsilon_{\pi_j}\pi_j = \frac{1}{6}(e - \pi_2 - \pi_3 - \pi_4 + \pi_5 + \pi_6),$$

where we have divided these Young operators by 6 to make them idempotent; we have also
used the notation of Example 23.4.1. One can show directly that $Y^{(3)}Y^{(1^3)} = 0$. In fact,
one can prove this for general $S_n$ (see Problem 24.30).

For the partition (2,1), there are two Young tableaux. The first one has the numbers 1
and 2 in the first row and 3 in the second. In the second tableau the numbers 2 and 3 are
switched. Therefore, using the multiplication table for $S_3$ as given in Example 23.4.1, we
have

$$Y_1^{(2,1)} = Q_1^{(2,1)}P_1^{(2,1)} = (e - \pi_3)(e + \pi_2) = e + \pi_2 - \pi_3 - \pi_6,$$

$$Y_2^{(2,1)} = Q_2^{(2,1)}P_2^{(2,1)} = (e - \pi_2)(e + \pi_3) = e - \pi_2 + \pi_3 - \pi_5.$$

The reader may verify that the product of any two Young operators corresponding to
different Young tableaux is zero and that

$$Y_1^{(2,1)}Y_1^{(2,1)} = 3Y_1^{(2,1)}, \qquad Y_2^{(2,1)}Y_2^{(2,1)} = 3Y_2^{(2,1)}.$$

Let us calculate the left ideals generated by these four Young operators. We already
know from our discussion at the beginning of this subsection that $\mathcal{L}^{(3)}$ and $\mathcal{L}^{(1^3)}$, the ideals
generated by $Y^{(3)}$ and $Y^{(1^3)}$, are one-dimensional. Let us find $\mathcal{L}_1^{(2,1)}$, the ideal generated by
$Y_1^{(2,1)}$. This is the span of all vectors obtained by multiplying $Y_1^{(2,1)}$ on the left by elements
of the group algebra. It is sufficient to multiply $Y_1^{(2,1)}$ by the basis of the algebra, namely
the group elements:

$$e\,Y_1^{(2,1)} = Y_1^{(2,1)} \equiv X_1^{(2,1)},$$
$$\pi_2 Y_1^{(2,1)} = \pi_2 + e - \pi_5 - \pi_4 \equiv X_2^{(2,1)},$$
$$\pi_3 Y_1^{(2,1)} = \pi_3 + \pi_6 - e - \pi_2 = -X_1^{(2,1)},$$
$$\pi_4 Y_1^{(2,1)} = \pi_4 + \pi_5 - \pi_6 - \pi_3 = X_1^{(2,1)} - X_2^{(2,1)},$$
$$\pi_5 Y_1^{(2,1)} = \pi_5 + \pi_4 - \pi_2 - e = -X_2^{(2,1)},$$
$$\pi_6 Y_1^{(2,1)} = \pi_6 + \pi_3 - \pi_4 - \pi_5 = X_2^{(2,1)} - X_1^{(2,1)}.$$

It follows from the above calculation that $\mathcal{L}_1^{(2,1)}$, as a vector space, is spanned by
$\{X_1^{(2,1)}, X_2^{(2,1)}\}$, and since these two vectors are linearly independent, $\mathcal{L}_1^{(2,1)}$ is a two-dimensional
minimal ideal corresponding to a two-dimensional irreducible representation of $S_3$. One
can use this basis to find representation matrices and the simple characters of $S_3$.




The other two-dimensional irreducible representation of $S_3$, equivalent to the one above, is obtained by constructing the ideal $\mathcal{L}_2^{(2,1)}$ generated by $Y_2^{(2,1)}$. This construction is left for the reader, who is also asked to verify its dimensionality.

The resolution of the identity is easily verified:

$$e = \underbrace{Y^{(3)}}_{e_1} + \underbrace{Y^{(1^3)}}_{e_2} + \underbrace{\tfrac{1}{3}Y_1^{(2,1)}}_{e_3} + \underbrace{\tfrac{1}{3}Y_2^{(2,1)}}_{e_4}.$$

The $e_i$'s are idempotents that satisfy $e_i e_j = 0$ for $i \neq j$. ■
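The statements of this example can be checked mechanically. The following is a minimal sketch, not the book's method: it stores elements of the group algebra of $S_3$ as dictionaries and assumes that a product of permutations means "apply the right factor first". The names `compose`, `elem`, `mul`, `rank`, and the labels `PI2` to `PI6` are illustrative; `PI5` and `PI6` are defined through products so that they agree with the expansions of the two (2,1) Young operators given above.

```python
from fractions import Fraction

def compose(p, q):
    """Product p*q of permutations as maps: apply q first, then p (assumed convention)."""
    return tuple(p[q[i]] for i in range(len(q)))

# Permutations of (0, 1, 2) stored as tuples p with p[i] = image of i.
E = (0, 1, 2)
PI2 = (1, 0, 2)            # transposition (12)
PI3 = (2, 1, 0)            # transposition (13)
PI4 = (0, 2, 1)            # transposition (23)
PI5 = compose(PI2, PI3)    # 3-cycle, labeled so that pi_2 pi_3 = pi_5
PI6 = compose(PI3, PI2)    # 3-cycle, labeled so that pi_3 pi_2 = pi_6
GROUP = [E, PI2, PI3, PI4, PI5, PI6]
SIGN = {E: 1, PI2: -1, PI3: -1, PI4: -1, PI5: 1, PI6: 1}

def elem(coeffs):
    """Group-algebra element as {permutation: coefficient}, with zero terms dropped."""
    return {g: Fraction(c) for g, c in coeffs.items() if c != 0}

def add(a, b):
    out = dict(a)
    for g, y in b.items():
        out[g] = out.get(g, 0) + y
    return elem(out)

def scale(c, a):
    return elem({g: Fraction(c) * x for g, x in a.items()})

def mul(a, b):
    """Product in the group algebra, extended bilinearly from the group product."""
    out = {}
    for g, x in a.items():
        for h, y in b.items():
            gh = compose(g, h)
            out[gh] = out.get(gh, 0) + x * y
    return elem(out)

def rank(vectors):
    """Dimension of the span of group-algebra elements (Gaussian elimination over Q)."""
    rows = [[v.get(g, Fraction(0)) for g in GROUP] for v in vectors]
    r = 0
    for col in range(len(GROUP)):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

Y3 = elem({g: Fraction(1, 6) for g in GROUP})            # symmetrizer / 6
Y111 = elem({g: Fraction(SIGN[g], 6) for g in GROUP})    # antisymmetrizer / 6
Y1 = mul(elem({E: 1, PI3: -1}), elem({E: 1, PI2: 1}))    # (e - pi_3)(e + pi_2)
Y2 = mul(elem({E: 1, PI2: -1}), elem({E: 1, PI3: 1}))    # (e - pi_2)(e + pi_3)

X1 = mul(elem({PI2: 1}), Y1)
left_ideal_dim = rank([mul(elem({g: 1}), Y1) for g in GROUP])
identity = add(add(Y3, Y111), scale(Fraction(1, 3), add(Y1, Y2)))
```

With these definitions, `mul(Y1, Y1)` can be compared with `scale(3, Y1)`, `mul(Y1, Y2)` with the empty dictionary (the zero element), `left_ideal_dim` with 2, and `identity` with `elem({E: 1})`.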


24.8.5 Products of Representations of $S_n$

In the quantum theory of systems of many identical particles, the wave function must have a particular symmetry under exchange of the particles: If the particles are all fermions (bosons), the overall wave function must be completely antisymmetric (symmetric). Since the space of functions of several variables can provide a carrier space for the representation of any group, we can, in the case of $S_n$, think of the symmetric (antisymmetric) functions as basis functions for the one-dimensional irreducible identity (alternating) representation. To obtain these basis functions, we apply the Young operator $Y^{(1^n)}$ (or $Y^{(n)}$) to the arguments of any given function of $n$ variables to obtain the completely antisymmetric (or symmetric) wave function.$^7$

Under certain conditions, we may require mixed symmetries. For instance, in the presence of spin, the product of the total spin wave function and the total space wave function must be completely antisymmetric for fermions. Thus, the space part (or the spin part) of the wave functions will, in general, have mixed symmetry. Such a mixed symmetry corresponds to some other Young operator, and the wave function is obtained by applying that Young operator to the arguments of the wave function.

Now suppose that we have two separate systems consisting of $n_1$ and $n_2$ particles, respectively, which are all assumed to be identical. As long as the two systems are not interacting, each will consist of states that are classified according to the irreducible representations of its symmetric group. When the two systems interact, we should classify the states of the total system according to the irreducible representations of all $n_1 + n_2$ particles. We have already encountered the mathematical procedure for such classification: It is the Clebsch-Gordan decomposition of the direct product of the states of the two systems. Since the initial states correspond to Young tableaux, and since we are interested in the inequivalent irreducible representations, we need to examine the decomposition of the direct product of Young frames into a sum of Young frames. We first state (without proof) the procedure for such a decomposition, and then give an example to illustrate it.

$^7$ We must make the additional assumption that the permuted functions are all independent.

24.8.12. Theorem. To find the components of Young frames in the product of two Young frames, draw one of the frames. In the other frame, assign the same symbol, say 1, to all boxes in the first row, the same symbol 2 to all boxes in the second row, etc. Now attach the first row to the first frame, and enlarge in all possible ways subject to the restriction that no two 1's appear in the same column, and that the resultant graph be regular. Repeat with the 2's, etc., making sure in each step that as we read from right to left and top to bottom, no symbol is counted fewer times than the symbol that came after it. The product is the sum of all diagrams so obtained.

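To illustrate the procedure, consider the product

$$(2,1) \otimes (2,1)$$

[Young frames, denoted here by their partitions: the frame (2,1) on the left; on the right, the frame (2,1) with the boxes of its first row labeled 1, 1 and the box of its second row labeled 2.]

We have put two 1's in the first row and one 2 in the second row of the frame to the right. Now apply the first row to the frame on the left. The result is

[Young diagrams: the four frames (4,1), (3,2), (3,1,1), and (2,2,1), each consisting of the original frame (2,1) plus two added boxes labeled 1.]

Now we apply the 2 to each of these graphs separately. We cannot put a 2 to the right of the 1's, because in that case, as we count from right to left, we would start with a 2 without having counted any 1's. The allowed graphs obtained from the first diagram are

[Young diagrams: (4,2) and (4,1,1).]

Applying the 2 to the second graph, we obtain

[Young diagrams: (3,3) and (3,2,1).]

and to the third graph gives

[Young diagrams: (3,2,1) and (3,1,1,1).]

Finally the last graph yields

[Young diagrams: (2,2,2) and (2,2,1,1).]

The entire process described above is written in terms of frames as

$$(2,1) \otimes (2,1) = (4,2) + (4,1,1) + (3,3) + 2\,(3,2,1) + (3,1,1,1) + (2,2,2) + (2,2,1,1).$$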


Some simple products, some of which will be used later, are given in Figure 24.2. 
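The procedure of Theorem 24.8.12 can be sketched in code. The following is an illustrative implementation under stated assumptions, not taken from the text: frames are given as partitions (tuples of row lengths), the function names `horizontal_strips`, `is_lattice`, and `product_of_frames` are hypothetical, and a filling is recorded as the sequence of shapes obtained after attaching the 1's, the 2's, and so on.

```python
from collections import Counter

def horizontal_strips(shape, k):
    """Yield every regular frame obtained from `shape` by adding k boxes,
    no two of them in the same column."""
    shape = list(shape) + [0]          # allow one new row at the bottom

    def rec(i, remaining, new):
        if i == len(shape):
            if remaining == 0:
                yield tuple(x for x in new if x > 0)
            return
        # A row may grow at most up to the OLD length of the row above it;
        # this keeps the frame regular and the new boxes in distinct columns.
        upper = shape[i - 1] if i > 0 else shape[0] + remaining
        for extra in range(0, min(remaining, upper - shape[i]) + 1):
            yield from rec(i + 1, remaining - extra, new + [shape[i] + extra])

    yield from rec(0, k, [])

def is_lattice(shapes):
    """Read added symbols right to left, top to bottom; symbol k+1 must never
    have been counted more often than symbol k."""
    n_sym = len(shapes) - 1
    counts = [0] * (n_sym + 1)

    def length(j, r):
        return shapes[j][r] if r < len(shapes[j]) else 0

    for r in range(len(shapes[-1])):
        for k in range(n_sym, 0, -1):  # larger symbols sit further right in a row
            for _ in range(length(k, r) - length(k - 1, r)):
                counts[k] += 1
                if k > 1 and counts[k] > counts[k - 1]:
                    return False
    return True

def product_of_frames(mu, nu):
    """Multiplicity of each frame in the product of the frames mu and nu."""
    result = Counter()

    def rec(shapes, k):
        if k == len(nu):
            if is_lattice(shapes):
                result[shapes[-1]] += 1
            return
        for s in horizontal_strips(shapes[-1], nu[k]):
            rec(shapes + [s], k + 1)

    rec([tuple(mu)], 0)
    return result

decomposition = product_of_frames((2, 1), (2, 1))
```

For the two frames (2,1) used in the illustration above, `decomposition` assigns multiplicity 2 to (3,2,1) and multiplicity 1 to each of (4,2), (4,1,1), (3,3), (3,1,1,1), (2,2,2), and (2,2,1,1).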

24.9 Problems 

24.1. Show that the action of a group $G$ on the space of functions $\psi$ given by $T_g\psi(x) = \psi(g^{-1} \cdot x)$ is a representation of $G$.

24.2. Complete Example 24.1.6.

24.3. Let the vector space carrying the representation of $S_3$ be the space of functions. Choose $\psi_1(x, y, z) \equiv xy$ and find the matrix representation of $S_3$ in the minimal invariant subspace containing $\psi_1$. Hint: See Example 24.1.8.

24.4. Let the vector space carrying the representation of $S_3$ be the space of functions. Choose (a) $\psi_1(x, y, z) = x$ and (b) $\psi_1(x, y, z) \equiv x^2$, and in each case, find the matrix representation of $S_3$ in the minimal invariant subspace containing $\psi_1$.

24.5. Show that the representations T, T, and T* are either all reducible or all irreducible.

24.6. Show that the hermitian conjugate of Equation (24.5) gives $T_g A^\dagger = A^\dagger T'_g$. Using this relation show that $S \equiv A^\dagger A$ commutes with all $T_g$'s. This result is used to prove Schur's lemmas in infinite dimensions.

24.7. Show that elements of a group belonging to the same conjugacy class have the same characters.

24.8. Show that the regular representation is indeed a representation, i.e., that $R : G \to GL(m, \mathbb{R})$ is a homomorphism.

24.9. Show that a left ideal is minimal if and only if it is generated by a primitive idempotent.

24.10. Let $G$ be a finite group. Define the element $P = \sum_{x \in G} x$ of the group algebra and show that the left ideal generated by $P$ is one-dimensional.






[Figure 24.2: Some products of Young frames for small values of n.]


24.11. Show that $T_i^{(\alpha)}$ defined in Equation (24.25) commutes with all operators $T_g^{(\alpha)}$. Hint: Consider $T_g^{(\alpha)} T_i^{(\alpha)} (T_g^{(\alpha)})^{-1}$.

24.12. Let $K_{i'}$ denote the set of inverses of a conjugacy class $K_i$ with $c_i$ elements.

(a) Show that $K_{i'}$ is also a class with $c_i$ elements.

(b) Show that identity occurs exactly $c_i$ times in the product $K_i K_{i'}$, and none in the product $K_i K_j$ if $j \neq i'$ [see Equation (24.23)].

(c) Conclude that

$$c_{ij1} = \begin{cases} 0 & \text{if } j \neq i', \\ c_i & \text{if } j = i'. \end{cases}$$


24.13. Show that the coefficients $|G|d_j/|H|c_i$ of Equation (24.35) are integers.

24.14. Show that the symmetric and the antisymmetric representations of $S_n$ are inequivalent.

24.15. Construct the character table for $S_4$ from that of $S_3$ (given as Table 24.4), and verify that it is given by Table 24.8.

24.16. Show that all functions transforming according to a given row of an irreducible representation have the same norm.

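24.17. Show that if the group of symmetry of $\mathbf{V}$ contains that of $\mathbf{H}_0$ and $|\phi_j^{(\alpha)}\rangle$ belongs to the $j$th column of the $\alpha$th irreducible representation, then so does $\mathbf{V}|\phi_j^{(\alpha)}\rangle$. Conclude that $\langle\phi_i^{(\alpha)}|\mathbf{V}|\phi_j^{(\alpha)}\rangle = 0$ for $i \neq j$.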

24.18. Find the irreducible components of the representation of Example 24.1.6.






             1K_1   6K_2   3K_3   8K_4   6K_5
    T^(1)      1      1      1      1      1
    T^(2)      1     -1      1      1     -1
    T^(3)      2      0      2     -1      0
    T^(4)      3      1     -1      0     -1
    T^(5)      3     -1     -1      0      1

Table 24.8 Character table for $S_4$.


24.19. Show that $P^{(3)}|\psi_3\rangle$ of Example 24.6.3 is a linear combination of $P^{(3)}|\psi_1\rangle$ and $P^{(3)}|\psi_2\rangle$.

24.20. Show that the tensor product of two unitary representations is unitary.

24.21. Switch the dummy indices of the double sum in (24.45), add (subtract) the two sums, and use $(T \otimes T)_{ia,jb}(g) = (T \otimes T)_{ai,bj}(g)$ to show that the double sum can be written as a sum over the symmetric (antisymmetric) vectors alone.

24.22. Show that the characters $\chi^S(g)$ and $\chi^A(g)$ of the symmetrized and antisymmetrized product representations are given, respectively, by

$$\chi^S(g) = \tfrac{1}{2}\left[(\chi(g))^2 + \chi(g^2)\right] \quad\text{and}\quad \chi^A(g) = \tfrac{1}{2}\left[(\chi(g))^2 - \chi(g^2)\right].$$

24.23. Suppose that transforms according to r ⑻， and according to T ⑻. 

Show that transforms according to r (ax 灼 . 

24.24. Show that 

101 2 = k_) 11 a ⑷ "〆))/. 

One can interpret this as the statement that the square of the reduced matrix element is proportional to the average (over $i$ and $j$) of the square of the full matrix elements.

24.25. Construct the character table of $S_4$ using the analytical method and Equation (24.57).

24.26. Find all the standard Young tableaux for $S_5$. Thus, determine the dimension of each irreducible representation of $S_5$. Check that the dimensions satisfy the second equation of (24.18).

24.27. Verify the remaining entries of Table 24.6.

24.28. Construct the character table of $S_5$.





24.29. Suppose that $Q$, an element of the group algebra of $S_n$, is given by

$$Q = \sum_{i=1}^{n!} \epsilon_{\pi_i}\pi_i, \qquad \pi_i \in S_n.$$

Show that $\pi_j Q = \epsilon_{\pi_j} Q$ and $Q^2 = n!\,Q$.

24.30. Show that $Y^{(n)}Y^{(1^n)} = 0$. Hint: There are as many even permutations in $S_n$ as there are odd permutations.

24.31. Show that the product of any two Young operators of $S_3$ corresponding to different Young tableaux is zero and that

$$Y_1^{(2,1)}Y_1^{(2,1)} = 3Y_1^{(2,1)}, \qquad Y_2^{(2,1)}Y_2^{(2,1)} = 3Y_2^{(2,1)}.$$

24.32. Construct the ideal $\mathcal{L}_2^{(2,1)}$ generated by $Y_2^{(2,1)}$ and verify that it is two dimensional.

24.33. Using the ideal $\mathcal{L}_1^{(2,1)}$ generated by $Y_1^{(2,1)}$, find the matrices of the corresponding irreducible representation. From these matrices calculate the simple characters of $S_3$ and compare your result with Table 24.4. Show that the ideal $\mathcal{L}_2^{(2,1)}$ generated by $Y_2^{(2,1)}$ gives the same set of characters.

24.34. Find all the Young operators for $S_4$ corresponding to the first entry of each row of Figure 24.1. Find the ideals $\mathcal{L}^{(3,1)}$ and $\mathcal{L}^{(2^2)}$ generated by the Young operators $Y^{(3,1)}$ and $Y^{(2^2)}$ corresponding to the second and third rows of the table. Show that $\mathcal{L}^{(3,1)}$ and $\mathcal{L}^{(2^2)}$ have 3 and 2 dimensions, respectively.

24.35. Verify the products of the Young frames of Figure 24.2.


Additional Reading 

1. Barut, A. and Raczka, R. Theory of Group Representations and Applications, World Scientific, 1986. A comprehensive treatise on group theory and its application to physics written in a formal, but readable, language.

2. Boerner, H. Representation of Groups, North-Holland, 1963. A detailed introduction to group representation using group algebra techniques.

3. Elliott, J. and Dawber, P. Symmetry in Physics (2 volumes), Oxford University Press, 1979. Another informal book written for physicists.

4. Fulton, W. and Harris, J. Representation Theory, Springer-Verlag, 1991. A formal introduction to representation theory designed for graduate students in mathematics, but also useful for physicists with a solid math background.

5. Hamermesh, M. Group Theory and its Application to Physical Problems, Dover, 1989. A detailed and very readable account of the representation of symmetric groups written for physicists.




25

Algebra of Tensors


Up until a mere two decades ago, tensors were almost completely synonymous 
with (general) relativity ― except for a minor use in hydrodynamics. Students of 
physics did not need to study tensors until they took a course in the general theory 
of relativity. Then they would read the introductory chapter on tensor algebra and 
analysis, solve a few problems to condition themselves for index “gymnastics,”
read through the book, learn some basic facts about relativity, and finally abandon 
it (unless they became relativists). 

Today, with the advent of gauge theories of fundamental particles, the realiza¬ 
tion that gauge fields are to be thought of as geometrical objects, and the widespread 
belief that all fundamental interactions (including gravity) are different manifes¬ 
tations of the same superforce, the picture has changed drastically. 

Two important developments have taken place as a consequence: Tensors have 
crept into other interactions besides gravity (such as the weak and strong nuclear 
interactions), and the geometrical (coordinate-independent) aspects of tensors have 
become more and more significant in the study of all interactions. The coordinate- 
independent study of tensors is the focus of the fascinating field of differential 
geometry and Lie groups, the subject of the remainder of the book. This chapter 
covers tensor algebra, while the next is devoted to tensor analysis. 

As is customary, we will consider only real vector spaces and abandon the 
Dirac bra and ket notation, whose implementation is most advantageous in unitary 
(complex) spaces. From here on, the basis vectors$^1$ of a vector space V will be
distinguished by a subscript and those of its dual space by a superscript. If $\{\mathbf{e}_i\}_{i=1}^N$ is
a basis in V, then $\{\boldsymbol{\epsilon}^j\}_{j=1}^N$ is a basis in V*. Also, Einstein's summation convention
will be used:


1 We denote vectors by roman boldface, and tensors of higher rank by sans serif bold letters. 





25.0.1. Box. Repeated indices, of which one is an upper and the other a lower index, are assumed to be summed over: $a_i^j b_j^l$ means $\sum_{j=1}^N a_i^j b_j^l$.


As a result of this convention, it is more natural to label the elements of a matrix representation of an operator $\mathbf{A}$ by $a^j_i$ (rather than $a_{ji}$), because then $\mathbf{A}\mathbf{e}_i = a^j_i \mathbf{e}_j$.

25.1 Multilinear Mappings 

Since tensors are special kinds of linear operators on vector spaces, let us reconsider $\mathcal{L}(V, W)$, the space of all linear mappings from the real vector space V to the real vector space W. We noted in Chapter 3 that $\mathcal{L}(V, W)$ is isomorphic to a space with dimension $\dim V \cdot \dim W$. The following proposition, whose proof we leave to the reader, shows this directly.

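25.1.1. Proposition. Let $\{\mathbf{e}_i\}_{i=1}^{N_1}$ be a basis for V and $\{\mathbf{e}'_\beta\}_{\beta=1}^{N_2}$ a basis for W. Then

1. the linear transformations $\mathbf{T}^j_\beta : V \to W$ in the vector space $\mathcal{L}(V, W)$, defined by (note the new way of writing the Kronecker delta)

$$\mathbf{T}^j_\beta \mathbf{e}_i = \delta^j_i \mathbf{e}'_\beta, \qquad j = 1, \ldots, N_1;\ \beta = 1, \ldots, N_2,$$

form a basis in $\mathcal{L}(V, W)$. In particular, $\dim \mathcal{L}(V, W) = N_1 N_2$.

2. If $\tau^\beta_j$ are the matrix elements of a linear transformation $\mathbf{T} \in \mathcal{L}(V, W)$, then $\mathbf{T} = \tau^\beta_j \mathbf{T}^j_\beta$.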

The dual space V* is simply the space $\mathcal{L}(V, \mathbb{R})$. Proposition 25.1.1 (with $N_2 = 1$) then implies that dim V* = dim V, which was shown in Chapter 1. The dual space is important in the discussion of tensors, so we consider some of its properties below.

The basis $\{\mathbf{T}^j_\beta\}$ of Proposition 25.1.1 reduces to $\{\mathbf{T}^j_1\}$ when $W = \mathbb{R}$ and is denoted by $\{\boldsymbol{\epsilon}^j\}_{j=1}^N$, with $N = N_1 = \dim V^* = \dim V$. The $\boldsymbol{\epsilon}^j$'s have the property that

$$\boldsymbol{\epsilon}^j(\mathbf{e}_i) = \delta^j_i. \qquad (25.1)$$

This relation was also established in Chapter 1. The basis $B^* = \{\boldsymbol{\epsilon}^j\}_{j=1}^N$ is simply the dual of the basis $B = \{\mathbf{e}_i\}_{i=1}^N$. Note the “natural” position of the indices for B and B*.

Now suppose that $\{\mathbf{f}_i\}_{i=1}^N = B'$ is another basis of V and R is the (invertible) matrix carrying B onto B'. Let $B'^* = \{\boldsymbol{\varphi}^j\}_{j=1}^N$ be the dual of B'. We want to find the matrix that carries B* onto B'*. If we denote this matrix by A and its elements by $a^j_i$, we have

$$\delta^k_i = \boldsymbol{\varphi}^k \mathbf{f}_i = (a^k_l \boldsymbol{\epsilon}^l)(r^j_i \mathbf{e}_j) = a^k_l r^j_i \delta^l_j = a^k_l r^l_i = (\mathsf{AR})^k_i,$$

where the first equality follows from the duality of B' and B'*. In matrix form, this equation can be written as $\mathsf{AR} = \mathsf{1}$, or $\mathsf{A} = \mathsf{R}^{-1}$. Thus,


25.1.2. Box. The matrix that transforms bases of V* is the inverse of the
matrix that transforms the corresponding bases of V.
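As a small numerical check of this statement, the sketch below picks an arbitrary invertible 2 x 2 matrix R (the entries are illustrative, not from the text), builds the new basis vectors as its columns and the new dual basis from the rows of its inverse, and evaluates the natural pairing. The helper names `inverse_2x2` and `pair` are assumptions made for the example.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of a 2x2 matrix with integer entries, as exact fractions."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

N = 2
# R[j][i] holds r^j_i (upper index = row, lower index = column),
# so the new basis vector f_i = r^j_i e_j is the i-th column of R.
R = [[1, -1],
     [2, 1]]
A = inverse_2x2(R)   # A[k][l] holds a^k_l

f = [[R[j][i] for j in range(N)] for i in range(N)]      # f[i]: components of f_i in {e_j}
phi = [[A[k][l] for l in range(N)] for k in range(N)]    # phi[k]: components of phi^k in {epsilon^l}

def pair(covector, vector):
    """Natural pairing; since epsilon^l(e_j) = delta^l_j it is a plain dot product."""
    return sum(c * v for c, v in zip(covector, vector))

# gram[k][i] = phi^k(f_i); duality of the new bases means this is the identity.
gram = [[pair(phi[k], f[i]) for i in range(N)] for k in range(N)]
```

Here `gram` comes out as the identity matrix, which is the component form of AR = 1.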


In the above equations the upper index in matrix elements labels rows, and the lower index labels columns. This can be remembered by noting that the column vectors $\mathbf{e}_i$ can be thought of as columns of a matrix, and the lower index $i$ then labels those columns. Similarly, $\boldsymbol{\epsilon}^j$ can be thought of as rows of a matrix. We now generalize the concept of linear functionals.

25.1.3. Definition. A map $\mathbf{T} : V_1 \times V_2 \times \cdots \times V_r \to W$ is called r-linear if it is linear in all its variables:

$$\mathbf{T}(\mathbf{v}_1, \ldots, \alpha\mathbf{v}_i + \alpha'\mathbf{v}'_i, \ldots, \mathbf{v}_r) = \alpha\mathbf{T}(\mathbf{v}_1, \ldots, \mathbf{v}_i, \ldots, \mathbf{v}_r) + \alpha'\mathbf{T}(\mathbf{v}_1, \ldots, \mathbf{v}'_i, \ldots, \mathbf{v}_r)$$

for all $i$.

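We can easily construct a bilinear map. Let $\boldsymbol{\tau}_1 \in V_1^*$ and $\boldsymbol{\tau}_2 \in V_2^*$. We define the map $\boldsymbol{\tau}_1 \otimes \boldsymbol{\tau}_2 : V_1 \times V_2 \to \mathbb{R}$ by

$$\boldsymbol{\tau}_1 \otimes \boldsymbol{\tau}_2(\mathbf{v}_1, \mathbf{v}_2) = \boldsymbol{\tau}_1(\mathbf{v}_1)\boldsymbol{\tau}_2(\mathbf{v}_2). \qquad (25.2)$$

The expression $\boldsymbol{\tau}_1 \otimes \boldsymbol{\tau}_2$ is called the tensor product of $\boldsymbol{\tau}_1$ and $\boldsymbol{\tau}_2$. Clearly, since $\boldsymbol{\tau}_1$ and $\boldsymbol{\tau}_2$ are separately linear, so is $\boldsymbol{\tau}_1 \otimes \boldsymbol{\tau}_2$.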
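Equation (25.2) translates almost literally into code. The sketch below is only an illustration: functionals on $\mathbb{R}^n$ are represented by their component lists, and the names `functional` and `tensor_product` as well as all numerical values are assumptions chosen for the example. Linearity in the first slot is then checked on one sample.

```python
def functional(components):
    """Linear functional on R^n given by its components: tau(v) = sum_i tau_i v^i."""
    return lambda v: sum(t * x for t, x in zip(components, v))

def tensor_product(tau1, tau2):
    """The bilinear map (v1, v2) -> tau1(v1) * tau2(v2), as in Equation (25.2)."""
    return lambda v1, v2: tau1(v1) * tau2(v2)

tau1 = functional([1, 2])        # acts on V_1 = R^2
tau2 = functional([0, 3, -1])    # acts on V_2 = R^3
T = tensor_product(tau1, tau2)

u, v = [1, 1], [2, -1]           # two vectors in V_1
w = [1, 0, 4]                    # a vector in V_2
a, b = 3, -2                     # scalars

# Linearity in the first variable: T(a u + b v, w) = a T(u, w) + b T(v, w).
lhs = T([a * x + b * y for x, y in zip(u, v)], w)
rhs = a * T(u, w) + b * T(v, w)
```

The same check with the roles of the two slots exchanged verifies linearity in the second variable.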

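An r-linear map can be multiplied by a scalar, and two r-linear maps can be added; in each case the result is an r-linear map. Thus, the set of r-linear maps from $V_1 \times \cdots \times V_r$ into W forms a vector space that is denoted by $\mathcal{L}(V_1, \ldots, V_r; W)$.

We can also construct multilinear maps on the dual space. First, we note that we can define a natural linear functional on V* as follows. We let $\boldsymbol{\tau} \in V^*$ and $\mathbf{v} \in V$; then $\boldsymbol{\tau}(\mathbf{v}) \in \mathbb{R}$. Now we twist this around and define a mapping $\mathbf{v} : V^* \to \mathbb{R}$ given by $\mathbf{v}(\boldsymbol{\tau}) = \boldsymbol{\tau}(\mathbf{v})$. It is easily shown that this mapping is linear. Thus, we have naturally constructed a linear functional on V* by identifying (V*)* with V.

Construction of multilinear maps on V* is now trivial. For example, let $\mathbf{v}_1 \in V_1$ and $\mathbf{v}_2 \in V_2$ and define the tensor product $\mathbf{v}_1 \otimes \mathbf{v}_2 : V_1^* \times V_2^* \to \mathbb{R}$ by

$$\mathbf{v}_1 \otimes \mathbf{v}_2(\boldsymbol{\tau}_1, \boldsymbol{\tau}_2) = \mathbf{v}_1(\boldsymbol{\tau}_1)\mathbf{v}_2(\boldsymbol{\tau}_2) = \boldsymbol{\tau}_1(\mathbf{v}_1)\boldsymbol{\tau}_2(\mathbf{v}_2). \qquad (25.3)$$

We can also construct mixed multilinear maps such as $\mathbf{v} \otimes \boldsymbol{\tau} : V^* \times V \to \mathbb{R}$ given by

$$\mathbf{v} \otimes \boldsymbol{\tau}(\boldsymbol{\theta}, \mathbf{u}) = \mathbf{v}(\boldsymbol{\theta})\boldsymbol{\tau}(\mathbf{u}) = \boldsymbol{\theta}(\mathbf{v})\boldsymbol{\tau}(\mathbf{u}). \qquad (25.4)$$

There is a bilinear map $h : V^* \times V \to \mathbb{R}$ that naturally pairs V and V*; it is given by $h(\boldsymbol{\theta}, \mathbf{v}) \equiv \boldsymbol{\theta}(\mathbf{v})$. This mapping is called the natural pairing of V and V* into $\mathbb{R}$ and is denoted by using angle brackets:

$$h(\boldsymbol{\theta}, \mathbf{v}) \equiv \langle\boldsymbol{\theta}, \mathbf{v}\rangle \equiv \boldsymbol{\theta}(\mathbf{v}).$$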

25.1.4. Definition. Let V be a vector space with dual space V*. Then a tensor of type (r, s) is a multilinear mapping$^2$

$$\mathbf{T}^r_s : \underbrace{V^* \times V^* \times \cdots \times V^*}_{r\ \text{times}} \times \underbrace{V \times V \times \cdots \times V}_{s\ \text{times}} \to \mathbb{R}.$$

The set of all such mappings for fixed r and s forms a vector space denoted by $\mathcal{T}^r_s(V)$. The number r is called the contravariant degree of the tensor, and s is called the covariant degree of the tensor.

As an example, let $\mathbf{v}_1, \ldots, \mathbf{v}_r \in V$ and $\boldsymbol{\tau}^1, \ldots, \boldsymbol{\tau}^s \in V^*$, and define the tensor product $\mathbf{T}^r_s \equiv \mathbf{v}_1 \otimes \cdots \otimes \mathbf{v}_r \otimes \boldsymbol{\tau}^1 \otimes \cdots \otimes \boldsymbol{\tau}^s$ by

$$\mathbf{v}_1 \otimes \cdots \otimes \mathbf{v}_r \otimes \boldsymbol{\tau}^1 \otimes \cdots \otimes \boldsymbol{\tau}^s(\boldsymbol{\theta}^1, \ldots, \boldsymbol{\theta}^r, \mathbf{u}_1, \ldots, \mathbf{u}_s) = \mathbf{v}_1(\boldsymbol{\theta}^1) \cdots \mathbf{v}_r(\boldsymbol{\theta}^r)\,\boldsymbol{\tau}^1(\mathbf{u}_1) \cdots \boldsymbol{\tau}^s(\mathbf{u}_s) = \prod_{i=1}^{r} \boldsymbol{\theta}^i(\mathbf{v}_i) \prod_{j=1}^{s} \boldsymbol{\tau}^j(\mathbf{u}_j).$$

Each $\mathbf{v}$ in the tensor product requires an element of V*; that is why the number of factors of V* in the Cartesian product equals the number of $\mathbf{v}$'s in the tensor product. As explained in Chapter 0, the Cartesian product with s factors of V is denoted by $V^s$ (similarly for V*).

A tensor of type (0,0) is defined to be a scalar, so $\mathcal{T}^0_0(V) = \mathbb{R}$. A tensor of type (1,0), an ordinary vector, is called a contravariant vector, and one of type (0,1), a dual vector (or a linear functional), is called a covariant vector. A tensor of type (r, 0) is called a contravariant tensor of rank r, and one of type (0, s) is called a covariant tensor of rank s. The union of $\mathcal{T}^r_s(V)$ for all possible r and s can be made into an (infinite-dimensional) algebra, called the algebra of tensors, by defining the following product on it:

25.1.5. Definition. The tensor product of a tensor $\mathbf{T}$ of type (r, s) and a tensor $\mathbf{U}$ of type (k, l) is a tensor $\mathbf{T} \otimes \mathbf{U}$ of type (r + k, s + l), defined, as an operator on $(V^*)^{r+k} \times V^{s+l}$, by

$$\mathbf{T} \otimes \mathbf{U}(\boldsymbol{\theta}^1, \ldots, \boldsymbol{\theta}^{r+k}, \mathbf{u}_1, \ldots, \mathbf{u}_{s+l}) = \mathbf{T}(\boldsymbol{\theta}^1, \ldots, \boldsymbol{\theta}^r, \mathbf{u}_1, \ldots, \mathbf{u}_s)\,\mathbf{U}(\boldsymbol{\theta}^{r+1}, \ldots, \boldsymbol{\theta}^{r+k}, \mathbf{u}_{s+1}, \ldots, \mathbf{u}_{s+l}).$$

This product turns the (infinite-dimensional) vector space of all tensors into an associative algebra called a tensor algebra.


2 Just as the space V* of linear functionals of a vector space V is isomorphic to V, so is the space of tensors of type (r, s) to 
the tensor product space V r%s (see Example 1.3.19). In fact, 7^ (V) — as shown in [Warner, 83] on p. 58. 


This definition is a generalization of Equations (25.2), (25.3), and (25.4). It is easily verified that the tensor product is associative and distributive (over tensor addition), but not commutative.

Making computations with tensors requires choosing a basis for V and one for V* and representing the tensors in terms of numbers (components). This process is not, of course, new. Linear operators are represented by arrays of numbers, i.e., matrices. The case of tensors is merely a generalization of that of linear operators and can be stated as follows:

25.1.6. Theorem. Let $\{\mathbf{e}_i\}_{i=1}^N$ be a basis in V, and $\{\boldsymbol{\epsilon}^j\}_{j=1}^N$ a basis in V*, usually taken to be the dual of $\{\mathbf{e}_i\}_{i=1}^N$. Then the set of all tensor products $\mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{j_s}$ forms a basis for $\mathcal{T}^r_s(V)$. Furthermore, the components of any tensor $\mathbf{A} \in \mathcal{T}^r_s(V)$ are

$$A^{i_1 \ldots i_r}_{j_1 \ldots j_s} = \mathbf{A}(\boldsymbol{\epsilon}^{i_1}, \ldots, \boldsymbol{\epsilon}^{i_r}, \mathbf{e}_{j_1}, \ldots, \mathbf{e}_{j_s}).$$

Proof. The proof is a simple exercise in employing the definition of tensor products and keeping track of the multilinear property of tensors. Details are left for the reader. □

A useful result of the theorem is the relation

$$\mathbf{A} = A^{i_1 \ldots i_r}_{j_1 \ldots j_s}\, \mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{j_s}. \qquad (25.5)$$

Note that for every factor in the basis vectors of $\mathcal{T}^r_s(V)$ there are N possibilities, each one giving a new linearly independent vector of the basis. Thus, the number of possible tensor products is $N^{r+s}$, and we have $\dim \mathcal{T}^r_s(V) = (\dim V)^{r+s}$.

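25.1.7. Example. Let us consider the special case of $\mathcal{T}^1_1(V)$ as an illustration. We can write $\mathbf{A} \in \mathcal{T}^1_1(V)$ as $\mathbf{A} = A^i_j \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j$. Given any $\mathbf{v} \in V$, we obtain$^2$

$$\mathbf{A}(\mathbf{v}) = (A^i_j \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j)(\mathbf{v}) = A^i_j \mathbf{e}_i \underbrace{[\boldsymbol{\epsilon}^j(\mathbf{v})]}_{\in\,\mathbb{R}}.$$

This shows that $\mathbf{A}(\mathbf{v}) \in V$ and $\mathbf{A}$ can be interpreted as a linear operator on V, i.e., $\mathbf{A} \in \mathcal{L}(V)$. Similarly, for $\boldsymbol{\tau} \in V^*$ we get

$$\mathbf{A}(\boldsymbol{\tau}) = (A^i_j \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j)(\boldsymbol{\tau}) = A^i_j \underbrace{[\mathbf{e}_i(\boldsymbol{\tau})]}_{\in\,\mathbb{R}} \boldsymbol{\epsilon}^j.$$

Thus, $\mathbf{A} \in \mathcal{L}(V^*)$. We have shown that given $\mathbf{A} \in \mathcal{T}^1_1(V)$, there corresponds a linear operator belonging to $\mathcal{L}(V)$ [or $\mathcal{L}(V^*)$] having a natural relation to $\mathbf{A}$. Similarly, given any $\mathbf{A} \in \mathcal{L}(V)$ [or $\mathcal{L}(V^*)$] with a matrix representation in the basis $\{\mathbf{e}_i\}_{i=1}^N$ of V (or $\{\boldsymbol{\epsilon}^j\}_{j=1}^N$ of V*) given by $A^i_j$, then corresponding to it in a natural way is a tensor in $\mathcal{T}^1_1(V)$, namely $A^i_j \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j$. Problem 25.5 shows that the tensor defined in this way is basis-independent. Therefore, there is a natural one-to-one correspondence among $\mathcal{T}^1_1(V)$, $\mathcal{L}(V)$, and $\mathcal{L}(V^*)$. This natural correspondence is called a natural isomorphism. Whenever there is a natural isomorphism between two vector spaces, those vector spaces can be treated as being the same. ■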

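We have defined tensors as multilinear machines that take in a vector from a specific Cartesian product space of V's and V*'s and manufacture a real number. Given the representation in Equation (25.5), however, we can generalize the interpretation of a tensor as a linear machine so that it takes a vector belonging to a Cartesian product space and manufactures a tensor. This corresponds to a situation in which not all factors of (25.5) find “partners.” An illustration of this situation was presented in the preceding example. To clarify this let us consider $\mathbf{A} \in \mathcal{T}^1_2(V)$, given by

$$\mathbf{A} = A^i_{jk}\, \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j \otimes \boldsymbol{\epsilon}^k.$$

This machine needs a Cartesian-product vector of the form $(\boldsymbol{\tau}, \mathbf{v}_1, \mathbf{v}_2)$, with $\boldsymbol{\tau} \in V^*$ and $\mathbf{v}_1, \mathbf{v}_2 \in V$, to give a real number. However, if it is not fed enough, it will not complete its job. For instance, if we feed it only a dual vector $\boldsymbol{\tau} \in V^*$, it will give a tensor belonging to $\mathcal{T}^0_2(V)$:

$$\mathbf{A}(\boldsymbol{\tau}) = (A^i_{jk}\, \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j \otimes \boldsymbol{\epsilon}^k)(\boldsymbol{\tau}) = A^i_{jk}[\mathbf{e}_i(\boldsymbol{\tau})]\, \boldsymbol{\epsilon}^j \otimes \boldsymbol{\epsilon}^k.$$

If we feed it a double vector $(\mathbf{v}_1, \mathbf{v}_2)$, it will manufacture a vector in V:

$$\mathbf{A}(\mathbf{v}_1, \mathbf{v}_2) = (A^i_{jk}\, \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j \otimes \boldsymbol{\epsilon}^k)(\mathbf{v}_1, \mathbf{v}_2) = A^i_{jk}\, \mathbf{e}_i [\boldsymbol{\epsilon}^j(\mathbf{v}_1)][\boldsymbol{\epsilon}^k(\mathbf{v}_2)] \in V.$$

What if we feed it a single vector $\mathbf{v}$? It will blow its whistles and buzz its buzzers, because it does not know whether to give $\mathbf{v}$ to $\boldsymbol{\epsilon}^j$ or $\boldsymbol{\epsilon}^k$ (it is smart enough to know that it cannot give $\mathbf{v}$ to $\mathbf{e}_i$). That is why we have to inform the machine with which factor of $\boldsymbol{\epsilon}$ to match $\mathbf{v}$. This is done by properly positioning $\mathbf{v}$ inside a pair of parentheses: If we write $(\cdot, \mathbf{v}, \cdot)$, the machine will know that $\mathbf{v}$ belongs to $\boldsymbol{\epsilon}^j$, and $(\cdot, \cdot, \mathbf{v})$ tells the machine to pair $\mathbf{v}$ with $\boldsymbol{\epsilon}^k$. If we write $(\mathbf{v}, \cdot, \cdot)$, the machine will give us an “error message” because it cannot pair $\mathbf{v}$ with $\mathbf{e}_i$!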
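The "machine" picture can be made concrete in components. The sketch below is an illustration under assumptions: the type (1,2) tensor is stored as a nested list `A[i][j][k]` holding $A^i_{jk}$ in a two-dimensional space, the numerical entries are arbitrary, and the function names are hypothetical. Each function fills some of the slots and returns the components of what is left.

```python
N = 2
# A[i][j][k] holds the component A^i_{jk} of a type (1,2) tensor (arbitrary values).
A = [[[1, 2], [0, -1]],
     [[3, 0], [1, 1]]]

def evaluate(A, tau, v1, v2):
    """All slots filled: A(tau, v1, v2) = A^i_{jk} tau_i v1^j v2^k, a real number."""
    return sum(A[i][j][k] * tau[i] * v1[j] * v2[k]
               for i in range(N) for j in range(N) for k in range(N))

def feed_covector(A, tau):
    """A(tau, . , .): components B_{jk} = A^i_{jk} tau_i of a type (0,2) tensor."""
    return [[sum(A[i][j][k] * tau[i] for i in range(N)) for k in range(N)]
            for j in range(N)]

def feed_vectors(A, v1, v2):
    """A( . , v1, v2): components w^i = A^i_{jk} v1^j v2^k of a vector."""
    return [sum(A[i][j][k] * v1[j] * v2[k] for j in range(N) for k in range(N))
            for i in range(N)]

def feed_second_slot(A, v):
    """A( . , v, . ): v is paired with epsilon^j, leaving C^i_k = A^i_{jk} v^j."""
    return [[sum(A[i][j][k] * v[j] for j in range(N)) for k in range(N)]
            for i in range(N)]

tau, v1, v2 = [1, -1], [2, 1], [0, 3]

number = evaluate(A, tau, v1, v2)
B = feed_covector(A, tau)       # still needs two vectors
w = feed_vectors(A, v1, v2)     # still needs one dual vector
C = feed_second_slot(A, v1)     # still needs one dual vector and one vector
```

Feeding the remaining arguments to `B`, `w`, or `C` reproduces `number`, which is the component version of the statement that the order of feeding does not matter as long as each argument goes to its own slot.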

$^2$ Here, we are assuming that $\mathbf{A}$ acts on an object (such as $\mathbf{v}$) by “pairing it up” with an appropriate factor of which $\mathbf{A}$ is composed (such as $\boldsymbol{\epsilon}^j$).

The components of a tensor $\mathbf{A}$, as given in Equation (25.5), depend on the basis in which they are described. If the basis is changed, the components change. The relation between components of a tensor in different bases is called the transformation law for that particular tensor. Let us investigate this concept.

We use overbars to distinguish among various bases. For instance, $B = \{\mathbf{e}_i\}_{i=1}^N$, $\bar{B} = \{\bar{\mathbf{e}}_i\}_{i=1}^N$, and $\bar{\bar{B}} = \{\bar{\bar{\mathbf{e}}}_i\}_{i=1}^N$ are three different bases of V. Similarly, $B^* = \{\boldsymbol{\epsilon}^j\}_{j=1}^N$, $\bar{B}^* = \{\bar{\boldsymbol{\epsilon}}^j\}_{j=1}^N$, and $\bar{\bar{B}}^* = \{\bar{\bar{\boldsymbol{\epsilon}}}^j\}_{j=1}^N$ are bases of V*. The components are also distinguished with overbars. Recall that if R is the matrix connecting $B$ and $\bar{B}$, then $\mathsf{S} = \mathsf{R}^{-1}$ connects $B^*$ and $\bar{B}^*$. For a tensor $\mathbf{A}$ of type (1,2), Theorem 25.1.6 gives

$$\bar{A}^i_{jk} = \mathbf{A}(\bar{\boldsymbol{\epsilon}}^i, \bar{\mathbf{e}}_j, \bar{\mathbf{e}}_k) = \mathbf{A}(s^i_m \boldsymbol{\epsilon}^m, r^n_j \mathbf{e}_n, r^p_k \mathbf{e}_p) = s^i_m r^n_j r^p_k\, \mathbf{A}(\boldsymbol{\epsilon}^m, \mathbf{e}_n, \mathbf{e}_p) = s^i_m r^n_j r^p_k A^m_{np}. \qquad (25.6)$$

This is the law that transforms the components of $\mathbf{A}$ from one basis to another. In the classical coordinate-dependent treatment of tensors, Equation (25.6) was the defining relation for a tensor of type (1,2). In other words, a tensor of type (1,2) was defined to be a collection of numbers, $A^m_{np}$, that transformed to another collection of numbers $\bar{A}^i_{jk}$ according to the rule in (25.6) when the basis was changed. In the modern treatment of tensors it is not necessary to introduce any basis to define tensors. Only when the components of a tensor are needed must bases be introduced. The advantage of the modern treatment is obvious, since a (1,2)-type tensor has 27 components in three dimensions and 64 components in four dimensions, and all of these are represented by the single symbol $\mathbf{A}$. However, the role of components should not be downplayed. After all, when it comes to actual calculations, we are forced to choose a basis and manipulate components.

Since $\mathcal{T}^r_s(V)$ are vector spaces, it is desirable to investigate mappings from one such space to another. We will be particularly interested in linear mappings. For example, $f : \mathcal{T}^1_0(V) \to \mathcal{T}^0_0(V) = \mathbb{R}$ is what was called a linear functional before. Similarly, $t : \mathcal{T}^1_0(V) \to \mathcal{T}^1_0(V)$ is a linear transformation on V. A special linear transformation is $\mathrm{tr} : \mathcal{T}^1_1(V) \to \mathcal{T}^0_0(V) = \mathbb{R}$, given by

$$\mathrm{tr}\,\mathbf{A} = \mathrm{tr}(A^i_j \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j) \equiv A^i_i \equiv \sum_{i=1}^{N} A^i_i.$$

This is the same trace encountered in the study of linear transformations in Chapter 3.

Although the above definition makes explicit use of components with respect to a basis, it can be easily shown (see Problem 25.7) that it is in fact basis-independent. Functions of tensors that do not depend on bases are called invariants. Another example of an invariant is a linear functional (see Problem 25.6).

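25.1.8. Example. Consider the tensor $\mathbf{A} \in \mathcal{T}^2_0(V)$ given by $\mathbf{A} = \mathbf{e}_1 \otimes \mathbf{e}_1 + \mathbf{e}_2 \otimes \mathbf{e}_1$. We calculate the analogue of the trace for $\mathbf{A}$: $\sum_{i=1}^{2} A^{ii} = 1 + 0 = 1$. Now we change to a new basis, $\{\bar{\mathbf{e}}_1, \bar{\mathbf{e}}_2\}$, given by $\mathbf{e}_1 = \bar{\mathbf{e}}_1 + 2\bar{\mathbf{e}}_2$ and $\mathbf{e}_2 = -\bar{\mathbf{e}}_1 + \bar{\mathbf{e}}_2$. In terms of the new basis vectors, $\mathbf{A}$ is given by

$$\mathbf{A} = (\bar{\mathbf{e}}_1 + 2\bar{\mathbf{e}}_2) \otimes (\bar{\mathbf{e}}_1 + 2\bar{\mathbf{e}}_2) + (-\bar{\mathbf{e}}_1 + \bar{\mathbf{e}}_2) \otimes (\bar{\mathbf{e}}_1 + 2\bar{\mathbf{e}}_2) = 3\bar{\mathbf{e}}_2 \otimes \bar{\mathbf{e}}_1 + 6\bar{\mathbf{e}}_2 \otimes \bar{\mathbf{e}}_2$$

with $\sum_{i=1}^{2} \bar{A}^{ii} = 0 + 6 = 6 \neq \sum_{i=1}^{2} A^{ii}$. This kind of “trace” is not invariant. ■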
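The contrast between this non-invariant "trace" and the genuine trace of a type (1,1) tensor can be checked numerically. The sketch below is illustrative only: it reuses the basis change of the example, writes it as a matrix `M` with `M[k][i]` equal to the coefficient of $\bar{\mathbf{e}}_k$ in $\mathbf{e}_i$, and applies it once with both indices contravariant and once with one contravariant and one covariant index. The helper names are assumptions.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

# Old basis in terms of the new one: e_i = M^k_i ebar_k, stored as M[k][i].
# The columns encode e_1 = ebar_1 + 2 ebar_2 and e_2 = -ebar_1 + ebar_2.
M = [[Fraction(1), Fraction(-1)],
     [Fraction(2), Fraction(1)]]
M_inv = [[Fraction(1, 3), Fraction(1, 3)],
         [Fraction(-2, 3), Fraction(1, 3)]]

# Component array of e_1 (x) e_1 + e_2 (x) e_1.
A = [[1, 0],
     [1, 0]]

# Type (2,0): both indices transform with M, Abar^{kl} = M^k_i M^l_j A^{ij}.
A_bar_20 = matmul(matmul(M, A), transpose(M))

# Type (1,1) with the same component array: the lower index transforms with the
# inverse matrix, Abar^k_l = M^k_i A^i_j (M^{-1})^j_l.
A_bar_11 = matmul(matmul(M, A), M_inv)
```

In this sketch `trace(A_bar_20)` differs from `trace(A)`, reproducing the values 6 and 1 of the example, while `trace(A_bar_11)` equals `trace(A)`.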



Whenever a quantity is defined without reference to a basis, it is clearly invariant. The trace is an example of an invariant quantity whose definition does require a basis. Another example is the determinant of a linear operator [or a tensor of type (1,1)], in terms of a matrix representation of that operator, which is clearly basis-dependent. A basis-independent definition for the determinant of a tensor will be given later in this chapter.

Besides mappings of the form $h : \mathcal{T}^r_s(V) \to \mathcal{T}^k_l(V)$ that depend on a single variable, we can define mappings that depend on several variables, in other words, that take several elements of $\mathcal{T}^r_s(V)$ and give an element of $\mathcal{T}^k_l(V)$. We then write $h : (\mathcal{T}^r_s(V))^m \to \mathcal{T}^k_l(V)$. It is understood that $h(\mathbf{t}_1, \ldots, \mathbf{t}_m)$, in which all $\mathbf{t}_i$ are in $\mathcal{T}^r_s(V)$, is a tensor of type (k, l). If h is linear in all of its variables, it is called a tensor-valued multilinear map. Furthermore, if $h(\mathbf{t}_1, \ldots, \mathbf{t}_m)$ does not depend on the choice of a basis of $\mathcal{T}^r_s(V)$, it is called a multilinear invariant. In most cases $k = 0 = l$, and we speak of scalar-valued invariants, or simply invariants. An example of a multilinear invariant is the determinant considered as a function of the rows of a matrix.

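The following defines an important class of multilinear invariants:

25.1.9. Definition. A contraction of a tensor $\mathbf{A} \in \mathcal{T}^r_s(V)$ with respect to a contravariant index p and a covariant index q is a linear mapping $C : \mathcal{T}^r_s(V) \to \mathcal{T}^{r-1}_{s-1}(V)$ given in components by

$$[C(\mathbf{A})]^{i_1 \ldots i_{p-1} i_{p+1} \ldots i_r}_{j_1 \ldots j_{q-1} j_{q+1} \ldots j_s} = A^{i_1 \ldots i_{p-1} k\, i_{p+1} \ldots i_r}_{j_1 \ldots j_{q-1} k\, j_{q+1} \ldots j_s} \equiv \sum_{k} A^{i_1 \ldots i_{p-1} k\, i_{p+1} \ldots i_r}_{j_1 \ldots j_{q-1} k\, j_{q+1} \ldots j_s}.$$

It can be readily shown that contractions are invariants. The proof is exactly the same as that for the invariance of the trace. In fact, the trace is a special case of a contraction, in which $r = s = 1$.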
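The invariance of a contraction can be illustrated for a type (1,2) tensor by combining it with the transformation law (25.6). The following sketch makes several assumptions: the space is two-dimensional, the matrices `R` and `S` and the components `A[m][n][p]` are arbitrary illustrative values with `S` the inverse of `R`, and the function names are hypothetical. It contracts before and after the change of basis and compares the results.

```python
from fractions import Fraction

N = 2
# R[n][j] = r^n_j connects the two bases of V; S = R^{-1}, with S[i][m] = s^i_m.
R = [[Fraction(1), Fraction(-1)],
     [Fraction(2), Fraction(1)]]
S = [[Fraction(1, 3), Fraction(1, 3)],
     [Fraction(-2, 3), Fraction(1, 3)]]

# A[m][n][p] = A^m_{np}; the numbers are arbitrary.
A = [[[1, 2], [0, -1]],
     [[3, 0], [1, 1]]]

def transform(A):
    """Equation (25.6): Abar^i_{jk} = s^i_m r^n_j r^p_k A^m_{np}."""
    return [[[sum(S[i][m] * R[n][j] * R[p][k] * A[m][n][p]
                  for m in range(N) for n in range(N) for p in range(N))
              for k in range(N)] for j in range(N)] for i in range(N)]

def contract(A):
    """Contract the upper index with the first lower index: c_k = A^i_{ik}."""
    return [sum(A[i][i][k] for i in range(N)) for k in range(N)]

c = contract(A)                      # contract, then change basis ...
c_transformed = [sum(R[p][k] * c[p] for p in range(N)) for k in range(N)]

c_bar = contract(transform(A))       # ... or change basis, then contract
```

The two routes agree: `c_bar` equals `c_transformed`, so the contracted object transforms as a covariant vector, with no trace left of the summed index.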

The linearity inherent in the construction of tensor algebras carries along some of the properties and structures of the underlying vector spaces. One such property is isomorphism. Suppose that $F : V \to U$ is a vector space isomorphism. Then $F^* : U^* \to V^*$, the pullback of F, is also an isomorphism (Proposition 1.3.18). Associated to F is a linear map, which we denote by the same symbol, from $\mathcal{T}^r_s(V)$ to $\mathcal{T}^r_s(U)$ defined by

$$[F(\mathbf{T})](\boldsymbol{\theta}^1, \ldots, \boldsymbol{\theta}^r, \mathbf{u}_1, \ldots, \mathbf{u}_s) \equiv \mathbf{T}(F^*\boldsymbol{\theta}^1, \ldots, F^*\boldsymbol{\theta}^r, F^{-1}\mathbf{u}_1, \ldots, F^{-1}\mathbf{u}_s), \qquad (25.7)$$

where $\mathbf{T} \in \mathcal{T}^r_s(V)$, $\boldsymbol{\theta}^i \in U^*$, and $\mathbf{u}_j \in U$. The reader may check that this map is an algebra isomorphism (see Section 1.4). We shall use this isomorphism to define derivatives for tensors in the next chapter.






25.2 Symmetries of Tensors 

Many applications demand tensors that have some kind of symmetry property. We have already encountered a symmetric tensor, the metric “tensor” of an inner product: If V is a vector space and $\mathbf{v}_1, \mathbf{v}_2 \in V$, then $\mathbf{g}(\mathbf{v}_1, \mathbf{v}_2) = \mathbf{g}(\mathbf{v}_2, \mathbf{v}_1)$. The following generalizes this property.

25.2.1. Definition. A tensor $\mathbf{A}$ is symmetric in the ith and jth variables if its value as a multilinear function is unchanged when these variables are interchanged. Clearly, the two variables must be of the same kind.

An immediate consequence of this definition is that in any basis, the components of a symmetric tensor do not change when the ith and jth indices are interchanged.

25.2.2. Definition. A tensor is contravariant-symmetric if it is symmetric in every pair of its contravariant indices and covariant-symmetric if it is symmetric in every pair of its covariant indices. A tensor is symmetric if it is both contravariant-symmetric and covariant-symmetric.

The set of all symmetric tensors S r (V) of type (r, 0) forms a subspace of the 
vector space 3 Tq. Similarly, the set of symmetric tensors of type (0, 5 ) forms a 
subspace S s of The (independent) components of a symmetric tensor A e § r 
are 山此 …。， where it < h < ■ ■ < i r \ the other components are given by 
symmetry. 

Although a set of symmetric tensors forms a vector space, it does not form an
algebra under the usual multiplication of tensors. In fact, even if $\mathbf{A} = A^{ij}\,\mathbf{e}_i \otimes \mathbf{e}_j$
and $\mathbf{B} = B^{kl}\,\mathbf{e}_k \otimes \mathbf{e}_l$ are symmetric tensors of type $(2, 0)$, the tensor product
$\mathbf{A} \otimes \mathbf{B} = A^{ij}B^{kl}\,\mathbf{e}_i \otimes \mathbf{e}_j \otimes \mathbf{e}_k \otimes \mathbf{e}_l$ need not be a type $(4, 0)$ symmetric tensor. For
instance, $A^{ik}B^{jl}$ may not equal $A^{ij}B^{kl}$. However, we can modify the definition
of the tensor product (for symmetric tensors) to give a symmetric product out of
symmetric factors.
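A minimal numerical sketch of this point, assuming a 2-dimensional space and component values chosen purely for illustration (they are not taken from the text): the components $A^{ij}B^{kl}$ of the ordinary tensor product can change when the second and third indices are interchanged, even though both factors are symmetric.

```python
from itertools import product

# Components of two symmetric type-(2,0) tensors (illustrative values).
A = [[1, 2], [2, 0]]
B = [[0, 1], [1, 3]]
N = 2

# (A ⊗ B)^{ijkl} = A^{ij} B^{kl}
AB = {(i, j, k, l): A[i][j] * B[k][l]
      for i, j, k, l in product(range(N), repeat=4)}

# Index tuples for which swapping the 2nd and 3rd indices changes the component,
# i.e. A^{ij} B^{kl} != A^{ik} B^{jl}.
violations = [idx for idx in AB
              if AB[idx] != AB[(idx[0], idx[2], idx[1], idx[3])]]

print(len(violations) > 0)  # the tensor product is not symmetric
```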

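25.2.3. Definition. A symmetrizer is an operator $\mathbb{S} : \mathcal{T}^r_0 \to \mathbb{S}^r$ given by

$$[\mathbb{S}(\mathbf{A})](\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r) = \frac{1}{r!} \sum_{\pi} \mathbf{A}(\boldsymbol{\tau}^{\pi(1)}, \dots, \boldsymbol{\tau}^{\pi(r)}), \tag{25.8}$$

where the sum is taken over the $r!$ permutations of the integers $1, 2, \dots, r$, and
$\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r$ are all in $V^*$. $\mathbb{S}(\mathbf{A})$ is often denoted by $\mathbf{A}_s$. Clearly, $\mathbf{A}_s$ is a symmetric
tensor.

A similar definition gives the symmetrizer $\mathbb{S} : \mathcal{T}^0_s \to \mathbb{S}_s$. Instead of $\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r$
in (25.8), we would have $\mathbf{v}_1, \dots, \mathbf{v}_s$.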

3 When there is no risk of confusion, we shall delete $V$ from $\mathcal{T}^r_s(V)$, it being understood that all tensors are defined on some
given underlying vector space.





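25.2.4. Example. For $r = 2$, we have only two permutations, and

$$\mathbf{A}_s(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2) = \tfrac{1}{2}\left[\mathbf{A}(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2) + \mathbf{A}(\boldsymbol{\tau}^2, \boldsymbol{\tau}^1)\right].$$

For $r = 3$, we have six permutations of 1, 2, 3, and (25.8) gives

$$\mathbf{A}_s(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2, \boldsymbol{\tau}^3) = \tfrac{1}{6}\left[\mathbf{A}(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2, \boldsymbol{\tau}^3) + \mathbf{A}(\boldsymbol{\tau}^2, \boldsymbol{\tau}^1, \boldsymbol{\tau}^3) + \mathbf{A}(\boldsymbol{\tau}^1, \boldsymbol{\tau}^3, \boldsymbol{\tau}^2) + \mathbf{A}(\boldsymbol{\tau}^3, \boldsymbol{\tau}^1, \boldsymbol{\tau}^2) + \mathbf{A}(\boldsymbol{\tau}^3, \boldsymbol{\tau}^2, \boldsymbol{\tau}^1) + \mathbf{A}(\boldsymbol{\tau}^2, \boldsymbol{\tau}^3, \boldsymbol{\tau}^1)\right].$$

It is clear that interchanging any pair of $\boldsymbol{\tau}$'s on the RHS of the above two equations does
not change the sum. Thus, $\mathbf{A}_s$ is indeed a symmetric tensor.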
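The symmetrizer of Equation (25.8) can be sketched in component form, assuming a tensor is stored as a dictionary from index tuples to numbers (this storage choice and the sample components are assumptions made for illustration only).

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def symmetrize(T, N, r):
    """Components of S(T): average T over all r! orderings of its indices."""
    out = {}
    for idx in product(range(N), repeat=r):
        total = sum(T[tuple(idx[k] for k in perm)]
                    for perm in permutations(range(r)))
        out[idx] = Fraction(total) / factorial(r)
    return out

# Illustrative non-symmetric components on a 2-dimensional space, r = 3.
N, r = 2, 3
T = {idx: idx[0] + 2 * idx[1] + 4 * idx[2] ** 2
     for idx in product(range(N), repeat=r)}
Ts = symmetrize(T, N, r)

# Interchanging any indices leaves the symmetrized components unchanged.
is_symmetric = all(Ts[idx] == Ts[tuple(idx[k] for k in perm)]
                   for idx in Ts for perm in permutations(range(r)))
print(is_symmetric)
```

Exact rational arithmetic (`Fraction`) is used so that the factor $1/r!$ introduces no rounding.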


It can be shown that

$$\dim \mathbb{S}^r(V) = \binom{N + r - 1}{r} = \frac{(N + r - 1)!}{r!\,(N - 1)!}.$$

The proof involves counting the number of different integers $i_1, \dots, i_r$ for which
$1 \le i_m \le i_{m+1} \le N$ for each $m$.
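The counting argument can be checked by brute force. The sketch below (the function name `dim_symmetric` is introduced here for illustration) enumerates the nondecreasing index sequences and compares their number with the binomial formula.

```python
from itertools import combinations_with_replacement
from math import comb

def dim_symmetric(N, r):
    """Count index tuples with 1 <= i_1 <= i_2 <= ... <= i_r <= N."""
    return sum(1 for _ in combinations_with_replacement(range(1, N + 1), r))

for N, r in [(3, 2), (4, 3), (2, 5)]:
    print(N, r, dim_symmetric(N, r), comb(N + r - 1, r))
```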

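We are now ready to define a product on the collection of symmetric tensors
and make it an algebra, called the symmetric algebra.

25.2.5. Definition. The symmetric product of symmetric tensors $\mathbf{A} \in \mathbb{S}^r(V)$ and
$\mathbf{B} \in \mathbb{S}^s(V)$ is denoted by $\mathbf{A}\mathbf{B}$ and defined as

$$\mathbf{A}\mathbf{B}(\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^{r+s}) \equiv \frac{(r+s)!}{r!\,s!}\,\mathbb{S}(\mathbf{A} \otimes \mathbf{B})(\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^{r+s}) = \frac{1}{r!\,s!} \sum_{\pi} \mathbf{A}(\boldsymbol{\tau}^{\pi(1)}, \dots, \boldsymbol{\tau}^{\pi(r)})\,\mathbf{B}(\boldsymbol{\tau}^{\pi(r+1)}, \dots, \boldsymbol{\tau}^{\pi(r+s)}),$$

where the sum is over all permutations of $1, 2, \dots, r + s$. The symmetric product
of $\mathbf{A} \in \mathbb{S}_r(V)$ and $\mathbf{B} \in \mathbb{S}_s(V)$ is defined similarly.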

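25.2.6. Example. Let us construct the symmetric tensor products of vectors. First we
find the symmetric product of $\mathbf{v}_1$ and $\mathbf{v}_2$, both belonging to $V = \mathcal{T}^1_0(V)$:

$$(\mathbf{v}_1\mathbf{v}_2)(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2) = \mathbf{v}_1(\boldsymbol{\tau}^1)\mathbf{v}_2(\boldsymbol{\tau}^2) + \mathbf{v}_1(\boldsymbol{\tau}^2)\mathbf{v}_2(\boldsymbol{\tau}^1) = \mathbf{v}_1(\boldsymbol{\tau}^1)\mathbf{v}_2(\boldsymbol{\tau}^2) + \mathbf{v}_2(\boldsymbol{\tau}^1)\mathbf{v}_1(\boldsymbol{\tau}^2) = (\mathbf{v}_1 \otimes \mathbf{v}_2 + \mathbf{v}_2 \otimes \mathbf{v}_1)(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2).$$

Since this is true for any pair $\boldsymbol{\tau}^1$ and $\boldsymbol{\tau}^2$, we have

$$\mathbf{v}_1\mathbf{v}_2 = \mathbf{v}_1 \otimes \mathbf{v}_2 + \mathbf{v}_2 \otimes \mathbf{v}_1.$$

In general, $\mathbf{v}_1\mathbf{v}_2 \cdots \mathbf{v}_r = \sum_{\pi} \mathbf{v}_{\pi(1)} \otimes \mathbf{v}_{\pi(2)} \otimes \cdots \otimes \mathbf{v}_{\pi(r)}$.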
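As a small illustration, the symmetric product of two vectors can be computed from their components in some basis. The helper names and the sample vectors below are assumptions made for this sketch.

```python
def outer(u, v):
    """Components of the tensor product u ⊗ v."""
    return [[a * b for b in v] for a in u]

def sym_product(u, v):
    """Components of the symmetric product uv = u ⊗ v + v ⊗ u."""
    uv, vu = outer(u, v), outer(v, u)
    n = len(u)
    return [[uv[i][j] + vu[i][j] for j in range(n)] for i in range(n)]

v1, v2 = [1, 2, -1], [3, 1, 2]   # illustrative components
P = sym_product(v1, v2)

# The result is symmetric in its two indices, and the product is commutative.
print(all(P[i][j] == P[j][i] for i in range(3) for j in range(3)))
print(P == sym_product(v2, v1))
```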





It is clear from the definition that symmetric multiplication is commutative,
associative, and distributive. If we choose a basis $\{\mathbf{e}_i\}_{i=1}^N$ for $V$ and express all
symmetric tensors in terms of symmetric products of the $\mathbf{e}_i$ using the above properties,
then any symmetric tensor can be expressed as a linear combination of terms of
the form $(\mathbf{e}_1)^{n_1} \cdots (\mathbf{e}_N)^{n_N}$.

Skew-symmetry is the same as symmetry except that in the interchange of
variables the tensor changes sign. The following theorem, whose simple proof we
leave as a problem, further characterizes skew-symmetric tensors.

25.2.7. Theorem. A tensor $\mathbf{A}$ is skew-symmetric in contravariant indices $i$ and
$j$ if and only if for all $\boldsymbol{\tau} \in V^*$, substitution of $\boldsymbol{\tau}$ for both the $i$th and $j$th
variables of $\mathbf{A}$ yields zero, regardless of the values of the remaining variables.

25.2.8. Definition. A covariant (contravariant) skew-symmetric tensor is one that
is skew-symmetric in all pairs of covariant (contravariant) variables. A tensor is
skew-symmetric if it is both covariant and contravariant skew-symmetric.


Leopold Kronecker (1823-1891) was the son of Isidor Kronecker,
a businessman, and Johanna Prausnitzer. They were
wealthy and provided private tutoring at home for their son until
he entered the Liegnitz Gymnasium. At the gymnasium, Kronecker’s
mathematics teacher was E. E. Kummer, who early recognized
the boy’s ability and encouraged him to do independent
research. He also received Evangelical religious instruction, although
he was Jewish; he formally converted to Christianity in
the last year of his life.

Kronecker matriculated at the University of Berlin in 1841,
where he attended lectures in mathematics given by Dirichlet. Like Gauss and Jacobi, he was
interested in classical philology. He also attended Schelling’s philosophy lectures; he was
later to make a thorough study of the works of Descartes, Spinoza, Leibniz, Kant, and Hegel,
as well as those of Schopenhauer, whose ideas he rejected.

Kronecker spent the summer semester of 1843 at the University of Bonn, and the fall
semester at Breslau (now Wroclaw, Poland) because Kummer had been appointed professor
there. He remained there for two semesters, returning to Berlin in the winter of 1844-1845
to take the doctorate. Kronecker took his oral examination consisting of questions not only
in mathematics, but also in Greek history of legal philosophy. He was awarded the doctorate
on 10 September 1845.

Dirichlet, his professor and examiner, was to remain one of Kronecker’s closest friends, 
as was Kummer, his first mathematics teacher. In the meantime, in Berlin, Kronecker was 
also becoming better acquainted with Eisenstein and with Jacobi. During the same period 
Dirichlet introduced him to Alexander von Humboldt and to the composer Felix Mendelssohn, 
who was both Dirichlet’s brother-in-law and the cousin of Kummer’s wife. 

Family business then called Kronecker from Berlin. In its interest he was required to
spend a few years managing an estate near Liegnitz, as well as to dissolve the banking business
of an uncle. In 1848 he married the latter’s daughter, his cousin Fanny Prausnitzer; they
had six children. Having temporarily renounced an academic career, Kronecker continued
to do mathematics as a recreation. He both carried on independent research and engaged in a
lively mathematical correspondence with Kummer; he was not ambitious for fame, and was
able to enjoy mathematics as a true amateur. By 1855, however, Kronecker’s circumstances
had changed enough to allow him to return to the academic life in Berlin as a financially
independent private scholar.

In 1860 Kummer, seconded by Borchardt and Weierstrass, nominated Kronecker to
the Berlin Academy, of which he became full member on 23 January 1861. Kronecker was
increasingly active and influential in the affairs of the Academy, particularly in recruiting the
most important German and foreign mathematicians for it. His influence outside Germany
also increased. He was a member of many learned societies, among them the Paris Academy
and the Royal Society of London. He established other contacts with foreign scientists in
his numerous travels abroad and in extending to them the hospitality of his Berlin home. For
this reason his advice was often solicited in regard to filling mathematical professorships
both in Germany and elsewhere; his recommendations were probably as significant as those
of his erstwhile friend Weierstrass.

The growing estrangement between Kronecker and Weierstrass was partly
due to the very different temperaments of the two, and their professional and scientific
differences. Since they had long maintained the same circle of friends, their friends, too,
became involved on both levels. A characteristic incident occurred at the new year of 1884-1885,
when H. A. Schwarz, who was both Weierstrass’s student and Kummer’s son-in-law,
sent Kronecker a greeting that included the phrase: “He who does not honor the Smaller
[Kronecker], is not worthy of the Greater [Weierstrass].” Kronecker read this allusion to
physical size (he was a small man, and increasingly self-conscious with age) as a slur on
his intellectual powers and broke with Schwarz completely.

Kronecker’s mathematics lacked a systematic theoretical basis. Nevertheless, he was
preeminent in uniting the separate mathematical disciplines. Moreover, in certain ways (his
refusal to recognize an actual infinity, his insistence that a mathematical concept must be
defined in a finite number of steps, and his opposition to the work of Cantor and Dedekind)
his approach may be compared to that of intuitionists in the twentieth century. Kronecker’s
mathematics thus remains influential.


25.3 Exterior Algebra 

The following discussion will concentrate on tensors of type $(r, 0)$. However,
interchanging the roles of $V$ and $V^*$ makes all definitions, theorems, propositions,
and conclusions valid for tensors of type $(0, s)$ as well.

The set of all skew-symmetric tensors of type $(p, 0)$ forms a subspace of $\mathcal{T}^p_0(V)$.
This subspace is denoted by $\Lambda^p(V)$. It is not, however, an algebra unless we define
a skew-symmetric product analogous to that for the symmetric case. First, we need
the following:

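25.3.1. Definition. An antisymmetrizer is a linear operator $\mathbb{A} : \mathcal{T}^p_0(V) \to \Lambda^p(V)$,
given by

$$[\mathbb{A}(\mathbf{T})](\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^p) = \frac{1}{p!} \sum_{\pi} \epsilon_\pi\, \mathbf{T}(\boldsymbol{\tau}^{\pi(1)}, \dots, \boldsymbol{\tau}^{\pi(p)}). \tag{25.9}$$

The sum is over all permutations $\pi$ of $\{1, 2, \dots, p\}$, and $\epsilon_\pi \equiv \epsilon_{\pi(1)\pi(2)\dots\pi(p)}$ is $+1$ if the
permutation is even and $-1$ if the permutation is odd. $\mathbb{A}(\mathbf{T})$ is sometimes denoted
by $\mathbf{T}_a$.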


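25.3.2. Example. Let us write out Equation (25.9) for $p = 3$. The procedure is entirely
analogous to Example 25.2.4:

$$\mathbf{T}_a(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2, \boldsymbol{\tau}^3) = \tfrac{1}{6}\left[\epsilon_{123}\mathbf{T}(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2, \boldsymbol{\tau}^3) + \epsilon_{213}\mathbf{T}(\boldsymbol{\tau}^2, \boldsymbol{\tau}^1, \boldsymbol{\tau}^3) + \epsilon_{132}\mathbf{T}(\boldsymbol{\tau}^1, \boldsymbol{\tau}^3, \boldsymbol{\tau}^2) + \epsilon_{312}\mathbf{T}(\boldsymbol{\tau}^3, \boldsymbol{\tau}^1, \boldsymbol{\tau}^2) + \epsilon_{321}\mathbf{T}(\boldsymbol{\tau}^3, \boldsymbol{\tau}^2, \boldsymbol{\tau}^1) + \epsilon_{231}\mathbf{T}(\boldsymbol{\tau}^2, \boldsymbol{\tau}^3, \boldsymbol{\tau}^1)\right]$$

$$= \tfrac{1}{6}\left[\mathbf{T}(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2, \boldsymbol{\tau}^3) - \mathbf{T}(\boldsymbol{\tau}^2, \boldsymbol{\tau}^1, \boldsymbol{\tau}^3) - \mathbf{T}(\boldsymbol{\tau}^1, \boldsymbol{\tau}^3, \boldsymbol{\tau}^2) + \mathbf{T}(\boldsymbol{\tau}^3, \boldsymbol{\tau}^1, \boldsymbol{\tau}^2) - \mathbf{T}(\boldsymbol{\tau}^3, \boldsymbol{\tau}^2, \boldsymbol{\tau}^1) + \mathbf{T}(\boldsymbol{\tau}^2, \boldsymbol{\tau}^3, \boldsymbol{\tau}^1)\right].$$

The reader may easily verify that all terms with a plus sign are obtained from (123) by an even
number of interchanges of symbols, and those with a minus sign by an odd number.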
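The antisymmetrizer of Equation (25.9) admits the same kind of component sketch as the symmetrizer. The dictionary storage, the helper `sign`, and the sample components are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (counted by inversions)."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def antisymmetrize(T, N, p):
    """Components of A(T): signed average over all p! orderings of the indices."""
    out = {}
    for idx in product(range(N), repeat=p):
        total = sum(sign(perm) * T[tuple(idx[k] for k in perm)]
                    for perm in permutations(range(p)))
        out[idx] = Fraction(total) / factorial(p)
    return out

# Illustrative components with no particular symmetry, N = p = 3.
N, p = 3, 3
T = {idx: (idx[0] + 1) * (idx[1] + 2) ** 2 * (idx[2] + 3) ** 3
     for idx in product(range(N), repeat=p)}
Ta = antisymmetrize(T, N, p)

print(Ta[(0, 1, 2)] == -Ta[(1, 0, 2)])  # sign flips under an interchange
print(Ta[(0, 0, 2)] == 0)               # repeated index gives zero
```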


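We can now define the important product that turns the collection of all antisymmetric
tensors into an algebra.

25.3.3. Definition. The exterior product (also called the wedge, Grassmann, alternating,
or veck product) of two skew-symmetric tensors $\mathbf{A} \in \Lambda^p(V)$ and $\mathbf{B} \in \Lambda^q(V)$
is a skew-symmetric tensor belonging to $\Lambda^{p+q}(V)$ and given by$^4$

$$\mathbf{A} \wedge \mathbf{B} \equiv \frac{(p+q)!}{p!\,q!}\,\mathbb{A}(\mathbf{A} \otimes \mathbf{B}) = \frac{(p+q)!}{p!\,q!}\,(\mathbf{A} \otimes \mathbf{B})_a.$$

More explicitly,

$$\mathbf{A} \wedge \mathbf{B}(\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^{p+q}) = \frac{1}{p!\,q!} \sum_{\pi} \epsilon_\pi\, \mathbf{A}(\boldsymbol{\tau}^{\pi(1)}, \dots, \boldsymbol{\tau}^{\pi(p)})\,\mathbf{B}(\boldsymbol{\tau}^{\pi(p+1)}, \dots, \boldsymbol{\tau}^{\pi(p+q)}).$$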


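25.3.4. Example. Let us find the exterior product of vectors. First we find the exterior
product of $\mathbf{v}_1$ and $\mathbf{v}_2$, both belonging to $V = \mathcal{T}^1_0(V)$:

$$\mathbf{v}_1 \wedge \mathbf{v}_2(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2) = \sum_{\pi} \epsilon_{\pi(1)\pi(2)}\,(\mathbf{v}_1 \otimes \mathbf{v}_2)(\boldsymbol{\tau}^{\pi(1)}, \boldsymbol{\tau}^{\pi(2)})$$

$$= \epsilon_{12}(\mathbf{v}_1 \otimes \mathbf{v}_2)(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2) + \epsilon_{21}(\mathbf{v}_1 \otimes \mathbf{v}_2)(\boldsymbol{\tau}^2, \boldsymbol{\tau}^1)$$

$$= \mathbf{v}_1(\boldsymbol{\tau}^1)\mathbf{v}_2(\boldsymbol{\tau}^2) - \mathbf{v}_1(\boldsymbol{\tau}^2)\mathbf{v}_2(\boldsymbol{\tau}^1)$$

$$= (\mathbf{v}_1 \otimes \mathbf{v}_2 - \mathbf{v}_2 \otimes \mathbf{v}_1)(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2).$$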

4 The reader should be warned that different authors may use different numerical factors in the definition of the exterior product.




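Since this is true for arbitrary $\boldsymbol{\tau}^1$ and $\boldsymbol{\tau}^2$, we have

$$\mathbf{v}_1 \wedge \mathbf{v}_2 = \mathbf{v}_1 \otimes \mathbf{v}_2 - \mathbf{v}_2 \otimes \mathbf{v}_1.$$

In general,

$$\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \cdots \wedge \mathbf{v}_r = \sum_{\pi} \epsilon_{\pi(1)\dots\pi(r)}\, \mathbf{v}_{\pi(1)} \otimes \mathbf{v}_{\pi(2)} \otimes \cdots \otimes \mathbf{v}_{\pi(r)}.$$

In particular, this shows that the exterior product (of vectors) is associative.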
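The general formula for the wedge product of vectors can be sketched in components as follows. The function `wedge_vectors` and the sample vectors are hypothetical names and values introduced for this illustration; the normalization is the one used in this section (no factor of $1/r!$).

```python
from itertools import permutations, product

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (counted by inversions)."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def wedge_vectors(*vs):
    """Components of v_1 ∧ ... ∧ v_r = Σ_π ε_π v_π(1) ⊗ ... ⊗ v_π(r)."""
    r, N = len(vs), len(vs[0])
    comps = {}
    for idx in product(range(N), repeat=r):
        total = 0
        for perm in permutations(range(r)):
            term = sign(perm)
            for slot, k in enumerate(perm):
                term *= vs[k][idx[slot]]   # slot-th factor is v_π(slot)
            total += term
        comps[idx] = total
    return comps

v1, v2 = [1, 2, -1], [3, 1, 2]   # illustrative components
w = wedge_vectors(v1, v2)
print(w[(0, 1)], w[(0, 2)], w[(1, 2)])           # independent components
print(w[(0, 1)] == -wedge_vectors(v2, v1)[(0, 1)])  # v1 ∧ v2 = -v2 ∧ v1
```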

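If $\{\mathbf{e}_i\}_{i=1}^N$ is a basis with dual $\{\boldsymbol{\epsilon}^j\}_{j=1}^N$, then the last result of Example 25.3.4
yields

$$\boldsymbol{\epsilon}^{i_1} \wedge \boldsymbol{\epsilon}^{i_2} \wedge \cdots \wedge \boldsymbol{\epsilon}^{i_p}(\mathbf{e}_{j_1}, \mathbf{e}_{j_2}, \dots, \mathbf{e}_{j_p}) = \sum_{\pi} \epsilon_\pi\, \delta^{i_{\pi(1)}}_{j_1} \delta^{i_{\pi(2)}}_{j_2} \cdots \delta^{i_{\pi(p)}}_{j_p}. \tag{25.10}$$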

The following theorem contains the properties of the exterior product (for a
proof, see [Abra 88, p. 394]):

25.3.5. Theorem. The exterior product is associative and distributive with respect
to the addition of tensors. Furthermore, it satisfies the following anticommutativity
property:

$$\mathbf{A} \wedge \mathbf{B} = (-1)^{pq}\,\mathbf{B} \wedge \mathbf{A}$$

whenever $\mathbf{A} \in \Lambda^p(V)$ and $\mathbf{B} \in \Lambda^q(V)$. In particular, $\mathbf{v}_1 \wedge \mathbf{v}_2 = -\mathbf{v}_2 \wedge \mathbf{v}_1$ for
$\mathbf{v}_1, \mathbf{v}_2 \in V$.

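The wedge product of linear functionals of a vector space is particularly important
in the analysis of tensors, as we shall see in the next chapter.

25.3.6. Definition. The elements of $\Lambda^p(V^*)$ are called $p$-forms.

A linear transformation $\mathsf{T} : V \to W$ induces a transformation$^6$ $\mathsf{T}^* :
\Lambda^p(W^*) \to \Lambda^p(V^*)$ defined by

$$(\mathsf{T}^*\boldsymbol{\rho})(\mathbf{v}_1, \dots, \mathbf{v}_p) \equiv \boldsymbol{\rho}(\mathsf{T}\mathbf{v}_1, \dots, \mathsf{T}\mathbf{v}_p), \qquad \boldsymbol{\rho} \in \Lambda^p(W^*), \quad \mathbf{v}_i \in V. \tag{25.11}$$

$\mathsf{T}^*\boldsymbol{\rho}$ is called the pullback of $\boldsymbol{\rho}$ by $\mathsf{T}$. The most important properties of pullback
maps are given in the following: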

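25.3.7. Proposition. Let $\mathsf{T} : V \to U$ and $\mathsf{S} : U \to W$. Then
1. $\mathsf{T}^* : \Lambda^p(U^*) \to \Lambda^p(V^*)$ is linear. (So is $\mathsf{S}^*$, of course.)

2. $(\mathsf{S} \circ \mathsf{T})^* = \mathsf{T}^* \circ \mathsf{S}^*$.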


6 Note that $\mathsf{T}^*$ is the extension of the pullback operator introduced at the end of Chapter 1.





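3. If $\mathsf{T}$ is the identity map, so is $\mathsf{T}^*$.

4. If $\mathsf{T}$ is an isomorphism, so is $\mathsf{T}^*$ and $(\mathsf{T}^*)^{-1} = (\mathsf{T}^{-1})^*$.

5. If $\boldsymbol{\rho} \in \Lambda^p(U^*)$ and $\boldsymbol{\sigma} \in \Lambda^q(U^*)$, then $\mathsf{T}^*(\boldsymbol{\rho} \wedge \boldsymbol{\sigma}) = \mathsf{T}^*\boldsymbol{\rho} \wedge \mathsf{T}^*\boldsymbol{\sigma}$.

Proof. The proof follows directly from definitions and is left as an exercise for the
reader. □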
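Property 5 can be spot-checked numerically for 1-forms. The sketch below assumes that $\mathsf{T}$ is represented by a matrix whose rows are labeled by a basis of $U$ and whose columns are labeled by a basis of $V$, and that 1-forms are stored as lists of components; all names and numbers are illustrative.

```python
def apply(T, v):
    """Matrix T (rows index U, columns index V) acting on a vector of V."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in T]

def pair(rho, u):
    """Value of the linear functional rho on the vector u."""
    return sum(r * x for r, x in zip(rho, u))

def wedge(rho, sigma):
    """rho ∧ sigma = rho ⊗ sigma - sigma ⊗ rho, as a function of two vectors."""
    return lambda a, b: pair(rho, a) * pair(sigma, b) - pair(rho, b) * pair(sigma, a)

def pullback(T, rho):
    """Components of T*rho, defined by (T*rho)(v) = rho(Tv)."""
    return [sum(rho[i] * T[i][j] for i in range(len(rho)))
            for j in range(len(T[0]))]

T = [[1, 2], [0, 1], [3, -1]]        # T : V -> U with dim V = 2, dim U = 3
rho, sigma = [1, 0, 2], [0, 1, -1]   # 1-forms on U
v, w = [1, 1], [2, -3]               # vectors in V

lhs = wedge(rho, sigma)(apply(T, v), apply(T, w))        # [T*(rho ∧ sigma)](v, w)
rhs = wedge(pullback(T, rho), pullback(T, sigma))(v, w)  # (T*rho ∧ T*sigma)(v, w)
print(lhs == rhs)
```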


The components of $\mathbf{A} \in \Lambda^p(V)$ are given by $A^{i_1 i_2 \dots i_p}$, where $i_1 < i_2 < \cdots < i_p$.
All other components are related to these by skew-symmetry. The number of
independent components, which is the dimension of $\Lambda^p(V)$, is equal to the number
of ways $p$ numbers can be chosen from among $N$ distinct numbers in such a way
that no two of them are equal. This is simply the combination of $N$ objects taken
$p$ at a time. Thus, we have

$$\dim \Lambda^p(V) = \binom{N}{p} = \frac{N!}{p!\,(N - p)!}. \tag{25.12}$$

In particular, $\dim \Lambda^N(V) = 1$. This should come as no surprise, because from a
basis $\{\mathbf{e}_i\}_{i=1}^N$ of $V$, we can form a basis for $\Lambda^p(V)$ by constructing all products of the
form $\mathbf{e}_{i_1} \wedge \mathbf{e}_{i_2} \wedge \cdots \wedge \mathbf{e}_{i_p}$ [of which there are $\binom{N}{p}$]. However, when $p = N$, there is,
within a multiplicative constant, only one such product and that is $\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \cdots \wedge \mathbf{e}_N$.
An elegant way of determining the linear independence of vectors using the
formalism developed so far is given in the following proposition.

25.3.8. Proposition. A set of vectors, $\mathbf{v}_1, \dots, \mathbf{v}_p$, is linearly independent if and
only if $\mathbf{v}_1 \wedge \cdots \wedge \mathbf{v}_p \neq 0$.

Proof. If $\{\mathbf{v}_i\}_{i=1}^p$ are independent, then they span a $p$-dimensional subspace $M$ of
$V$. Considering $M$ as a vector space in its own right, we have $\dim \Lambda^p(M) = 1$. A
basis for $\Lambda^p(M)$ is simply $\mathbf{v}_1 \wedge \cdots \wedge \mathbf{v}_p$, which cannot be zero.

Conversely, suppose that $\alpha_1\mathbf{v}_1 + \cdots + \alpha_p\mathbf{v}_p = 0$. Then taking the exterior
product of the LHS with $\mathbf{v}_2 \wedge \mathbf{v}_3 \wedge \cdots \wedge \mathbf{v}_p$ makes all terms vanish (because each
will have two factors of a vector in the wedge product) except the first one. Thus,
we have $\alpha_1 \mathbf{v}_1 \wedge \cdots \wedge \mathbf{v}_p = 0$. The fact that the wedge product is not zero forces
$\alpha_1$ to be zero. Similarly, multiplying by $\mathbf{v}_1 \wedge \mathbf{v}_3 \wedge \cdots \wedge \mathbf{v}_p$ shows that $\alpha_2 = 0$, and
so on. □


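25.3.9. Example. Let $\{\mathbf{e}_i\}_{i=1}^3$ be a basis for $V$. Let $\mathbf{v}_1 = \mathbf{e}_1 + 2\mathbf{e}_2 - \mathbf{e}_3$, $\mathbf{v}_2 = 3\mathbf{e}_1 + \mathbf{e}_2 +
2\mathbf{e}_3$, $\mathbf{v}_3 = -\mathbf{e}_1 - 3\mathbf{e}_2 + 2\mathbf{e}_3$.

Take the wedge product of the first two $\mathbf{v}$'s:

$$\mathbf{v}_1 \wedge \mathbf{v}_2 = (\mathbf{e}_1 + 2\mathbf{e}_2 - \mathbf{e}_3) \wedge (3\mathbf{e}_1 + \mathbf{e}_2 + 2\mathbf{e}_3) = -5\,\mathbf{e}_1 \wedge \mathbf{e}_2 + 5\,\mathbf{e}_1 \wedge \mathbf{e}_3 + 5\,\mathbf{e}_2 \wedge \mathbf{e}_3.$$

All the wedge products that have repeated factors vanish. Now we multiply by $\mathbf{v}_3$:

$$\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \mathbf{v}_3 = -5\,\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge (-\mathbf{e}_1 - 3\mathbf{e}_2 + 2\mathbf{e}_3) + 5\,\mathbf{e}_1 \wedge \mathbf{e}_3 \wedge (-\mathbf{e}_1 - 3\mathbf{e}_2 + 2\mathbf{e}_3) + 5\,\mathbf{e}_2 \wedge \mathbf{e}_3 \wedge (-\mathbf{e}_1 - 3\mathbf{e}_2 + 2\mathbf{e}_3)$$

$$= -10\,\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3 - 15\,\mathbf{e}_1 \wedge \mathbf{e}_3 \wedge \mathbf{e}_2 - 5\,\mathbf{e}_2 \wedge \mathbf{e}_3 \wedge \mathbf{e}_1 = 0.$$

We conclude that the three vectors are linearly dependent.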
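The same conclusion can be reached mechanically. The following sketch (with a hypothetical helper `top_wedge_coefficient`) computes the coefficient $c$ in $\mathbf{v}_1 \wedge \cdots \wedge \mathbf{v}_N = c\,\mathbf{e}_1 \wedge \cdots \wedge \mathbf{e}_N$ as a signed sum over permutations, using the components of the three vectors of the example.

```python
from itertools import permutations

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (counted by inversions)."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def top_wedge_coefficient(vectors):
    """Coefficient c in v_1 ∧ ... ∧ v_N = c e_1 ∧ ... ∧ e_N (N vectors, N dimensions)."""
    n = len(vectors)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for row, col in enumerate(perm):
            term *= vectors[row][col]
        total += term
    return total

v1, v2, v3 = [1, 2, -1], [3, 1, 2], [-1, -3, 2]
c = top_wedge_coefficient([v1, v2, v3])
print(c)  # a vanishing coefficient signals linear dependence
```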

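As an application of Proposition 25.3.8, let us prove the following.

25.3.10. Theorem. (Cartan’s lemma) Suppose that $\{\mathbf{e}_i\}_{i=1}^p$, $p \le \dim V$, form a
linearly independent set of vectors in $V$ and that $\{\mathbf{v}_i\}_{i=1}^p$ are also vectors in $V$ such
that $\sum_{i=1}^p \mathbf{e}_i \wedge \mathbf{v}_i = 0$. Then all $\mathbf{v}_i$ are linear combinations of only the set $\{\mathbf{e}_i\}_{i=1}^p$.
Furthermore, if $\mathbf{v}_i = \sum_{j=1}^p A_{ij}\mathbf{e}_j$, then $A_{ij} = A_{ji}$.

Proof. Multiplying both sides of $\sum_{i=1}^p \mathbf{e}_i \wedge \mathbf{v}_i = 0$ by $\mathbf{e}_2 \wedge \cdots \wedge \mathbf{e}_p$ gives

$$-\mathbf{v}_1 \wedge \mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \cdots \wedge \mathbf{e}_p = 0.$$

By Proposition 25.3.8, $\mathbf{v}_1$ and the $\mathbf{e}_i$ are linearly dependent. Similarly, by multiplying
$\sum_{i=1}^p \mathbf{e}_i \wedge \mathbf{v}_i = 0$ by the wedge product with $\mathbf{e}_k$ missing, we show that $\mathbf{v}_k$
and the $\mathbf{e}_i$ are linearly dependent. Thus, $\mathbf{v}_k = \sum_{i=1}^p A_{ki}\mathbf{e}_i$, for all $k$. Furthermore,
we have

$$0 = \sum_{k=1}^p \mathbf{e}_k \wedge \mathbf{v}_k = \sum_{k=1}^p \sum_{i=1}^p \mathbf{e}_k \wedge (A_{ki}\mathbf{e}_i) = \sum_{k<i} (A_{ki} - A_{ik})\,\mathbf{e}_k \wedge \mathbf{e}_i,$$

where the last sum is over both $k$ and $i$ with $k < i$. Clearly, $\{\mathbf{e}_k \wedge \mathbf{e}_i\}$ with $k < i$ are
linearly independent (show this!). Therefore, their coefficients must vanish. □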


Elie Joseph Cartan (1869-1951), born in Dolomieu (near
Chambery), Savoie, Rhone-Alpes, France, became a student
at the Ecole Normale in 1888 and obtained his doctorate in
1894. He lectured at Montpellier (1894-1896), Lyon (1896-1903),
Nancy (1903-1909), and Paris (1909-1940). He had
four children, one of whom, Henri Cartan, was to produce brilliant
work in mathematics. Two others died tragically. Jean, a
composer, died at the age of 25, while Louis, a physicist, was
arrested by the Germans in 1942 and executed after 15 months
in captivity.

Cartan added greatly to the theory of continuous groups, 
which had been initiated by Lie. His thesis (1894) contains a major contribution to Lie 
algebras wherein he completed the classification of the semisimple algebras that Killing had 
essentially found. He then turned to the theory of associative algebras and investigated the 
structure for these algebras over the real and complex fields. Wedderburn would complete 
Cartan’s work in this area. 

He then turned to representations of semisimple Lie groups. His work is a striking 
synthesis of Lie theory, classical geometry, differential geometry, and topology, which was 
to be found in all Cartan’s work. He also applied Grassmann algebra to the theory of exterior 
differential forms. 

By 1904 Cartan was turning to papers on differential equations, and from 1916 on
he published mainly on differential geometry. Klein’s Erlanger Program was seen to be
inadequate as a general description of geometry by Weyl and Veblen, and Cartan was to play
a major role. He examined a space acted on by an arbitrary Lie group of transformations,
developing a theory of moving frames that generalizes the kinematical theory of Darboux.

Cartan further contributed to geometry with his theory of symmetric spaces, which have 
their origins in papers he wrote in 1926. It develops ideas first studied by Clifford and Cayley 
and used topological methods developed by Weyl in 1925. This work was completed by 
1932. 

Cartan then went on to examine problems on a topic first studied by Poincare. By this 
stage his son, Henri Cartan, was making major contributions to mathematics, and Elie Cartan 
was able to build on theorems proved by his son. 

Cartan also published work on relativity and the theory of spinors. He is certainly one 
of the most important mathematicians of the first half of the twentieth century. 


25.3.1 The Determinant 

One of the most beautiful applications of exterior algebra is in the theory of determinants.
We have already considered determinants in detail in Chapter 3, where
we noted how messy it was to prove some of the theorems. With the machinery of
exterior algebra at our disposal, we will see how elegant this theory becomes and
how trivial some of the proofs will turn out to be.

First, let us recall that $\dim \Lambda^N(V) = 1$ when $N = \dim V$. This means that if
$\{\mathbf{e}_i\}_{i=1}^N$ is a basis of $V$, then $\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \cdots \wedge \mathbf{e}_N$ is the only vector in the corresponding
basis of $\Lambda^N(V)$. On the other hand, if $\{\mathbf{v}_i\}_{i=1}^N$ is any set of $N$ vectors, the product
$\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \cdots \wedge \mathbf{v}_N$ is either zero (if the $\mathbf{v}_i$'s are linearly dependent) or a nonzero
product belonging to $\Lambda^N(V)$. Since $\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \cdots \wedge \mathbf{e}_N$ is a basis of $\Lambda^N(V)$, we
conclude that for any set of $N$ vectors, $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_N \in V$, the product $\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge
\cdots \wedge \mathbf{v}_N$ is a multiple (possibly zero) of $\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \cdots \wedge \mathbf{e}_N$.

Now let $\mathsf{A}$ be a linear operator on $V$. The set of vectors $\mathsf{A}\mathbf{e}_1, \mathsf{A}\mathbf{e}_2, \dots, \mathsf{A}\mathbf{e}_N$
all belong to $V$. By the above remarks $(\mathsf{A}\mathbf{e}_1) \wedge \cdots \wedge (\mathsf{A}\mathbf{e}_N)$ is proportional to
$\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \cdots \wedge \mathbf{e}_N$. We now show that the proportionality constant is simply $\det \mathsf{A}$.

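25.3.11. Theorem. Let $\mathsf{A} : V \to V$ be linear. Let $\{\mathbf{e}_i\}_{i=1}^N$ be a basis for $V$. Then

$$(\mathsf{A}\mathbf{e}_1) \wedge \cdots \wedge (\mathsf{A}\mathbf{e}_N) = (\det \mathsf{A})\,\mathbf{e}_1 \wedge \cdots \wedge \mathbf{e}_N. \tag{25.13}$$

Proof. Let $\mathsf{A}\mathbf{e}_r = A^{i_r}_r \mathbf{e}_{i_r}$, for $r = 1, 2, \dots, N$. Then

$$(\mathsf{A}\mathbf{e}_1) \wedge \cdots \wedge (\mathsf{A}\mathbf{e}_N) = (A^{i_1}_1 \mathbf{e}_{i_1}) \wedge (A^{i_2}_2 \mathbf{e}_{i_2}) \wedge \cdots \wedge (A^{i_N}_N \mathbf{e}_{i_N}) = A^{i_1}_1 A^{i_2}_2 \cdots A^{i_N}_N\, \mathbf{e}_{i_1} \wedge \mathbf{e}_{i_2} \wedge \cdots \wedge \mathbf{e}_{i_N}.$$

It is straightforward to show that

$$\mathbf{e}_{i_1} \wedge \mathbf{e}_{i_2} \wedge \cdots \wedge \mathbf{e}_{i_N} = \epsilon_{i_1 i_2 \dots i_N}\, \mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \cdots \wedge \mathbf{e}_N, \tag{25.14}$$

so that $(\mathsf{A}\mathbf{e}_1) \wedge \cdots \wedge (\mathsf{A}\mathbf{e}_N) = (A^{i_1}_1 A^{i_2}_2 \cdots A^{i_N}_N\, \epsilon_{i_1 i_2 \dots i_N})\,\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \cdots \wedge \mathbf{e}_N$. The
expression in parentheses on the RHS is simply the determinant as defined in
Chapter 3 (recall the summation convention). □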
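As a concrete check of the theorem in the smallest nontrivial case, take $N = 2$ and write $\mathsf{A}\mathbf{e}_1 = A^1_1\mathbf{e}_1 + A^2_1\mathbf{e}_2$ and $\mathsf{A}\mathbf{e}_2 = A^1_2\mathbf{e}_1 + A^2_2\mathbf{e}_2$. The following short computation is only a worked instance of the argument above, using $\mathbf{e}_i \wedge \mathbf{e}_i = 0$ and $\mathbf{e}_2 \wedge \mathbf{e}_1 = -\mathbf{e}_1 \wedge \mathbf{e}_2$:

```latex
\begin{align*}
(\mathsf{A}\mathbf{e}_1) \wedge (\mathsf{A}\mathbf{e}_2)
  &= (A^1_1\mathbf{e}_1 + A^2_1\mathbf{e}_2) \wedge (A^1_2\mathbf{e}_1 + A^2_2\mathbf{e}_2) \\
  &= A^1_1 A^2_2\, \mathbf{e}_1 \wedge \mathbf{e}_2 + A^2_1 A^1_2\, \mathbf{e}_2 \wedge \mathbf{e}_1 \\
  &= (A^1_1 A^2_2 - A^2_1 A^1_2)\, \mathbf{e}_1 \wedge \mathbf{e}_2 .
\end{align*}
```

The coefficient is the familiar $2 \times 2$ determinant, in agreement with (25.13).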

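25.3.12. Example. The symbol $\epsilon_{i_1 i_2 \dots i_N}$, called the Levi-Civita tensor, can be defined
by Equation (25.14). Consider the linear operator $\mathsf{E}$ whose action on a basis is to
permute the vectors so that $\mathsf{E}\mathbf{e}_k = \mathbf{e}_{i_k}$. Then on the one hand,

$$(\mathsf{E}\mathbf{e}_1) \wedge \cdots \wedge (\mathsf{E}\mathbf{e}_N) = (\det \mathsf{E})\,\mathbf{e}_1 \wedge \cdots \wedge \mathbf{e}_N,$$

and on the other hand, by the definition of the action of $\mathsf{E}$,

$$(\mathsf{E}\mathbf{e}_1) \wedge \cdots \wedge (\mathsf{E}\mathbf{e}_N) = \mathbf{e}_{i_1} \wedge \cdots \wedge \mathbf{e}_{i_N}.$$

Comparison of these two equations with (25.14) gives $\det \mathsf{E} = \epsilon_{i_1 i_2 \dots i_N}$.

Since the determinant is basis-independent, the result of the previous example
can be summarized as follows:

25.3.13. Box. The Levi-Civita tensor $\epsilon_{i_1 i_2 \dots i_N}$ takes the same value in all
coordinate systems.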


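Let us now look at the determinant of the product of two operators $\mathsf{A}$ and $\mathsf{B}$.
By definition, $(\mathsf{AB}\mathbf{e}_1) \wedge \cdots \wedge (\mathsf{AB}\mathbf{e}_N) = [\det(\mathsf{AB})]\,\mathbf{e}_1 \wedge \cdots \wedge \mathbf{e}_N$. However, we
also have

$$(\mathsf{AB}\mathbf{e}_1) \wedge \cdots \wedge (\mathsf{AB}\mathbf{e}_N) = [\mathsf{A}(\mathsf{B}\mathbf{e}_1)] \wedge \cdots \wedge [\mathsf{A}(\mathsf{B}\mathbf{e}_N)] = (\det \mathsf{A})\,(\mathsf{B}\mathbf{e}_1) \wedge \cdots \wedge (\mathsf{B}\mathbf{e}_N) = (\det \mathsf{A})(\det \mathsf{B})\,\mathbf{e}_1 \wedge \cdots \wedge \mathbf{e}_N.$$

It follows that $\det \mathsf{AB} = (\det \mathsf{A})(\det \mathsf{B})$. Here the power and elegance of exterior
algebra can be truly appreciated: to prove the same result in Chapter 3, we had
to go through a maze of index shuffling and reshuffling.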
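A quick numerical confirmation of the product rule, using the signed-permutation formula for the determinant that appears in the proof of Theorem 25.3.11. The matrices below are arbitrary illustrative choices, and the helper names are introduced only for this sketch.

```python
from itertools import permutations

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (counted by inversions)."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def det(M):
    """Determinant as a signed sum over permutations (Levi-Civita contraction)."""
    total = 0
    for perm in permutations(range(len(M))):
        term = sign(perm)
        for row, col in enumerate(perm):
            term *= M[row][col]
        total += term
    return total

def matmul(A, B):
    """Ordinary matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
B = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
print(det(matmul(A, B)) == det(A) * det(B))
```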

25.3.2 Orientation

The reader is no doubt familiar with the right-handed and left-handed coordinate
systems in $\mathbb{R}^3$. In this section, we generalize the idea to arbitrary vector spaces.

25.3.14. Definition. An oriented basis of an $N$-dimensional vector space is an
ordered collection of $N$ linearly independent vectors.

If $\{\mathbf{v}_i\}_{i=1}^N$ is one oriented basis and $\{\mathbf{u}_i\}_{i=1}^N$ is a second one, then

$$\mathbf{u}_1 \wedge \mathbf{u}_2 \wedge \cdots \wedge \mathbf{u}_N = (\det R)\,\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \cdots \wedge \mathbf{v}_N,$$

where $R$ is the transformation matrix and $\det R$ is a nonzero number ($R$ is invertible),
which can be positive or negative. Accordingly, we have the following definition.






25.3.15. Definition. An orientation is the collection of all oriented bases related
by a transformation matrix having a positive determinant. A vector space for which
an orientation is specified is called an oriented vector space.

Clearly, there are only two orientations in any vector space. Each oriented basis
is positively related to any oriented basis belonging to the same orientation and
negatively related to any oriented basis belonging to the other orientation. For example,
in $\mathbb{R}^3$, the bases $\{\mathbf{e}_x, \mathbf{e}_y, \mathbf{e}_z\}$ and $\{\mathbf{e}_y, \mathbf{e}_x, \mathbf{e}_z\}$ belong to different orientations
because

$$\mathbf{e}_x \wedge \mathbf{e}_y \wedge \mathbf{e}_z = -\mathbf{e}_y \wedge \mathbf{e}_x \wedge \mathbf{e}_z.$$

The first basis is (by convention) called a right-handed coordinate system, and
the second is called a left-handed coordinate system. Any other basis is either
right-handed or left-handed. There is no third alternative!
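A minimal sketch of how one might test whether two ordered bases belong to the same orientation, assuming both are given by their components in one common reference basis (so that the sign of $\det R$ equals the sign of the product of the two component determinants). The function names are hypothetical.

```python
from itertools import permutations

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (counted by inversions)."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def det(M):
    """Determinant as a signed sum over permutations."""
    total = 0
    for perm in permutations(range(len(M))):
        term = sign(perm)
        for row, col in enumerate(perm):
            term *= M[row][col]
        total += term
    return total

def same_orientation(basis_a, basis_b):
    """True if the two ordered bases are related by a matrix with positive determinant."""
    return det(basis_a) * det(basis_b) > 0

ex, ey, ez = [1, 0, 0], [0, 1, 0], [0, 0, 1]
print(same_orientation([ex, ey, ez], [ey, ex, ez]))  # interchange of two vectors
print(same_orientation([ex, ey, ez], [ey, ez, ex]))  # cyclic reordering
```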

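25.3.16. Definition. Let $V$ be a vector space. Let $V^*$ have the oriented basis $\{\boldsymbol{\epsilon}^i\}_{i=1}^N$.
The oriented volume element $\boldsymbol{\mu} \in \Lambda^N(V^*)$ of $V$ is defined as

$$\boldsymbol{\mu} \equiv \boldsymbol{\epsilon}^1 \wedge \boldsymbol{\epsilon}^2 \wedge \cdots \wedge \boldsymbol{\epsilon}^N.$$

Note that if $\{\mathbf{e}_i\}$ is ordered as $\{\boldsymbol{\epsilon}^j\}$, then $\boldsymbol{\mu}(\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_N) = +1$, and we
say that $\{\mathbf{e}_i\}$ is positively oriented with respect to $\boldsymbol{\mu}$. In general, $\{\mathbf{v}_i\}$ is positively
oriented with respect to $\boldsymbol{\mu}$ if $\boldsymbol{\mu}(\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_N) > 0$.

The volume element of $V$ is defined in terms of a basis for $V^*$. The reason for
this will become apparent later, when we shall see that $dx$, $dy$, and $dz$ form a basis
for $(\mathbb{R}^3)^*$, and $dx\,dy\,dz \equiv dx \wedge dy \wedge dz$.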

25.3.3 Symplectic Vector Spaces

Mechanics was a great contributor to the development of tensor analysis. It provided
examples of manifolds that went beyond mere subspaces of $\mathbb{R}^n$. The phase
space of Hamiltonian mechanics is a paradigm of manifolds that are not “hypersurfaces”
of some Euclidean space. We shall have more to say about such manifolds
in Chapter 26. Here, we shall be content with the algebraic structure underlying
classical mechanics.

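25.3.17. Definition. A 2-form $\boldsymbol{\omega} \in \Lambda^2(V^*)$ is called nondegenerate if
$\boldsymbol{\omega}(\mathbf{v}_1, \mathbf{v}_2) = 0$ for all $\mathbf{v}_1 \in V$ implies $\mathbf{v}_2 = 0$. A symplectic form on $V$ is a
nondegenerate 2-form $\boldsymbol{\omega} \in \Lambda^2(V^*)$. The pair $(V, \boldsymbol{\omega})$ is called a symplectic vector
space. If $(V, \boldsymbol{\omega})$ and $(W, \boldsymbol{\rho})$ are symplectic vector spaces, a linear transformation
$\mathsf{T} : V \to W$ is called a symplectic transformation or a symplectic map if
$\mathsf{T}^*\boldsymbol{\rho} = \boldsymbol{\omega}$.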

Any 2-form (degenerate or nondegenerate) leads to other quantities that are
also of interest. For instance, given any basis $\{\mathbf{v}_i\}$ in $V$, one defines the matrix of
the 2-form $\boldsymbol{\omega} \in \Lambda^2(V^*)$ by $\omega_{ij} \equiv \boldsymbol{\omega}(\mathbf{v}_i, \mathbf{v}_j)$. Similarly, one can define the useful
linear map $\boldsymbol{\omega}^\flat : V \to V^*$ by

$$[\underbrace{\boldsymbol{\omega}^\flat(\mathbf{v})}_{\in V^*}]\,\mathbf{v}' \equiv \boldsymbol{\omega}(\mathbf{v}, \mathbf{v}'). \tag{25.15}$$

The rank of $\boldsymbol{\omega}^\flat$ is called the rank of $\boldsymbol{\omega}$. The reader may check that

25.3.18. Box. A 2-form $\boldsymbol{\omega}$ is nondegenerate if and only if the determinant
of $(\omega_{ij})$ is nonzero, if and only if $\boldsymbol{\omega}^\flat$ is an isomorphism, in which case the
inverse of $\boldsymbol{\omega}^\flat$ is denoted by $\boldsymbol{\omega}^\sharp$.


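25.3.19. Proposition. Let $V$ be an $N$-dimensional vector space and $\boldsymbol{\omega} \in \Lambda^2(V^*)$.
If the rank of $\boldsymbol{\omega}$ is $r$, then $r = 2n$ for some integer $n$ and there exists a basis $\{\mathbf{e}_i\}$ of $V$,
called a canonical basis of $V$, and a dual basis $\{\boldsymbol{\epsilon}^j\}$, such that $\boldsymbol{\omega} = \sum_{j=1}^n \boldsymbol{\epsilon}^j \wedge \boldsymbol{\epsilon}^{j+n}$,
or, equivalently, the $N \times N$ matrix of $\boldsymbol{\omega}$ is given by

$$\begin{pmatrix} 0 & \mathbf{1} & 0 \\ -\mathbf{1} & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

where $\mathbf{1}$ is the $n \times n$ identity matrix and $0$ is the $(N - 2n) \times (N - 2n)$ zero matrix.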
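The equivalence between the block matrix and the expression $\boldsymbol{\omega} = \sum_j \boldsymbol{\epsilon}^j \wedge \boldsymbol{\epsilon}^{j+n}$ can be illustrated numerically. The sketch below builds the canonical matrix for assumed sample values $n = 2$, $N = 5$ and compares $\omega_{ij}v^i w^j$ with the wedge products evaluated directly on two vectors; the function names and the vectors are illustrative assumptions.

```python
def canonical_omega(n, N):
    """Matrix of ω in a canonical basis: blocks [[0, 1, 0], [-1, 0, 0], [0, 0, 0]]."""
    W = [[0] * N for _ in range(N)]
    for k in range(n):
        W[k][n + k] = 1
        W[n + k][k] = -1
    return W

def omega(W, v, w):
    """ω(v, w) = ω_ij v^i w^j."""
    return sum(v[i] * W[i][j] * w[j] for i in range(len(v)) for j in range(len(w)))

n, N = 2, 5
W = canonical_omega(n, N)
v = [1, 2, 3, 4, 7]
w = [5, 6, -1, 0, 9]

from_matrix = omega(W, v, w)
# Σ_j (ε^j ∧ ε^{j+n})(v, w) = Σ_j [ε^j(v) ε^{j+n}(w) - ε^j(w) ε^{j+n}(v)]
from_wedges = sum(v[j] * w[n + j] - w[j] * v[n + j] for j in range(n))
print(from_matrix == from_wedges)
```

The last component of each vector lies in the subspace on which $\boldsymbol{\omega}$ vanishes and therefore does not contribute.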

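Proof. Since $\boldsymbol{\omega} \neq 0$, there exist a pair of vectors $\mathbf{e}_1, \mathbf{e}_1' \in V$ such that $\boldsymbol{\omega}(\mathbf{e}_1, \mathbf{e}_1') \neq
0$. Dividing $\mathbf{e}_1$ by a constant, we can assume $\boldsymbol{\omega}(\mathbf{e}_1, \mathbf{e}_1') = 1$. Because of its antisymmetry,
the matrix of $\boldsymbol{\omega}$ in the plane $\mathcal{P}_1$ spanned by $\mathbf{e}_1$ and $\mathbf{e}_1'$ is $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. Let $V_1$
be the $\boldsymbol{\omega}$-orthogonal complement of $\mathcal{P}_1$, i.e.,

$$V_1 = \{\mathbf{v} \in V \mid \boldsymbol{\omega}(\mathbf{v}, \mathbf{v}_1) = 0 \ \ \forall\, \mathbf{v}_1 \in \mathcal{P}_1\}.$$

Then the reader may check that $\mathcal{P}_1 \cap V_1 = 0$. Moreover, $V = \mathcal{P}_1 + V_1$ because

$$\mathbf{v} = \underbrace{\boldsymbol{\omega}(\mathbf{v}, \mathbf{e}_1')\mathbf{e}_1 - \boldsymbol{\omega}(\mathbf{v}, \mathbf{e}_1)\mathbf{e}_1'}_{\in \mathcal{P}_1} + \underbrace{\mathbf{v} - \boldsymbol{\omega}(\mathbf{v}, \mathbf{e}_1')\mathbf{e}_1 + \boldsymbol{\omega}(\mathbf{v}, \mathbf{e}_1)\mathbf{e}_1'}_{\in V_1 \text{ (Reader, verify!)}}$$

for any $\mathbf{v} \in V$. Thus, $V = \mathcal{P}_1 \oplus V_1$. If $\boldsymbol{\omega}$ is zero on all pairs of vectors in $V_1$,
then we are done, and the rank of $\boldsymbol{\omega}$ is 2; otherwise, let $\mathbf{e}_2, \mathbf{e}_2' \in V_1$ be such that
$\boldsymbol{\omega}(\mathbf{e}_2, \mathbf{e}_2') \neq 0$. Proceeding as above, we obtain

$$V_1 = \mathcal{P}_2 \oplus V_2,$$

where $\mathcal{P}_2$ is the plane spanned by $\mathbf{e}_2$ and $\mathbf{e}_2'$, and $V_2$ its $\boldsymbol{\omega}$-orthogonal complement
in $V_1$. Continuing this process yields

$$V = \mathcal{P}_1 \oplus \mathcal{P}_2 \oplus \cdots \oplus \mathcal{P}_n \oplus V_n,$$

where $V_n$ is the subspace of $V$ on which $\boldsymbol{\omega}$ is zero. This shows that the rank of $\boldsymbol{\omega}$ is
$2n$. By reordering the basis vectors such that $\mathbf{e}_k' = \mathbf{e}_{n+k}$, we construct a new basis
$\{\mathbf{e}_i\}_{i=1}^N$ in which $\boldsymbol{\omega}$ has the desired matrix.

To conclude the proposition, it is sufficient to show that $\sum_{j=1}^n \boldsymbol{\epsilon}^j \wedge \boldsymbol{\epsilon}^{j+n}$, in which
$\{\boldsymbol{\epsilon}^j\}_{j=1}^N$ is dual to $\{\mathbf{e}_i\}_{i=1}^N$, has the same matrix as $\boldsymbol{\omega}$. This is left as an exercise
for the reader. □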

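We note that in the canonical basis,

$$\omega_{ij} = \begin{cases} 0 & \text{if } i, j \le n, \\ \delta_{ik} & \text{if } j = n + k,\ k \le n, \\ 0 & \text{if } i > 2n \text{ or } j > 2n. \end{cases}$$

If we write $\mathbf{v} \in V$ as $\mathbf{v} = \sum_{i=1}^n (x_i\mathbf{e}_i + y_i\mathbf{e}_{n+i}) + \sum_{i=1}^{N-2n} z_i\mathbf{e}_{2n+i}$ in the
canonical basis of $V$, with a corresponding expansion for $\mathbf{v}'$, then the reader may
verify that

$$\boldsymbol{\omega}(\mathbf{v}, \mathbf{v}') = \sum_{i=1}^n (x_i y_i' - x_i' y_i).$$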


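The following proposition gives a useful criterion for nondegeneracy of $\boldsymbol{\omega}$:

25.3.20. Proposition. Let $\boldsymbol{\omega}$ be a 2-form in the finite-dimensional vector space $V$.
Then $\boldsymbol{\omega}$ is nondegenerate iff $V$ has even dimension, say $2n$, and $\boldsymbol{\omega}^n \equiv \boldsymbol{\omega} \wedge \cdots \wedge \boldsymbol{\omega}$
is a volume element of $V$.

Proof. Suppose $\boldsymbol{\omega}$ is nondegenerate. Then $\boldsymbol{\omega}^\flat$ is an isomorphism. Therefore, the
rank of $\boldsymbol{\omega}$, an even number by Proposition 25.3.19, must equal $\dim V^* = \dim V$.
Moreover, by taking successive powers of $\boldsymbol{\omega}$ and using mathematical induction,
one can show that $\boldsymbol{\omega}^n$ is proportional to $\boldsymbol{\epsilon}^1 \wedge \cdots \wedge \boldsymbol{\epsilon}^{2n}$.

Conversely, if $\boldsymbol{\omega}^n \propto \boldsymbol{\epsilon}^1 \wedge \cdots \wedge \boldsymbol{\epsilon}^{2n}$ is a volume element, then by Proposition
25.3.8, the $\{\boldsymbol{\epsilon}^j\}$ are linearly independent. Furthermore, $\dim V^*$ must equal the
number of linearly independent factors in the wedge product of a volume element.
Thus, $\dim V^* = 2n$. But $2n$ is also the rank of $\boldsymbol{\omega}^\flat$. It follows that $\boldsymbol{\omega}^\flat$ is onto. Since $V$ is
finite-dimensional, the dimension theorem implies that $\boldsymbol{\omega}^\flat$ is an isomorphism. □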


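25.3.21. Example. Let $V$ be a vector space and $V^*$ its dual. The direct sum $V \oplus V^*$ can
be turned into a symplectic vector space if we define $\boldsymbol{\omega} \in \Lambda^2(V \oplus V^*)$ by

$$\boldsymbol{\omega}(\mathbf{v} + \boldsymbol{\varphi}, \mathbf{v}' + \boldsymbol{\varphi}') \equiv \boldsymbol{\varphi}'(\mathbf{v}) - \boldsymbol{\varphi}(\mathbf{v}'),$$

where $\mathbf{v}, \mathbf{v}' \in V$ and $\boldsymbol{\varphi}, \boldsymbol{\varphi}' \in V^*$. The reader may verify that $(V \oplus V^*, \boldsymbol{\omega})$ is a symplectic
vector space. This construction of symplectic vector spaces is closely related to Hamiltonian
dynamics, to which we shall return in Chapter 26.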




Suppose $(V, \boldsymbol{\omega})$ and $(W, \boldsymbol{\rho})$ are $2n$-dimensional symplectic vector spaces.
Then, by Proposition 25.3.7, any symplectic map $\mathsf{T} : (V, \boldsymbol{\omega}) \to (W, \boldsymbol{\rho})$ is volume-preserving,
i.e., $(\mathsf{T}^*\boldsymbol{\rho})^n$ is a volume element of $W$. It follows that the rank of $\mathsf{T}^*$
is $2n$, and by Proposition 1.3.18, so is the rank of $\mathsf{T}$. The dimension theorem now
implies that $\mathsf{T}$ is an isomorphism. Symplectic transformations on a single vector
space have an interesting property:

25.3.22. Proposition. Let $(V, \boldsymbol{\omega})$ be a symplectic vector space. Then the set of
symplectic mappings $\mathsf{T} : (V, \boldsymbol{\omega}) \to (V, \boldsymbol{\omega})$ forms a group under composition,
called the symplectic group and denoted by $Sp(V, \boldsymbol{\omega})$.

Proof. Clearly, $Sp(V, \boldsymbol{\omega})$ is a subset of $GL(V)$. One need only show that the inverse
of a symplectic transformation is also such a transformation and that the product
of two symplectic transformations is a symplectic transformation. The details are
left for the reader. □

A matrix is called symplectic if it is the representation of a symplectic transformation
in a canonical basis of the underlying symplectic vector space. The reader
may check that the condition for a matrix $\mathsf{A}$ to be symplectic is $\mathsf{A}^t\mathsf{J}\mathsf{A} = \mathsf{J}$, where $\mathsf{J}$
is the representation of $\boldsymbol{\omega}$ in the canonical basis:

$$\mathsf{J} = \begin{pmatrix} 0 & \mathbf{1} \\ -\mathbf{1} & 0 \end{pmatrix},$$

where $\mathbf{1}$ and $0$ are the $n \times n$ identity and zero matrices, respectively.
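The symplectic condition can be tested directly on a matrix. The sketch below is a plain-Python check under the conventions just stated; the helper names and the test matrices are illustrative assumptions. The $4 \times 4$ example is block diagonal with blocks $M$ and $(M^t)^{-1}$, a form that satisfies the condition for any invertible $M$.

```python
def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    """Ordinary matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def J_matrix(n):
    """The 2n x 2n matrix [[0, 1], [-1, 0]] built from n x n blocks."""
    J = [[0] * (2 * n) for _ in range(2 * n)]
    for k in range(n):
        J[k][n + k] = 1
        J[n + k][k] = -1
    return J

def is_symplectic(A):
    """Check the condition A^t J A = J for a 2n x 2n matrix A."""
    J = J_matrix(len(A) // 2)
    return matmul(matmul(transpose(A), J), A) == J

# Block-diagonal example with M = [[1, 1], [0, 1]] and (M^t)^{-1} = [[1, 0], [-1, 1]].
A4 = [[1, 1, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 1, 0],
      [0, 0, -1, 1]]
print(is_symplectic(A4))
print(is_symplectic([[2, 0], [0, 1]]))  # determinant 2: not symplectic
```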


25.4 Inner Product Revisited 

The inner product was defined in Chapter 1 in terms of a metric function that took 
two vectors as input and manufactured a number. We now know what kind of 
machine this is in the language of tensors. 

25.4.1. Definition. A symmetric bilinear form $\mathbf{g}$ on $V$ is a symmetric tensor of
type $(0, 2)$.

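If $\{\mathbf{e}_j\}_{j=1}^N$ is a basis of $V$ and $\{\boldsymbol{\epsilon}^i\}_{i=1}^N$ is its dual basis, then $\mathbf{g} = \tfrac{1}{2} g_{ij}\boldsymbol{\epsilon}^i\boldsymbol{\epsilon}^j$ (recall
Einstein’s summation convention), because $\boldsymbol{\epsilon}^i\boldsymbol{\epsilon}^j = \boldsymbol{\epsilon}^i \otimes \boldsymbol{\epsilon}^j + \boldsymbol{\epsilon}^j \otimes \boldsymbol{\epsilon}^i$ form a basis
of $\mathbb{S}_2(V)$. For any vector $\mathbf{v} \in V$, we can write

$$\mathbf{g}(\mathbf{v}) = \tfrac{1}{2} g_{ij}\boldsymbol{\epsilon}^i\boldsymbol{\epsilon}^j(\mathbf{v}) = \tfrac{1}{2} g_{ij}\boldsymbol{\epsilon}^i\boldsymbol{\epsilon}^j(v^k\mathbf{e}_k) = \tfrac{1}{2} v^k g_{ij}\boldsymbol{\epsilon}^i\boldsymbol{\epsilon}^j(\mathbf{e}_k) = g_{ij}v^j\boldsymbol{\epsilon}^i. \tag{25.16}$$

Thus, $\mathbf{g}(\mathbf{v}) \in V^*$. This shows that $\mathbf{g}$ can be thought of as a mapping, $\mathbf{g} : V \to
V^*$, given by Equation (25.16). For this equation to make sense, it should not
matter which factor in the symmetric product $\mathbf{v}$ contracts with. But this is a trivial
consequence of the symmetries $g_{ij} = g_{ji}$ and $\boldsymbol{\epsilon}^i\boldsymbol{\epsilon}^j = \boldsymbol{\epsilon}^j\boldsymbol{\epsilon}^i$. The components $g_{ij}v^j$
of $\mathbf{g}(\mathbf{v})$ in the basis $\{\boldsymbol{\epsilon}^i\}$ of $V^*$ are denoted by $v_i$, so

$$\mathbf{g}(\mathbf{v}) = v_i\boldsymbol{\epsilon}^i, \qquad \text{where } v_i \equiv g_{ij}v^j. \tag{25.17}$$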

We have thus lowered the index of $v^j$ by the use of the symmetric bilinear form
$\mathbf{g}$. In applications $v_i$ is uniquely defined; furthermore, there is a one-to-one
correspondence between $v_i$ and $v^i$. This can happen if and only if the mapping
$\mathbf{g} : V \to V^*$ is invertible, in which case there must exist a unique $\mathbf{g}^{-1} : V^* \to V$,
or $\mathbf{g}^{-1} \in S^2(V^{**}) = S^2(V)$, such that

$$v^j\mathbf{e}_j = \mathbf{v} = \mathbf{g}^{-1}\mathbf{g}(\mathbf{v}) = \mathbf{g}^{-1}(v_i\boldsymbol{\epsilon}^i) = v_i\left[(g^{-1})^{jk}\mathbf{e}_j\mathbf{e}_k\right](\boldsymbol{\epsilon}^i) = v_i (g^{-1})^{jk}\mathbf{e}_j\underbrace{\mathbf{e}_k(\boldsymbol{\epsilon}^i)}_{=\delta^i_k} = v_i (g^{-1})^{ji}\mathbf{e}_j.$$

Comparison of the LHS and the RHS yields $v^j = v_i (g^{-1})^{ji}$. It is customary to
omit the $-1$ and simply write

$$v^j = g^{ji}v_i, \tag{25.18}$$

where it is understood that $g$ with upper indices is the inverse of $g$ (with lower
indices).
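The lowering and raising rules (25.17) and (25.18) can be checked on a small example. The following is only an illustrative sketch: the $2 \times 2$ matrix `g_low` and the helper names `lower` and `inverse` are assumptions made for the example, not objects defined in the text.

```python
from fractions import Fraction

def lower(g, v):
    """Contract a matrix with a component list: result_i = g_ij v^j."""
    n = len(v)
    return [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]

def inverse(g):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions.
    Raises StopIteration if g is degenerate (no nonzero pivot is found)."""
    n = len(g)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(g)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

# Hypothetical nondegenerate symmetric g_ij on a two-dimensional space.
g_low = [[2, 1], [1, 3]]
g_up = inverse(g_low)                     # g^{ij}, the inverse matrix

v_up = [Fraction(1), Fraction(-2)]        # components v^j
v_low = lower(g_low, v_up)                # v_i = g_ij v^j, Eq. (25.17)
v_back = lower(g_up, v_low)               # v^j = g^{ji} v_i, Eq. (25.18)
```

Raising the lowered components returns the original ones, which is the one-to-one correspondence between $v_i$ and $v^j$ that invertibility of $\mathbf{g}$ guarantees.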

25.4.2. Definition. An invertible bilinear form is called nondegenerate. A
symmetric bilinear form that is nondegenerate is called an inner product.

We therefore see that the presence of a nondegenerate symmetric bilinear form
(or an inner product) naturally connects the vectors in $V$ and $V^*$ in a unique way.
Going from a vector in $V$ to its unique image in $V^*$ is done by simply lowering the
index using Equation (25.17), and going the other way involves using Equation
(25.18) to raise the index. This process can be generalized to all tensors. For
instance, although in general there is no connection among $\mathcal{T}^2_0(V)$, $\mathcal{T}^1_1(V)$, and
$\mathcal{T}^0_2(V)$, the introduction of an inner product connects all these spaces in a natural
way and establishes a one-to-one correspondence among them. For example, to a
tensor in $\mathcal{T}^2_0(V)$ with components $t^{ij}$ there corresponds a unique tensor in $\mathcal{T}^1_1(V)$,
given, in component form, by $t^i{}_j = g_{jk}t^{ik}$, and another unique tensor in $\mathcal{T}^0_2(V)$,
given by $t_{ij} = g_{il}t^l{}_j = g_{il}g_{jk}t^{lk}$.

Let us apply this technique to $g^{ij}$, which is also a tensor and for which the
lowering process is defined. We have

$$g^i{}_j = g_{jk}g^{ik} = (g^{-1})^{ik}g_{kj} = \delta^i_j. \tag{25.19}$$

This relation holds, of course, in all bases.

The inner product has been defined as a nondegenerate symmetric bilinear
form. The important criterion of nondegeneracy has equivalences:

25.4.3. Proposition. A symmetric bilinear form $\mathbf{g}$ is nondegenerate if and only if

1. the matrix of components $g_{ij}$ has a nonvanishing determinant, or

2. for every nonzero $\mathbf{v} \in V$, there exists $\mathbf{w} \in V$ such that $\mathbf{g}(\mathbf{v}, \mathbf{w}) \neq 0$.

Proof. The first part is a direct consequence of the definition of nondegeneracy.
The second part follows from the fact that $\mathbf{g} : V \to V^*$ is invertible iff the nullity
of $\mathbf{g}$ is zero. It follows that if $\mathbf{v} \in V$ is nonzero, then $\mathbf{g}(\mathbf{v}) \neq 0$, i.e., $\mathbf{g}(\mathbf{v})$ is not the
zero functional. Thus, there must exist a vector $\mathbf{w} \in V$ such that $[\mathbf{g}(\mathbf{v})](\mathbf{w}) \neq 0$.
The proposition is proved once we note that $[\mathbf{g}(\mathbf{v})](\mathbf{w}) = \mathbf{g}(\mathbf{v}, \mathbf{w})$. □

25.4.4. Definition. A general (not necessarily nondegenerate) symmetric bilinear
form $\mathbf{g}$ can be categorized as follows:

1. positive (negative) definite: $\mathbf{g}(\mathbf{v}, \mathbf{v}) > 0$ [$\mathbf{g}(\mathbf{v}, \mathbf{v}) < 0$] for every nonzero
vector $\mathbf{v}$;

2. definite: $\mathbf{g}$ is either positive definite or negative definite;

3. positive (negative) semidefinite: $\mathbf{g}(\mathbf{v}, \mathbf{v}) \geq 0$ [$\mathbf{g}(\mathbf{v}, \mathbf{v}) \leq 0$] for every $\mathbf{v}$;

4. semidefinite: $\mathbf{g}$ is either positive semidefinite or negative semidefinite;

5. indefinite: $\mathbf{g}$ is not definite.

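25.4.5. Example. Some of the categories of the definition above can be illustrated in $\mathbb{R}^2$
with $\mathbf{v}_1 = (x_1, y_1)$, $\mathbf{v}_2 = (x_2, y_2)$, and $\mathbf{v} = (x, y)$.

(a) Positive definite: $\mathbf{g}(\mathbf{v}_1, \mathbf{v}_2) = x_1x_2 + y_1y_2$, because if $\mathbf{v} \neq 0$, then one of its components
is nonzero, and $\mathbf{g}(\mathbf{v}, \mathbf{v}) = x^2 + y^2 > 0$.

(b) Negative definite: $\mathbf{g}(\mathbf{v}_1, \mathbf{v}_2) = \tfrac{1}{2}(x_1y_2 + x_2y_1) - x_1x_2 - y_1y_2$, because
$\mathbf{g}(\mathbf{v}, \mathbf{v}) = xy - x^2 - y^2 = -\tfrac{1}{2}(x - y)^2 - \tfrac{1}{2}x^2 - \tfrac{1}{2}y^2$, which is negative for nonzero $\mathbf{v}$.

(c) Indefinite: $\mathbf{g}(\mathbf{v}_1, \mathbf{v}_2) = x_1x_2 - y_1y_2$. For $\mathbf{v} = (x, x)$, $\mathbf{g}(\mathbf{v}, \mathbf{v}) = 0$. However, $\mathbf{g}$ is
nondegenerate, because it has the invertible matrix $\mathsf{g} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ in the standard basis of $\mathbb{R}^2$.

(d) Positive semidefinite: $\mathbf{g}(\mathbf{v}_1, \mathbf{v}_2) = x_1x_2 \Rightarrow \mathbf{g}(\mathbf{v}, \mathbf{v}) = x^2$, and $\mathbf{g}(\mathbf{v}, \mathbf{v})$ is never negative.
However, $\mathbf{g}$ is degenerate because its matrix in the standard basis of $\mathbb{R}^2$ is $\mathsf{g} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, which
is not invertible. ■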

Two vectors $\mathbf{u}, \mathbf{v} \in V$ are said to be g-orthogonal if $\mathbf{g}(\mathbf{u}, \mathbf{v}) = 0$. A null vector
of $\mathbf{g}$ is a vector that is g-orthogonal to itself. If $\mathbf{g}$ is definite, then the only null vector
is the zero vector. The converse is also true, as the following proposition shows.

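25.4.6. Proposition. If $\mathbf{g}$ is not definite, then there exists a nonzero null vector.

Proof. That $\mathbf{g}$ is not positive definite implies that there exists a nonzero vector
$\mathbf{v} \in V$ such that $\mathbf{g}(\mathbf{v}, \mathbf{v}) \leq 0$. Similarly, that $\mathbf{g}$ is not negative definite implies that
there exists a nonzero vector $\mathbf{w} \in V$ such that $\mathbf{g}(\mathbf{w}, \mathbf{w}) \geq 0$. Construct the vector
$\mathbf{u} = \alpha\mathbf{v} + (1 - \alpha)\mathbf{w}$ and note that $\mathbf{g}(\mathbf{u}, \mathbf{u})$ is a continuous function of $\alpha$. For $\alpha = 0$
this function has the value $\mathbf{g}(\mathbf{w}, \mathbf{w}) \geq 0$, and for $\alpha = 1$ it has the value $\mathbf{g}(\mathbf{v}, \mathbf{v}) \leq 0$.
Thus, there must be some $\alpha$ for which $\mathbf{g}(\mathbf{u}, \mathbf{u}) = 0$. □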
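The interpolation argument in the proof can be imitated numerically. The sketch below is an illustration under stated assumptions: the form $\mathrm{diag}(1, 1, -1)$ on $\mathbb{R}^3$, the vectors `v` and `w`, and the helper names `quad` and `null_vector` are chosen for the example only. It bisects on $\alpha$ and assumes the strict inequalities $\mathbf{g}(\mathbf{w}, \mathbf{w}) > 0 > \mathbf{g}(\mathbf{v}, \mathbf{v})$.

```python
def quad(g, u):
    """Quadratic form g(u, u) = g_ij u^i u^j for a matrix g given as a list of rows."""
    n = len(u)
    return sum(g[i][j] * u[i] * u[j] for i in range(n) for j in range(n))

def null_vector(g, v, w, iters=80):
    """Bisect on alpha in [0, 1] for u = alpha*v + (1 - alpha)*w.
    Assumes g(w, w) > 0 (value at alpha = 0) and g(v, v) < 0 (value at alpha = 1)."""
    def u_of(alpha):
        return [alpha * vi + (1 - alpha) * wi for vi, wi in zip(v, w)]
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if quad(g, u_of(mid)) > 0:
            lo = mid          # still on the positive side
        else:
            hi = mid          # on the non-positive side
    return u_of(0.5 * (lo + hi))

# Hypothetical indefinite form and starting vectors.
g = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
v = [0.0, 0.0, 1.0]   # g(v, v) = -1
w = [1.0, 2.0, 0.0]   # g(w, w) = +5
u = null_vector(g, v, w)
```

The returned `u` is nonzero and satisfies $\mathbf{g}(\mathbf{u}, \mathbf{u}) \approx 0$ to floating-point accuracy.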




25.4.7. Example. In the special theory of relativity, the inner product of two "position"
four-vectors, $\mathbf{r}_1 = (x_1, y_1, z_1, ct_1)$ and $\mathbf{r}_2 = (x_2, y_2, z_2, ct_2)$, where $c$ is the speed of light,
is defined as

$$\mathbf{g}(\mathbf{r}_1, \mathbf{r}_2) = -x_1x_2 - y_1y_2 - z_1z_2 + c^2t_1t_2.$$

This is clearly an indefinite symmetric bilinear form. Proposition 25.4.6 tells us that there
must exist a nonzero null vector. Such a vector $\mathbf{r}$ satisfies

$$\mathbf{g}(\mathbf{r}, \mathbf{r}) = c^2t^2 - x^2 - y^2 - z^2 = 0,$$

or

$$c^2 = \frac{x^2 + y^2 + z^2}{t^2} \;\Rightarrow\; c = \pm\frac{\sqrt{x^2 + y^2 + z^2}}{t} = \pm\frac{\text{distance}}{\text{time}}.$$

This corresponds to a particle moving with the speed of light. Thus, light rays are the
null vectors in the special theory of relativity. ■

Whenever there is an inner product on a vector space, there is the possibility 
of orthogonal basis vectors. However, since g(v, v) is allowed to be negative or 
zero, it is impossible to demand normality for some vectors. 

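25.4.8. Definition. A basis $\{\mathbf{e}_i\}_{i=1}^N$ of V is g-orthonormal if $\mathbf{g}(\mathbf{e}_i, \mathbf{e}_j) = 0$ for
$i \neq j$, and $\mathbf{g}(\mathbf{e}_i, \mathbf{e}_i)$ (no sum!) is $+1$, $-1$, or $0$. The $\mathbf{g}(\mathbf{e}_i, \mathbf{e}_i)$ are called the diagonal
components of $\mathbf{g}$. We use $n_+$, $n_-$, and $n_0$ to denote the number of vectors $\mathbf{e}_i$ for
which $\mathbf{g}(\mathbf{e}_i, \mathbf{e}_i)$ is, respectively, $+1$, $-1$, or $0$. The integer $n_-$ is called the index of
$\mathbf{g}$, and $s = n_+ - n_-$ is called the signature of $\mathbf{g}$.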

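The existence of orthonormal bases was established for positive definite $\mathbf{g}$
by the Gram-Schmidt orthonormalization process in Chapter 1. One of the steps
in this process is division by (the square root of) $\mathbf{g}(\mathbf{v}, \mathbf{v})$, which could be zero
in the present context. A slight modification of the Gram-Schmidt process, in
conjunction with the procedure used in the symplectic case, can help construct an
orthonormal basis for a general (possibly indefinite) $\mathbf{g}$. To be specific, start with
the polarization identity,

$$\mathbf{g}(\mathbf{v}, \mathbf{v}') = \tfrac{1}{4}\left[\mathbf{g}(\mathbf{v} + \mathbf{v}', \mathbf{v} + \mathbf{v}') - \mathbf{g}(\mathbf{v} - \mathbf{v}', \mathbf{v} - \mathbf{v}')\right],$$

and use it to convince yourself that $\mathbf{g}$ is identically zero unless there exists a vector
$\mathbf{e}_1$ such that $\mathbf{g}(\mathbf{e}_1, \mathbf{e}_1) \neq 0$. Rescaling, we can assume that $\mathbf{g}(\mathbf{e}_1, \mathbf{e}_1) \equiv \eta_1 = \pm 1$.
Let $\mathcal{V}_1$ be the span of $\mathbf{e}_1$ and $\mathcal{V}_2$ the g-orthogonal complement of $\mathcal{V}_1$. Clearly,
$\mathcal{V}_1 \cap \mathcal{V}_2 = \{0\}$; moreover, $\mathcal{V}_1 + \mathcal{V}_2 = V$, for if $\mathbf{v} \in V$, we can write

$$\mathbf{v} = \eta_1\mathbf{g}(\mathbf{v}, \mathbf{e}_1)\mathbf{e}_1 + \left[\mathbf{v} - \eta_1\mathbf{g}(\mathbf{v}, \mathbf{e}_1)\mathbf{e}_1\right].$$

Now, if $\mathbf{g} = 0$ on $\mathcal{V}_2$, we are done; otherwise, there is a vector $\mathbf{e}_2$ such that
$\mathbf{g}(\mathbf{e}_2, \mathbf{e}_2) \equiv \eta_2 = \pm 1$. We continue inductively to complete the construction. We
therefore have the following: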




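25.4.9. Theorem. For every symmetric bilinear form $\mathbf{g}$ on V, there is an orthonormal
basis. Furthermore, $n_+$, $n_-$, and $n_0$ are the same in all orthonormal bases.

For a proof of the last statement, also known as Sylvester's theorem, see
[Bish 80, p. 104]. Orthonormal bases allow us to speak of the oriented volume
element. Suppose $\{\boldsymbol{\epsilon}^j\}_{j=1}^N$ is an oriented basis of $V^*$. If $\{\boldsymbol{\varphi}^k\}_{k=1}^N$ is another
orthonormal basis in the same orientation and related to $\{\boldsymbol{\epsilon}^j\}$ by a matrix $\mathsf{R}$, then

$$\boldsymbol{\varphi}^1 \wedge \boldsymbol{\varphi}^2 \wedge \cdots \wedge \boldsymbol{\varphi}^N = (\det\mathsf{R})\,\boldsymbol{\epsilon}^1 \wedge \boldsymbol{\epsilon}^2 \wedge \cdots \wedge \boldsymbol{\epsilon}^N.$$

Since $\{\boldsymbol{\varphi}^k\}$ and $\{\boldsymbol{\epsilon}^j\}$ are orthonormal, the determinant of $\mathbf{g}$ is $(-1)^{n_-}$ in both of
them. Problem 25.17 then implies that $(\det\mathsf{R})^2 = 1$ or $\det\mathsf{R} = \pm 1$. However,
$\{\boldsymbol{\varphi}^k\}$ and $\{\boldsymbol{\epsilon}^j\}$ belong to the same orientation. Thus, $\det\mathsf{R} = +1$, and $\{\boldsymbol{\varphi}^k\}$ and
$\{\boldsymbol{\epsilon}^j\}$ give the same volume element.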

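25.4.10. Example. Let $V = \mathbb{R}^3$ and $\mathbf{v}_1 = (x_1, y_1, z_1)$, $\mathbf{v}_2 = (x_2, y_2, z_2)$, and $\mathbf{v} =
(x, y, z)$. Define the symmetric bilinear form

$$\mathbf{g}(\mathbf{v}_1, \mathbf{v}_2) = \tfrac{1}{2}(x_1y_2 + x_2y_1 + y_1z_2 + y_2z_1 + x_1z_2 + x_2z_1)$$

so that $\mathbf{g}(\mathbf{v}, \mathbf{v}) = xy + yz + xz$. We wish to find a set of vectors in $\mathbb{R}^3$ that are orthonormal with
respect to this bilinear form. Clearly, $\mathbf{e}_1 = (1, 1, 0)$ is such that $\mathbf{g}(\mathbf{e}_1, \mathbf{e}_1) = 1$. So $\mathbf{e}_1$ is one of
our vectors. Consider $\mathbf{v} = (1, 0, 1)$ and note that the vector $\mathbf{r}_2 = \mathbf{v} - [\mathbf{g}(\mathbf{v}, \mathbf{e}_1)/\mathbf{g}(\mathbf{e}_1, \mathbf{e}_1)]\mathbf{e}_1$,
suggested by the Gram-Schmidt process, is orthogonal to $\mathbf{e}_1$. Furthermore, it is easily
verified that $\mathbf{g}(\mathbf{r}_2, \mathbf{r}_2) = -\tfrac{5}{4}$. Therefore, our second vector is

$$\mathbf{e}_2 = \frac{\mathbf{r}_2}{\sqrt{|\mathbf{g}(\mathbf{r}_2, \mathbf{r}_2)|}} = \left(-\frac{1}{\sqrt{5}}, -\frac{3}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right)$$

with $\mathbf{g}(\mathbf{e}_2, \mathbf{e}_2) = -1$. Finally, we take $\mathbf{w} = (0, 1, 1)$. Then

$$\mathbf{r}_3 = \mathbf{w} - \frac{\mathbf{g}(\mathbf{w}, \mathbf{e}_1)}{\mathbf{g}(\mathbf{e}_1, \mathbf{e}_1)}\mathbf{e}_1 - \frac{\mathbf{g}(\mathbf{w}, \mathbf{e}_2)}{\mathbf{g}(\mathbf{e}_2, \mathbf{e}_2)}\mathbf{e}_2 = \tfrac{2}{5}(-3, 1, 1)$$

will be orthogonal to both $\mathbf{e}_1$ and $\mathbf{e}_2$ with $\mathbf{g}(\mathbf{r}_3, \mathbf{r}_3) = -\tfrac{4}{5}$. Thus, the third vector can be
chosen to be

$$\mathbf{e}_3 = \frac{\mathbf{r}_3}{\sqrt{|\mathbf{g}(\mathbf{r}_3, \mathbf{r}_3)|}} = \left(-\frac{3}{\sqrt{5}}, \frac{1}{\sqrt{5}}, \frac{1}{\sqrt{5}}\right),$$

and we obtain $\mathbf{g}(\mathbf{e}_1, \mathbf{e}_1) = 1$, $\mathbf{g}(\mathbf{e}_2, \mathbf{e}_2) = -1$, $\mathbf{g}(\mathbf{e}_3, \mathbf{e}_3) = -1$, $\mathbf{g}(\mathbf{e}_i, \mathbf{e}_j) = 0$ for $i \neq j$.
We also have $n_+ = 1$, $n_- = 2$, and $n_0 = 0$, i.e., the index of $\mathbf{g}$ is 2, and its signature is $-1$.
Although we have worked in a particular basis, Theorem 25.4.9 guarantees that $n_+$, $n_-$,
and $n_0$ are (orthonormal) basis-independent. ■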
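The arithmetic of this example is easy to reproduce with exact fractions. The sketch below is illustrative only: the function names `g` and `orthogonalize` are assumptions, and the vectors are left unnormalized (so no square roots are needed), since only the signs of $\mathbf{g}(\mathbf{r}_i, \mathbf{r}_i)$ matter for $n_+$ and $n_-$.

```python
from fractions import Fraction as F

def g(u, v):
    """The symmetric bilinear form of the example on R^3."""
    (x1, y1, z1), (x2, y2, z2) = u, v
    return F(1, 2) * (x1 * y2 + x2 * y1 + y1 * z2 + y2 * z1 + x1 * z2 + x2 * z1)

def orthogonalize(vectors):
    """Gram-Schmidt with respect to g, without normalization.
    Assumes no intermediate vector r has g(r, r) = 0."""
    basis = []
    for v in vectors:
        r = [F(c) for c in v]
        for b in basis:
            coeff = g(v, b) / g(b, b)
            r = [rc - coeff * bc for rc, bc in zip(r, b)]
        basis.append(r)
    return basis

# Starting vectors e_1, v, w of the example.
rs = orthogonalize([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
norms = [g(r, r) for r in rs]                 # g(r_i, r_i)
n_plus = sum(1 for q in norms if q > 0)
n_minus = sum(1 for q in norms if q < 0)
```

Dividing each `rs[i]` by $\sqrt{|\mathbf{g}(\mathbf{r}_i, \mathbf{r}_i)|}$ would give the orthonormal vectors of the example.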


We should emphasize that the invariance of $n_+$, $n_-$, and $n_0$ is true for
g-orthonormal bases. As a counterexample, consider $\mathbf{g}$ of Example 25.4.10 applied
to the standard basis of $\mathbb{R}^3$, which we designate with a prime. It is readily verified
that

$$\mathbf{g}(\mathbf{e}'_i, \mathbf{e}'_i) = 0 \quad \text{for } i = 1, 2, 3.$$

So it might appear that $n_0 = 3$ for this basis. However, the standard basis is not
g-orthonormal. In fact,

$$\mathbf{g}(\mathbf{e}'_1, \mathbf{e}'_2) = \tfrac{1}{2} = \mathbf{g}(\mathbf{e}'_1, \mathbf{e}'_3) = \mathbf{g}(\mathbf{e}'_2, \mathbf{e}'_3).$$

That is why the nonstandard vectors $\mathbf{e}_1$, $\mathbf{v}$, and $\mathbf{w}$ were chosen in Example 25.4.10.

In an orthonormal basis the matrix of $\mathbf{g}$ is diagonal, with entries $+1$, $-1$, and
$0$. In particular, if $\mathbf{g}$ is to be nondegenerate, that is, to be an inner product, $n_0$ must
be zero. Thus, a general inner product on an N-dimensional vector space satisfies
the conditions $n_+ + n_- = N$ and $n_+ - n_- = s$, which give $s = N - 2n_-$.

25.4.11. Definition. An inner product space with $n_- = 1$ or $n_- = N - 1$ is called
a Minkowski space. For $N = 4$ this is the space of the special theory of relativity.
An inner product space with $n_- = 0$ (or $n_- = N$) is called a Euclidean space.

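25.4.12. Example. Let $\{\mathbf{e}_i\}_{i=1}^N$ be a basis of V and $\{\boldsymbol{\epsilon}^i\}_{i=1}^N$ its dual. We can define the
permutation tensor:

$$\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N} = \boldsymbol{\epsilon}^{i_1} \wedge \boldsymbol{\epsilon}^{i_2} \wedge \cdots \wedge \boldsymbol{\epsilon}^{i_N}(\mathbf{e}_{j_1}, \mathbf{e}_{j_2}, \dots, \mathbf{e}_{j_N}). \tag{25.20}$$

It is clear from this definition that $\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N}$ is completely skew-symmetric in all upper
indices. That it is also skew-symmetric in the lower indices can be seen as follows. Assume
that two of the lower indices are equal. This means having two $\mathbf{e}_j$'s equal in (25.20). These
two $\mathbf{e}_j$'s will contract with two $\boldsymbol{\epsilon}$'s, say $\boldsymbol{\epsilon}^k$ and $\boldsymbol{\epsilon}^l$. Thus, in the expansion there will be a
term $C\boldsymbol{\epsilon}^k(\mathbf{e}_j)\boldsymbol{\epsilon}^l(\mathbf{e}_j)$, where $C$ is the product of all the other factors. Since the product is
completely skew-symmetric in the upper indices, there must also exist another term, with a
minus sign and in which the upper indices $k$ and $l$ are interchanged: $-C\boldsymbol{\epsilon}^l(\mathbf{e}_j)\boldsymbol{\epsilon}^k(\mathbf{e}_j)$. This
makes the sum zero, and by Theorem 25.2.7, (25.20) is antisymmetric in the lower indices
as well.

This suggests that $\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N} \propto \epsilon^{i_1 i_2 \dots i_N}\epsilon_{j_1 j_2 \dots j_N}$. To find the proportionality constant,
we note that (see Problem 25.16)

$$\delta^{12 \dots N}_{12 \dots N} = \sum_\pi \epsilon_\pi\, \delta^1_{\pi(1)}\delta^2_{\pi(2)} \cdots \delta^N_{\pi(N)}.$$

The only contribution to the sum comes from the permutation with the property $\pi(i) = i$.
This is the identity permutation for which $\epsilon_\pi = 1$. Thus, we have $\delta^{12 \dots N}_{12 \dots N} = 1$. On the other
hand,

$$\epsilon^{12 \dots N}\epsilon_{12 \dots N} = (-1)^{n_-}.$$

Therefore, the proportionality constant is $(-1)^{n_-}$. Thus

$$\epsilon^{i_1 i_2 \dots i_N}\epsilon_{j_1 j_2 \dots j_N} = (-1)^{n_-}\,\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N}.$$

■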

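We can find an explicit expression for the permutation tensor of Example
25.4.12. Expanding the RHS of Equation (25.20) using Equation (25.10), we obtain

$$\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N} = \sum_\pi \epsilon_\pi\, \delta^{i_1}_{j_{\pi(1)}}\delta^{i_2}_{j_{\pi(2)}} \cdots \delta^{i_N}_{j_{\pi(N)}}, \qquad
\epsilon^{i_1 i_2 \dots i_N}\epsilon_{j_1 j_2 \dots j_N} = (-1)^{n_-}\sum_\pi \epsilon_\pi\, \delta^{i_1}_{j_{\pi(1)}}\delta^{i_2}_{j_{\pi(2)}} \cdots \delta^{i_N}_{j_{\pi(N)}}. \tag{25.21}$$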

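Furthermore, the first equation of (25.21) can be written concisely as a
determinant. To see this, first note that

$$\delta^{i_1 i_2 \dots i_N}_{12 \dots N} = \sum_\pi \epsilon_\pi\, \delta^{i_1}_{\pi(1)}\delta^{i_2}_{\pi(2)} \cdots \delta^{i_N}_{\pi(N)}.$$

The RHS is clearly the determinant of a matrix (expanded with respect to the ith
row) whose elements are $\delta^{i_k}_{l}$. The same holds true if $1, 2, \dots, N$ is replaced by
$j_1, j_2, \dots, j_N$; thus,

$$\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N} = \det\begin{pmatrix}
\delta^{i_1}_{j_1} & \delta^{i_1}_{j_2} & \cdots & \delta^{i_1}_{j_N} \\
\delta^{i_2}_{j_1} & \delta^{i_2}_{j_2} & \cdots & \delta^{i_2}_{j_N} \\
\vdots & \vdots & & \vdots \\
\delta^{i_N}_{j_1} & \delta^{i_N}_{j_2} & \cdots & \delta^{i_N}_{j_N}
\end{pmatrix}. \tag{25.22}$$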

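25.4.13. Example. Let us apply the second equation of (25.21) to Euclidean $\mathbb{R}^3$:

$$\epsilon^{ijk}\epsilon_{lmn} = \delta^i_l\delta^j_m\delta^k_n - \delta^i_l\delta^j_n\delta^k_m - \delta^i_m\delta^j_l\delta^k_n + \delta^i_m\delta^j_n\delta^k_l + \delta^i_n\delta^j_l\delta^k_m - \delta^i_n\delta^j_m\delta^k_l.$$

From this fundamental relation, we can obtain other useful formulas. For example, setting
$n = k$ and summing over $k$, we get

$$\epsilon^{ijk}\epsilon_{lmk} = 3\delta^i_l\delta^j_m - \delta^i_l\delta^j_m - 3\delta^i_m\delta^j_l + \delta^i_m\delta^j_l + \delta^i_m\delta^j_l - \delta^i_l\delta^j_m = \delta^i_l\delta^j_m - \delta^i_m\delta^j_l.$$

Now set $m = j$ in this equation and sum over $j$:

$$\epsilon^{ijk}\epsilon_{ljk} = 3\delta^i_l - \delta^i_l = 2\delta^i_l.$$

Finally, let $l = i$ and sum over $i$:

$$\epsilon^{ijk}\epsilon_{ijk} = 2\delta^i_i = 2 \cdot 3 = 3!.$$

In general,

$$\epsilon^{i_1 i_2 \dots i_N}\epsilon_{i_1 i_2 \dots i_N} = (-1)^{n_-} N!,$$

$$\epsilon^{i_1 \dots i_{N-1} i_N}\epsilon_{i_1 \dots i_{N-1} j_N} = (-1)^{n_-}(N - 1)!\,\delta^{i_N}_{j_N},$$

$$\epsilon^{i_1 \dots i_{N-2} i_{N-1} i_N}\epsilon_{i_1 \dots i_{N-2} j_{N-1} j_N} = (-1)^{n_-}(N - 2)!\left(\delta^{i_{N-1}}_{j_{N-1}}\delta^{i_N}_{j_N} - \delta^{i_{N-1}}_{j_N}\delta^{i_N}_{j_{N-1}}\right),$$

and so on. ■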
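The contraction identities of this example can be confirmed by brute force in Euclidean $\mathbb{R}^3$. This is only a sanity-check sketch; the closed-form expression used for `eps` and the helper names are choices made here (indices run over 0, 1, 2 rather than 1, 2, 3).

```python
from itertools import product

def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}: +1, -1, or 0."""
    return (i - j) * (j - k) * (k - i) // 2

def delta(a, b):
    """Kronecker delta."""
    return 1 if a == b else 0

R = range(3)

# eps^{ijk} eps_{lmk} = delta^i_l delta^j_m - delta^i_m delta^j_l
one_contraction_ok = all(
    sum(eps(i, j, k) * eps(l, m, k) for k in R)
    == delta(i, l) * delta(j, m) - delta(i, m) * delta(j, l)
    for i, j, l, m in product(R, repeat=4)
)

# eps^{ijk} eps_{ljk} = 2 delta^i_l
two_contractions_ok = all(
    sum(eps(i, j, k) * eps(l, j, k) for j in R for k in R) == 2 * delta(i, l)
    for i, l in product(R, repeat=2)
)

# eps^{ijk} eps_{ijk} = 3!
full_contraction = sum(eps(i, j, k) ** 2 for i, j, k in product(R, repeat=3))
```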


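25.4.14. Example. As an application of the foregoing formalism, we can express the
determinant of a $2 \times 2$ matrix in terms of traces. Let $\mathsf{A}$ be such a matrix with elements $A^i_j$.
Then

$$\det\mathsf{A} = \epsilon_{ij}A^i_1A^j_2 = \tfrac{1}{2}\epsilon_{ij}\epsilon^{kl}A^i_kA^j_l = \tfrac{1}{2}\left(\delta^k_i\delta^l_j - \delta^k_j\delta^l_i\right)A^i_kA^j_l = \tfrac{1}{2}\left(A^i_iA^j_j - A^i_jA^j_i\right)$$

$$= \tfrac{1}{2}\left[(\operatorname{tr}\mathsf{A})(\operatorname{tr}\mathsf{A}) - (\mathsf{A}^2)^i_i\right] = \tfrac{1}{2}\left[(\operatorname{tr}\mathsf{A})^2 - \operatorname{tr}(\mathsf{A}^2)\right].$$

■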
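A quick numerical check of the $2 \times 2$ trace formula follows. It is a sketch with an arbitrarily chosen matrix; the helper names `matmul`, `trace`, and `det2_via_traces` are assumptions made for the illustration.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    """Sum of the diagonal elements."""
    return sum(a[i][i] for i in range(len(a)))

def det2_via_traces(a):
    """det A = (1/2)[(tr A)^2 - tr(A^2)], valid for 2 x 2 matrices."""
    return Fraction(trace(a) ** 2 - trace(matmul(a, a)), 2)

A = [[1, 2], [3, 4]]
det_direct = A[0][0] * A[1][1] - A[0][1] * A[1][0]
det_traces = det2_via_traces(A)
```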



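We can generalize the result of the example above and express the determinant
of an $N \times N$ matrix as

$$\det\mathsf{A} = \frac{1}{N!}\,\epsilon_{i_1 i_2 \dots i_N}\epsilon^{j_1 j_2 \dots j_N}A^{i_1}_{j_1}A^{i_2}_{j_2} \cdots A^{i_N}_{j_N} = \frac{1}{N!}\sum_\pi \epsilon_\pi\, A^{i_1}_{i_{\pi(1)}}A^{i_2}_{i_{\pi(2)}} \cdots A^{i_N}_{i_{\pi(N)}}. \tag{25.23}$$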


25.5 The Hodge Star Operator 

It was established in Chapter 2 that all vector spaces of the same dimension are
isomorphic (identical). Therefore, the two vector spaces $\Lambda^p(V)$ and $\Lambda^{N-p}(V)$,
having the same dimension, must be isomorphic. In fact, there is a
natural isomorphism between the two spaces:

25.5.1. Definition. Let $\mathbf{g}$ be an inner product and $\{\mathbf{e}_i\}_{i=1}^N$ an ordered
g-orthonormal basis of V. The Hodge star operator is a linear mapping,
$* : \Lambda^p(V) \to \Lambda^{N-p}(V)$, given by (remember Einstein's summation convention!)

$$*(\mathbf{e}_{i_1} \wedge \cdots \wedge \mathbf{e}_{i_p}) = \frac{1}{(N-p)!}\,\epsilon_{i_1 \dots i_p}{}^{i_{p+1} \dots i_N}\,\mathbf{e}_{i_{p+1}} \wedge \cdots \wedge \mathbf{e}_{i_N}, \tag{25.24}$$

where all the indices of the $\epsilon$ tensor are raised by $g^{ij}$.

Although this definition is based on a choice of basis, it can be shown that the
operator is in fact basis-independent. We note that $\epsilon_{i_1 \dots i_p}{}^{i_{p+1} \dots i_N} = \pm\epsilon_{i_1 \dots i_p i_{p+1} \dots i_N}$
because each index raising introduces either a $+1$ or a $-1$. In particular, for Euclidean
spaces, in which $n_- = 0$, the two epsilon symbols are the same.

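25.5.2. Example. Let us apply Definition 25.5.1 to $\Lambda^p(\mathbb{R}^3)$ for $p = 0, 1, 2, 3$. Let
$\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ be an oriented orthonormal basis of $\mathbb{R}^3$.

(a) For $\Lambda^0(\mathbb{R}^3) = \mathbb{R}$ a basis is 1, and (25.24) gives

$$*1 = \tfrac{1}{3!}\,\epsilon^{ijk}\,\mathbf{e}_i \wedge \mathbf{e}_j \wedge \mathbf{e}_k = \mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3.$$

(b) For $\Lambda^1(\mathbb{R}^3) = \mathbb{R}^3$ a basis is $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$, and (25.24) gives $*\mathbf{e}_i = \tfrac{1}{2}\epsilon_i{}^{jk}\,\mathbf{e}_j \wedge \mathbf{e}_k$, or
$*\mathbf{e}_1 = \mathbf{e}_2 \wedge \mathbf{e}_3$, $*\mathbf{e}_2 = \mathbf{e}_3 \wedge \mathbf{e}_1$, $*\mathbf{e}_3 = \mathbf{e}_1 \wedge \mathbf{e}_2$.

(c) For $\Lambda^2(\mathbb{R}^3)$ a basis is $\{\mathbf{e}_1 \wedge \mathbf{e}_2, \mathbf{e}_1 \wedge \mathbf{e}_3, \mathbf{e}_2 \wedge \mathbf{e}_3\}$, and (25.24) gives $*(\mathbf{e}_i \wedge \mathbf{e}_j) = \epsilon_{ij}{}^k\,\mathbf{e}_k$,
or $*(\mathbf{e}_1 \wedge \mathbf{e}_2) = \mathbf{e}_3$, $*(\mathbf{e}_1 \wedge \mathbf{e}_3) = -\mathbf{e}_2$, $*(\mathbf{e}_2 \wedge \mathbf{e}_3) = \mathbf{e}_1$.

(d) For $\Lambda^3(\mathbb{R}^3)$ a basis is $\{\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3\}$, and (25.24) yields

$$*(\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3) = \epsilon_{123} = 1.$$ ■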
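For a Euclidean metric, (25.24) reduces to a permutation-sign rule on basis elements: $*(\mathbf{e}_{i_1} \wedge \cdots \wedge \mathbf{e}_{i_p})$ equals the wedge product of the complementary basis vectors (in increasing order) times the sign of the permutation $(i_1, \dots, i_p, \text{complement})$. The sketch below implements only this Euclidean special case; the function names `perm_sign`, `hodge_basis`, and `double_star_sign` are hypothetical, and indices run from 1 to N.

```python
from itertools import combinations

def perm_sign(seq):
    """Sign of the permutation that sorts seq (entries assumed distinct)."""
    sign = 1
    seq = list(seq)
    for a in range(len(seq)):
        for b in range(a + 1, len(seq)):
            if seq[a] > seq[b]:
                sign = -sign          # each inversion flips the sign
    return sign

def hodge_basis(indices, N):
    """Return (sign, complement) such that *(e_I) = sign * e_complement.
    Euclidean metric (n_- = 0) assumed; complement is listed in increasing order."""
    comp = tuple(i for i in range(1, N + 1) if i not in indices)
    return perm_sign(tuple(indices) + comp), comp

def double_star_sign(indices, N):
    """Sign picked up by applying * twice to e_I (indices given in increasing order)."""
    s1, comp = hodge_basis(indices, N)
    s2, _ = hodge_basis(comp, N)
    return s1 * s2

# Parts (b) and (c) of the example in R^3.
star_e2 = hodge_basis((2,), 3)        # *e_2 = -(e_1 ^ e_3) = e_3 ^ e_1
star_e1e3 = hodge_basis((1, 3), 3)    # *(e_1 ^ e_3) = -e_2

# Compare ** with the sign (-1)^{p(N-p)} for several dimensions.
double_star_ok = all(
    double_star_sign(I, N) == (-1) ** (p * (N - p))
    for N in (3, 4, 5)
    for p in range(N + 1)
    for I in combinations(range(1, N + 1), p)
)
```

In odd dimensions the sign $(-1)^{p(N-p)}$ is always $+1$, which is why applying the star twice looks like the identity in $\mathbb{R}^3$.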

The preceding example may suggest that applying the Hodge star operator 
twice (composition of * with itself, or * o *) is equivalent to applying the identity 
operator. This is partially true. The following theorem is a precise statement of this 
conjecture. (For a proof, see [Bish 80, p. 111].) 



25.5.3. Theorem. Let V be an oriented space with an inner product $\mathbf{g}$. For $\mathbf{A} \in
\Lambda^p(V)$, we have

$$* \circ * \mathbf{A} \equiv **\mathbf{A} = (-1)^{n_-}(-1)^{p(N-p)}\mathbf{A}, \tag{25.25}$$

where $n_-$ is the index of $\mathbf{g}$ and $N = \dim V$.

In particular, for Euclidean spaces with an odd number of dimensions (such as
$\mathbb{R}^3$), $**\mathbf{A} = \mathbf{A}$.

One can extend the star operation to any $\mathbf{A} \in \Lambda^p(V)$ by writing $\mathbf{A}$ as a linear
combination of basis vectors of $\Lambda^p(V)$ constructed out of $\{\mathbf{e}_i\}_{i=1}^N$ and using the
linearity of $*$.

In the discussion of exterior algebra one encounters sums of the form

$$A^{i_1 \dots i_p}\,\mathbf{v}_{i_1} \wedge \cdots \wedge \mathbf{v}_{i_p}.$$

It is important to keep in mind that $A^{i_1 \dots i_p}$ is assumed skew-symmetric. For
example, if $\mathbf{A} = \mathbf{e}_1 \wedge \mathbf{e}_2$, then in the sum $\mathbf{A} = A^{ij}\,\mathbf{e}_i \wedge \mathbf{e}_j$, the nonzero components
consist of $A^{12} = \tfrac{1}{2}$ and $A^{21} = -\tfrac{1}{2}$. Similarly, when $\mathbf{B} = \mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3$ is written in
the form $\mathbf{B} = B^{ijk}\,\mathbf{e}_i \wedge \mathbf{e}_j \wedge \mathbf{e}_k$, it is understood that the nonzero components of $\mathbf{B}$
are not restricted to $B^{123}$. Other components, such as $B^{132}$, $B^{231}$, and so on, are
also nonzero. In fact, we have

$$B^{123} = -B^{132} = -B^{213} = B^{231} = B^{312} = -B^{321} = \tfrac{1}{6}.$$

This should be kept in mind when sums over exterior products with numerical
coefficients are encountered.


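25.5.4. Example. Let $\mathbf{a}, \mathbf{b} \in \mathbb{R}^3$ and $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ an oriented orthonormal basis of $\mathbb{R}^3$.
Then $\mathbf{a} = a^i\mathbf{e}_i$ and $\mathbf{b} = b^j\mathbf{e}_j$. Let us calculate $\mathbf{a} \wedge \mathbf{b}$ and $*(\mathbf{a} \wedge \mathbf{b})$. We assume a Euclidean
$\mathbf{g}$ on $\mathbb{R}^3$. Then $\mathbf{a} \wedge \mathbf{b} = (a^i\mathbf{e}_i) \wedge (b^j\mathbf{e}_j) = a^ib^j\,\mathbf{e}_i \wedge \mathbf{e}_j$, and

$$*(\mathbf{a} \wedge \mathbf{b}) = *\left[(a^i\mathbf{e}_i) \wedge (b^j\mathbf{e}_j)\right] = a^ib^j *(\mathbf{e}_i \wedge \mathbf{e}_j) = \left(\epsilon_{ij}{}^k a^ib^j\right)\mathbf{e}_k.$$

We see that $*(\mathbf{a} \wedge \mathbf{b})$ is a vector with components $[*(\mathbf{a} \wedge \mathbf{b})]^k = \epsilon_{ij}{}^k a^ib^j$, which are precisely
the components of $\mathbf{a} \times \mathbf{b}$.

The correspondence between $\mathbf{a} \wedge \mathbf{b}$ and $\mathbf{a} \times \mathbf{b}$ holds only in three dimensions, because
$\dim\Lambda^1(V) = \dim\Lambda^2(V)$ only if $\dim V = 3$. That is why the cross product can be defined,
as a "machine" that takes two vectors in V and manufactures a vector in V, only if V is
three-dimensional. ■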
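The component formula $[*(\mathbf{a} \wedge \mathbf{b})]^k = \epsilon_{ij}{}^k a^i b^j$ can be compared directly with the familiar cross product. The following is a minimal sketch for Euclidean $\mathbb{R}^3$ (so index position is immaterial); the names `eps`, `star_wedge`, and `cross` are hypothetical, and indices run over 0, 1, 2.

```python
def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def star_wedge(a, b):
    """Components eps_{ijk} a^i b^j of *(a ^ b) in Euclidean R^3."""
    return [sum(eps(i, j, k) * a[i] * b[j] for i in range(3) for j in range(3))
            for k in range(3)]

def cross(a, b):
    """Textbook component formula for the cross product."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

a = [1, 2, 3]
b = [4, 5, 6]
via_hodge = star_wedge(a, b)
via_cross = cross(a, b)
```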

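25.5.5. Example. We can use the results of Examples 25.4.13 and 25.5.4 to establish a
sample of familiar vector identities componentwise.

(a) For the triple cross product, we have

$$[\mathbf{a} \times (\mathbf{b} \times \mathbf{c})]^k = \epsilon^{kij}a_i(\mathbf{b} \times \mathbf{c})_j = \epsilon^{kij}a_i\,\epsilon_{jlm}b^lc^m = a_ib^lc^m\,\epsilon^{kij}\epsilon_{jlm}$$

$$= a_ib^lc^m\,\epsilon^{kij}\epsilon_{lmj} = a_ib^lc^m\left(\delta^k_l\delta^i_m - \delta^k_m\delta^i_l\right)$$

$$= a_ib^kc^i - a_ib^ic^k = (\mathbf{a} \cdot \mathbf{c})b^k - (\mathbf{a} \cdot \mathbf{b})c^k,$$

which is the kth component of $\mathbf{b}(\mathbf{a} \cdot \mathbf{c}) - \mathbf{c}(\mathbf{a} \cdot \mathbf{b})$. In deriving the above "bac cab" rule, we
used the fact that one can swap an upper index with the same lower index: $a^ib_i = a_ib^i$.

(b) Next we show the familiar statement that the divergence of curl is zero. Let $\partial_i$ denote
differentiation with respect to $x^i$. Then

$$\nabla \cdot (\nabla \times \mathbf{a}) = \partial_i(\nabla \times \mathbf{a})^i = \partial_i\,\epsilon^{ijk}\partial_ja_k = \epsilon^{ijk}\partial_i\partial_ja_k$$

$$= -\epsilon^{jik}\partial_i\partial_ja_k = -\epsilon^{jik}\partial_j\partial_ia_k = -\partial_j\left(\epsilon^{jik}\partial_ia_k\right)$$

$$= -\partial_j(\nabla \times \mathbf{a})^j = -\nabla \cdot (\nabla \times \mathbf{a}) \;\Rightarrow\; \nabla \cdot (\nabla \times \mathbf{a}) = 0.$$

The above steps show in general that

25.5.6. Box. When the product of two tensors is summed over a pair of indices in
which one of the tensors is symmetric and the other antisymmetric, the result is zero.

(c) Finally, we show that curl of gradient is zero:

$$[\nabla \times (\nabla f)]^i = \epsilon^{ijk}\partial_j(\nabla f)_k = \epsilon^{ijk}\partial_j\partial_kf = 0,$$

because $\epsilon^{ijk}$ is antisymmetric in $jk$, while $\partial_j\partial_kf$ is symmetric in $jk$. ■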


25.6 Problems 

25.1. Show that the mapping $\mathbf{v} : V^* \to \mathbb{R}$ given by $\mathbf{v}(\boldsymbol{\tau}) = \boldsymbol{\tau}(\mathbf{v})$ is linear.

25.2. Show that the components of a tensor product are the products of the
components of the factors:

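25.3. Show that $\mathbf{e}_{j_1} \otimes \cdots \otimes \mathbf{e}_{j_r} \otimes \boldsymbol{\epsilon}^{i_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{i_s}$ are linearly independent. Hint:
Consider $A^{j_1 \dots j_r}_{i_1 \dots i_s}\,\mathbf{e}_{j_1} \otimes \cdots \otimes \mathbf{e}_{j_r} \otimes \boldsymbol{\epsilon}^{i_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{i_s} = 0$ and evaluate the LHS on
appropriate tensors to show that all coefficients are zero.

25.4. What is the tensor product of $\mathbf{A} = 2\mathbf{e}_x - \mathbf{e}_y + 3\mathbf{e}_z$ with itself?

25.5. If $\mathbf{A} \in \mathcal{L}(V)$ is represented by $A^i_j$ in the basis $\{\mathbf{e}_i\}$ and by $A'^k_l$ in $\{\mathbf{e}'_k\}$, then
show that

$$A'^k_l\,\mathbf{e}'_k \otimes \boldsymbol{\epsilon}'^l = A^i_j\,\mathbf{e}_i \otimes \boldsymbol{\epsilon}^j,$$

where $\{\boldsymbol{\epsilon}^j\}$ and $\{\boldsymbol{\epsilon}'^l\}$ are dual to $\{\mathbf{e}_i\}$ and $\{\mathbf{e}'_k\}$, respectively.

25.6. Prove that the linear functional $F : V \to \mathbb{R}$ is a linear invariant, i.e.,
basis-independent, function.

25.7. Show that $\operatorname{tr} : \mathcal{T}^1_1 \to \mathbb{R}$ is an invariant linear function.

25.8. If $\mathbf{A}$ is skew-symmetric in some pair of variables, show that $\mathbb{S}(\mathbf{A}) = 0$.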

25.9. Using the exterior product show whether the following three vectors are 
linearly dependent or independent: 

vi = 2ei - e2 + 3e3 — e4, 

V 2 = —ei + 3 句 一 2e4, 

V 3 = 3ei 4- 2©2 — 4e3 + 

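25.10. Let $\mathbf{A} \in \mathcal{T}^r_0(V)$ be skew-symmetric. Show that if $\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r \in V^*$ are
linearly dependent, then $\mathbf{A}(\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r) = 0$.

25.11. Show that $\{\mathbf{e}_i \wedge \mathbf{e}_j\}$ with $i < j$ are linearly independent.

25.12. Let $\mathbf{v} \in V$ be nonzero, and let $\mathbf{A} \in \Lambda^p(V)$. Show that $\mathbf{v} \wedge \mathbf{A} = 0$ if and only
if there exists $\mathbf{B} \in \Lambda^{p-1}(V)$ such that $\mathbf{A} = \mathbf{v} \wedge \mathbf{B}$. Hint: Let $\mathbf{v}$ be the first vector of
a basis; separate out $\mathbf{v}$ in the expansion of $\mathbf{A}$ in terms of the p-fold wedge products
of basis vectors, and multiply the result by $\mathbf{v}$.

25.13. Let $\mathbf{A} \in \Lambda^2(V)$ with components $A^{ij}$. Show that $\mathbf{A} \wedge \mathbf{A} = 0$ if and only if
$A^{ij}A^{kl} - A^{ik}A^{jl} + A^{il}A^{jk} = 0$ for all $i, j, k, l$ in any basis.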

25.14. A linear operator acting on a vector in one dimension simply multiplies 
that vector by some constant. Show that this constant is independent of the vector 
chosen. That is, the constant is an intrinsic property of the operator. 

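25.15. Let $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ be any basis in $\mathbb{R}^3$. Define an operator $\mathbf{E} : \mathbb{R}^3 \to \mathbb{R}^3$
that permutes any set of three vectors $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ to $\{\mathbf{v}_i, \mathbf{v}_j, \mathbf{v}_k\}$. Find the matrix
representation of this operator and show that $\det\mathbf{E} = \epsilon_{ijk}$.

25.16. (a) Starting with the definition of the permutation tensor $\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N}$, and
writing the wedge product in terms of the antisymmetrized tensor product, show
that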

7T 

(b) Show that the inverse of the (diagonal) matrix of $\mathbf{g}$ in an orthonormal basis is
the same as the matrix of $\mathbf{g}$.

(c) Now show that $\epsilon^{12 \dots N} = (-1)^{n_-}\epsilon_{12 \dots N} = (-1)^{n_-}$.

25.17. Let $\{\mathbf{e}_i\}_{i=1}^N$ be a g-orthonormal basis of V. Let $\eta_{ij} = \pm\delta_{ij}$ be the matrix of
$\mathbf{g}$ in this orthonormal basis. Let $\{\mathbf{v}_j\}_{j=1}^N$ be another (not necessarily orthonormal)
basis of V with a transformation matrix $\mathsf{R}$. Using $\mathsf{G}$ to denote the matrix of $\mathbf{g}$ in
$\{\mathbf{v}_j\}$, show that

$$\det\mathsf{G} = \det\eta\,(\det\mathsf{R})^2 = (-1)^{n_-}(\det\mathsf{R})^2.$$

In particular, the sign of this determinant is invariant. Why is $\det\mathsf{G}$ not equal
to $\det\eta$? Is there any conflict with the statement that the determinant is
basis-independent?

25.18. Show that the kernel of $\mathbf{g} : V \to V^*$ consists of all vectors $\mathbf{u} \in V$ such that
$\mathbf{g}(\mathbf{u}, \mathbf{v}) = 0$ for all $\mathbf{v} \in V$. Show also that in the g-orthonormal basis $\{\mathbf{e}_i\}$, the set
$\{\mathbf{e}_i \mid \mathbf{g}(\mathbf{e}_i, \mathbf{e}_i) = 0\}$ is a basis of $\ker\mathbf{g}$, and therefore $n_0$ is the nullity of $\mathbf{g}$.

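25.19. Use Equation (25.23) to show that for a $3 \times 3$ matrix $\mathsf{A}$,

$$\det\mathsf{A} = \tfrac{1}{3!}\left[(\operatorname{tr}\mathsf{A})^3 - 3\operatorname{tr}\mathsf{A}\,\operatorname{tr}(\mathsf{A}^2) + 2\operatorname{tr}(\mathsf{A}^3)\right].$$

25.20. Show that a 2-form $\omega$ is nondegenerate if and only if the determinant of
$(\omega_{ij})$ is nonzero if and only if $\omega^\flat$ is an isomorphism.

25.21. Let V be a finite-dimensional vector space and $\omega \in \Lambda^2(V^*)$. Suppose there
exist a pair of vectors $\mathbf{e}_1, \mathbf{e}'_1 \in V$ such that $\omega(\mathbf{e}_1, \mathbf{e}'_1) \neq 0$. Let $\mathcal{P}_1$ be the plane
spanned by $\mathbf{e}_1$ and $\mathbf{e}'_1$, and $\mathcal{V}_1$ the $\omega$-orthogonal complement of $\mathcal{P}_1$. Show that
$\mathcal{P}_1 \cap \mathcal{V}_1 = 0$, and that $\mathbf{v} - \omega(\mathbf{v}, \mathbf{e}'_1)\mathbf{e}_1 + \omega(\mathbf{v}, \mathbf{e}_1)\mathbf{e}'_1$ is in $\mathcal{V}_1$.

25.22. Show that $\sum_{j=1}^n \boldsymbol{\epsilon}^j \wedge \boldsymbol{\epsilon}^{j+n}$, in which $\{\boldsymbol{\epsilon}^j\}_{j=1}^{2n}$ is dual to $\{\mathbf{e}_i\}_{i=1}^{2n}$, the
canonical basis of V, has the same matrix as $\omega$.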

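25.23. Suppose that $\mathbf{v}, \mathbf{v}' \in V$ are expressed in a canonical basis of V with
coefficients $\{x_i, y_i, z_i\}$ and $\{x'_i, y'_i, z'_i\}$. Show that

$$\omega(\mathbf{v}, \mathbf{v}') = \sum_{i=1}^n (x_iy'_i - x'_iy_i).$$

25.24. Let V be a vector space and $V^*$ its dual. Define $\omega \in \Lambda^2(V \oplus V^*)$ by

$$\omega(\mathbf{v} + \boldsymbol{\varphi}, \mathbf{v}' + \boldsymbol{\varphi}') = \boldsymbol{\varphi}'(\mathbf{v}) - \boldsymbol{\varphi}(\mathbf{v}')$$

where $\mathbf{v}, \mathbf{v}' \in V$ and $\boldsymbol{\varphi}, \boldsymbol{\varphi}' \in V^*$. Show that $(V \oplus V^*, \omega)$ is a symplectic vector
space.

25.25. By taking successive powers of $\omega$ show that

$$\omega^k = \sum_{j_1 \dots j_k = 1}^n \boldsymbol{\epsilon}^{j_1} \wedge \boldsymbol{\epsilon}^{j_1+n} \wedge \cdots \wedge \boldsymbol{\epsilon}^{j_k} \wedge \boldsymbol{\epsilon}^{j_k+n}.$$

Conclude that

$$\omega^n = n!\,(-1)^{[n/2]}\,\boldsymbol{\epsilon}^1 \wedge \cdots \wedge \boldsymbol{\epsilon}^{2n},$$

where $[n/2]$ is the largest integer less than or equal to $n/2$.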


25.26. Show that the condition for a matrix $\mathsf{A}$ to be symplectic is $\mathsf{A}^t\mathsf{J}\mathsf{A} = \mathsf{J}$, where
$\mathsf{J} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ is the representation of $\omega$ in the canonical basis.

25.27. Show that $Sp(V, \omega)$ is a subgroup of $GL(V)$.

25.28. Find the index and the signature for the bilinear form $\mathbf{g}$ on $\mathbb{R}^3$ given by
$\mathbf{g}(\mathbf{v}_1, \mathbf{v}_2) = x_1y_2 + x_2y_1 - y_1z_2 - y_2z_1$.

25.29. In relativistic electromagnetic theory the current $\mathbf{J}$ and the electromagnetic
field tensor $\mathbf{F}$ are, respectively, a four-vector$^6$ and an antisymmetric tensor of rank
2. That is, $\mathbf{J} = J^k\mathbf{e}_k$ and $\mathbf{F} = F^{ij}\,\mathbf{e}_i \wedge \mathbf{e}_j$. Find the components of $*\mathbf{J}$ and $*\mathbf{F}$. Recall
that the space of relativity is a 4D Minkowski space.

25.30. Show that where there is a sum over an upper index and a lower index,
swapping the upper index to a lower index, and vice versa, does not change the
sum. In other words, $A^iB_i = A_iB^i$.

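25.31. Show the following vector identities, using the definition of cross products
in terms of $\epsilon^{ijk}$.

(a) $\mathbf{A} \times \mathbf{A} = 0$.

(b) $\nabla \cdot (\mathbf{A} \times \mathbf{B}) = (\nabla \times \mathbf{A}) \cdot \mathbf{B} - (\nabla \times \mathbf{B}) \cdot \mathbf{A}$.

(c) $\nabla \times (\mathbf{A} \times \mathbf{B}) = (\mathbf{B} \cdot \nabla)\mathbf{A} + \mathbf{A}(\nabla \cdot \mathbf{B}) - (\mathbf{A} \cdot \nabla)\mathbf{B} - \mathbf{B}(\nabla \cdot \mathbf{A})$.

(d) $\nabla \times (\nabla \times \mathbf{A}) = \nabla(\nabla \cdot \mathbf{A}) - \nabla^2\mathbf{A}$.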

25.32. A vector operator V is defined as {V 1 ， V 2 , V 3 }， a set of three opera¬ 

tors, satisfying the following commutation relations with angular momentum: 
[V% J 7 ] = Show that commutes with all components of angular 

momentum. 


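25.33. The Pauli spin matrices

$$\sigma^1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma^2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma^3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

describe a particle with spin $\tfrac{1}{2}$ in nonrelativistic quantum mechanics. Verify that
these matrices satisfy

$$[\sigma^i, \sigma^j] \equiv \sigma^i\sigma^j - \sigma^j\sigma^i = 2i\epsilon^{ij}{}_k\sigma^k, \qquad \{\sigma^i, \sigma^j\} \equiv \sigma^i\sigma^j + \sigma^j\sigma^i = 2\delta^{ij}\mathsf{1}_2,$$

where $\mathsf{1}_2$ is the unit $2 \times 2$ matrix. Show also that $\sigma^i\sigma^j = i\epsilon^{ij}{}_k\sigma^k + \delta^{ij}\mathsf{1}_2$, and for
any two vectors $\mathbf{a}$ and $\mathbf{b}$, $(\boldsymbol{\sigma} \cdot \mathbf{a})(\boldsymbol{\sigma} \cdot \mathbf{b}) = \mathbf{a} \cdot \mathbf{b}\,\mathsf{1}_2 + i\boldsymbol{\sigma} \cdot (\mathbf{a} \times \mathbf{b})$.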

25.34. Show that any contravariant tensor of rank two can be written as the sum of 
a symmetric tensor and an antisymmetric tensor. Can this be generalized to tensors 
of arbitrary rank? 


6 It turns out to be more natural to consider J as a 3-form. However, such a fine distinction is not of any consequence for the
present discussion.


Additional Reading

1. Abraham, R., Marsden, J., and Ratiu, T. Manifolds, Tensor Analysis, and
Applications, 2nd ed., Springer-Verlag, 1988. A comprehensive textbook on
tensors with many examples drawn from physics.

2. Bishop, R. and Goldberg, S. Tensor Analysis on Manifolds, Dover, 1980.
This little and lucid book is one of the earliest ones on index-free tensor
analysis.

3. Flanders, H. Differential Forms with Applications to Physical Sciences,
Dover, 1989. One of the first books on exterior algebra written for physicists.
It has many examples drawn from various areas of physics.



26

Analysis of Tensors 


Tensor algebra deals with lifeless vectors and tensors — objects that do not move, 
do not change, possess no dynamics. Whenever there is a need for tensors in 
physics, there is also a need to know the way these tensors change with position 
and time. Tensors that depend on position and time are called tensor fields and are 
the subject of this chapter. 

In studying the algebra of tensors, we learned that they are generalizations of 
vectors. Once we have a vector space V and its dual space V*, we can take the 
tensor products of factors of V and V* and create tensors of various kinds. Thus, 
once we know what a vector is, we can make up tensors from it.

In the previous chapter, we did not concern ourselves with what a vector was; 
we simply assumed that it existed. Because all the vectors considered there were 
stationary, their mere existence was enough. However, in tensor analysis, where 
things keep changing from point to point (and over time), the existence of vectors 
at one point does not guarantee their existence at all points. Therefore, we now 
have to demand more from vectors than their mere existence. Tied to the concept 
of vectors is the notion of space, or space-time. Let us consider this first. 

26.1 Differentiable Manifolds 

Space is one of the undefinables in elementary physics. Length and time intervals
are concepts that are "God given," and any definitions of these concepts will be
circular. This is true as long as we are confined within a single space. In classical
physics, this space is the three-dimensional Euclidean space in which every motion
takes place. In special relativity, space is changed to Minkowski space-time. In
nonrelativistic quantum mechanics, the underlying space is the (infinite-dimensional)
Hilbert space, and time is the only dynamical parameter. In the general theory
of relativity, gravitation and space-time are intertwined through the concept of
curvature.

Mathematicians have invented a unifying theme that brings the common fea- 
tures of all spaces together. This unifying theme is the theory of differentiable 
manifolds. A rigorous understanding of differentiable manifolds is beyond the 
scope of this book. However, a working knowledge of manifold theory is surpris¬ 
ingly simple. Let us begin with a crude definition of a differentiable manifold. 

26.1.1. Definition. A differentiable manifold is a collection of objects called
points that are connected to each other in a smooth fashion such that the
neighborhood of each point looks like the neighborhood of an m-dimensional (Cartesian)
space; m is called the dimension of the manifold.

As is customary in the literature, we use “manifold” to mean “differentiable 
manifold.” 

26.1.2. Example. The following are examples of differentiable manifolds.

(a) The space $\mathbb{R}^n$ is an n-dimensional manifold.

(b) The surface of a sphere is a two-dimensional manifold.

(c) A torus is a two-dimensional manifold.

(d) The collection of all $n \times n$ real matrices whose elements are real functions having
derivatives of all orders is an $n^2$-dimensional manifold. Here a point is an $n \times n$ matrix.

(e) The collection of all rotations in $\mathbb{R}^3$ is a three-dimensional manifold. (Here a point is a
rotation.)

(f) Any smooth surface in $\mathbb{R}^3$ is a two-dimensional manifold.

(g) The unit n-sphere $S^n$, which is the collection of points in $\mathbb{R}^{n+1}$ satisfying

$$x_1^2 + \cdots + x_{n+1}^2 = 1,$$

is a manifold.

Any surface with sharp kinks, edges, or points cannot be a manifold. Thus, neither
a cone nor a finite cylinder is a two-dimensional manifold. However, an infinitely long
cylinder is a manifold. ■

Let $U_P$ denote a neighborhood of $P$. When we say that this neighborhood
looks like an m-dimensional Cartesian space, we mean that there exists a bijective
map $\varphi : U_P \to \mathbb{R}^m$ from a neighborhood $U_P$ of $P$ to a neighborhood $\varphi(U_P)$
of $\varphi(P)$ in $\mathbb{R}^m$, such that as we move the point $P$ continuously in $U_P$, its image
moves continuously in $\varphi(U_P)$. Since $\varphi(P) \in \mathbb{R}^m$, we can define functions $x^i :
U_P \to \mathbb{R}$ such that $\varphi(P) = (x^1(P), x^2(P), \dots, x^m(P))$. These functions are
called coordinate functions of $\varphi$. The numbers $x^i(P)$ are called coordinates of
$P$. The neighborhood $U_P$ together with its mapping $\varphi$ form a chart, denoted by
$(U_P, \varphi)$.

Now let $(V_P, \mu)$ be another chart at $P$ with coordinate functions $\mu(P) =
(y^1(P), y^2(P), \dots, y^m(P))$ (see Figure 26.1). It is assumed that the function
$\mu \circ \varphi^{-1} : \varphi(U_P \cap V_P) \to \mu(U_P \cap V_P)$, which maps a subset of $\mathbb{R}^m$ to another
subset of $\mathbb{R}^m$, possesses derivatives of all orders. Then, we say that the two charts
$\mu$ and $\varphi$ are $C^\infty$-related. Such a relation underlies the concept of smoothness in
the definition of a manifold. A collection of charts that cover the manifold and of
which each pair is $C^\infty$-related is called a $C^\infty$ atlas.

Figure 26.1 Two charts $(U_P, \varphi)$ and $(V_P, \mu)$, containing $P$, are mapped into $\mathbb{R}^m$. The
function $\mu \circ \varphi^{-1}$ is an ordinary function from $\mathbb{R}^m$ to $\mathbb{R}^m$.

26.1.3. Example. For the two-dimensional unit sphere $S^2$ we can construct an atlas as
follows. Let $P = (x, y, z)$ be a point in $S^2$. Then $x^2 + y^2 + z^2 = 1$, or

$$z = \pm\sqrt{1 - x^2 - y^2}.$$

The plus sign corresponds to the upper hemisphere, and the minus sign to the lower
hemisphere. Let $U_3^+$ be the upper hemisphere with the equator removed. Then a chart $(U_3^+, \varphi_3)$
with $\varphi_3 : U_3^+ \to \mathbb{R}^2$ can be constructed by projecting on the xy-plane: $\varphi_3(P) = (x, y)$.
Similarly, $(U_3^-, \mu_3)$ with $\mu_3 : U_3^- \to \mathbb{R}^2$ given by $\mu_3(P) = (x, y)$ is a chart for the lower
hemisphere.

In manifold theory the neighborhoods on which mappings of charts are defined have 
no boundaries (thus the word “open”). This is because it is more convenient to define limits 
on boundaryless (open) neighborhoods. Thus, in the above two charts the equator, which 
is the boundary for both hemispheres, must be excluded. With this exclusion and 

cannot cover the entire 5 2 ; hence，they do not form an atlas. More charts are needed to 
cover the unit two-sphere. Two such charts are the right and left hemispheres uf and t/J*, 
for which 3 ^ > 0 and y < 0 t respectively. However, these two neighborhoods leave two 
points uncovered, the points (1 ， 0, 0) and (— 1 ， 0, 0). Again this is because boundaries of 
the right and left hemispheres must be excluded. Adding the front and back hemispheres 
U^ 1 to the collection covers these two points. Then S 2 is completely covered and we have 



766 26. ANALYSIS OF TENSORS 


illustration of 
stereographic 
projection 


an atlas. There is, of course, a lot of overlap among charts. We now show that these overlaps
are $C^\infty$-related.

As an illustration, we consider the overlap between $U_3^+$ and $U_2^+$. This is the upper-right
quarter of the sphere. Let $(U_3^+, \varphi_3)$ and $(U_2^+, \varphi_2)$ be charts with

$$\varphi_3(x, y, z) = (x, y), \qquad \varphi_2(x, y, z) = (x, z).$$

The inverses are therefore given by

$$\varphi_3^{-1}(x, y) = (x, y, z) = \left(x, y, \sqrt{1 - x^2 - y^2}\right),$$

$$\varphi_2^{-1}(x, z) = (x, y, z) = \left(x, \sqrt{1 - x^2 - z^2}, z\right),$$

and

$$\varphi_2 \circ \varphi_3^{-1}(x, y) = \varphi_2\left(x, y, \sqrt{1 - x^2 - y^2}\right) = \left(x, \sqrt{1 - x^2 - y^2}\right).$$

Let us denote $\varphi_2 \circ \varphi_3^{-1}$ by $F$, so that $F : \mathbb{R}^2 \to \mathbb{R}^2$ is described by two functions, the
components of $F$:

$$F_1(x, y) = x \quad\text{and}\quad F_2(x, y) = \sqrt{1 - x^2 - y^2}.$$

The first component has derivatives of all orders at all points. The second component has
derivatives of all orders at all points except at $x^2 + y^2 = 1$, which is excluded from the
region of overlap of $U_3^+$ and $U_2^+$, for which $z$ can never be zero. Thus, $F$ has derivatives of
all orders at all points of its domain of definition.

One can similarly show that all regions of overlap for all charts have this property, i.e.,
all charts are $C^\infty$-related. ■
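
The overlap computation of Example 26.1.3 can be checked numerically. The following is only a sketch under stated assumptions: the function names `phi2`, `phi3`, `phi2_inv`, `phi3_inv`, and `F` are hypothetical labels for the maps $\varphi_2$, $\varphi_3$, their inverses, and the transition map, and the sample point is arbitrary.

```python
import math

def phi3(x, y, z):
    """Upper-hemisphere chart: project onto the xy-plane."""
    return (x, y)

def phi3_inv(x, y):
    """Inverse of phi3: lift (x, y) to the point of S^2 with z > 0."""
    return (x, y, math.sqrt(1.0 - x * x - y * y))

def phi2(x, y, z):
    """Right-hemisphere chart: project onto the xz-plane."""
    return (x, z)

def phi2_inv(x, z):
    """Inverse of phi2: lift (x, z) to the point of S^2 with y > 0."""
    return (x, math.sqrt(1.0 - x * x - z * z), z)

def F(x, y):
    """Transition map phi2 o phi3^{-1}, valid on the overlap y > 0, z > 0."""
    return phi2(*phi3_inv(x, y))

# Arbitrary sample chart coordinates with y > 0 and x^2 + y^2 < 1.
x, y = 0.3, 0.4
fx, fz = F(x, y)
# Applying the reverse transition phi3 o phi2^{-1} should return (x, y).
back = phi3(*phi2_inv(fx, fz))
print((fx, fz), back)
```

The second component of `F` is the square root discussed in the example, so the sketch would fail (with a domain error or a zero derivative denominator) exactly where $x^2 + y^2 \geq 1$, which is outside the overlap.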

26.1.4. Example. For $S^2$ of the preceding example, we can find a new atlas in terms
of new coordinate functions. Since $x_1^2 + x_2^2 + x_3^2 = 1$, we can use spherical coordinates
$\theta = \cos^{-1} x_3$, $\varphi = \tan^{-1}(x_2/x_1)$. A chart is then given by $(S^2 - \{1\} - \{-1\}, \mu)$, where
$\mu(P) = (\theta, \varphi)$ maps a point of $S^2$ onto a region in $\mathbb{R}^2$. This is schematically shown in
Figure 26.2. The singletons $\{1\}$ and $\{-1\}$ are the north and the south poles, respectively.
This chart cannot cover all of $S^2$, however, because when $\theta = 0$ (or $\pi$), the value of the
azimuthal angle $\varphi$ is not determined. In other words, $\theta = 0$ (or $\pi$) determines one point
of the sphere (the north pole or the south pole), but its image in $\mathbb{R}^2$ is the whole range of
$\varphi$ values. Therefore, we must exclude $\theta = 0$ (or $\pi$) from the chart $(U_P, \mu)$. To cover these
two points, we need more charts. ■

26.1.5. Example. A third atlas for $S^2$ is the so-called stereographic projection shown
in Figure 26.3. In such a mapping the image of a point is obtained by drawing a line from
the north pole to that point and extending it, if necessary, until it intersects the $x_1x_2$-plane.
It can be verified that the mapping $\varphi : S^2 - \{1\} \to \mathbb{R}^2$ is given by

$$\varphi(x_1, x_2, x_3) = \left(\frac{x_1}{1 - x_3}, \frac{x_2}{1 - x_3}\right).$$

We see that this mapping fails for $x_3 = 1$, that is, the north pole. Therefore, the north
pole must be excluded (thus, the domain $S^2 - \{1\}$). To cover the north pole we need
another stereographic projection, this time from the south pole. Then the two mappings
will cover all of $S^2$, and it can be shown that the two charts are $C^\infty$-related (see Example
26.1.12). ■





Figure 26.2 A chart mapping points of $S^2$ into $\mathbb{R}^2$. Note that the map is not defined for
$\theta = 0, \pi$, and therefore at least one more chart is required to cover the whole sphere.

The three foregoing examples illustrate the following fact, which can be shown 
to hold rigorously: 

26.1.6. Box. It is impossible to cover the whole $S^2$ with just one chart.


vector spaces are manifolds

26.1.7. Example. Let $V$ be an $m$-dimensional real vector space. Fix any basis $\{\mathbf{e}_i\}$ in $V$
with dual basis $\{\boldsymbol{\epsilon}^i\}$. Define $\phi : V \to \mathbb{R}^m$ by $\phi(\mathbf{v}) = (\boldsymbol{\epsilon}^1(\mathbf{v}), \ldots, \boldsymbol{\epsilon}^m(\mathbf{v}))$. Then the reader
may verify that $(V, \phi)$ is an atlas. Linearity of $\phi$ ensures that it has derivatives of all orders.
This construction shows that $V$ is a manifold of dimension $m$. ■

product manifold

If $M$ and $N$ are manifolds of dimensions $m$ and $n$, respectively, we can construct
their product manifold $M \times N$, a manifold of dimension $m + n$. A typical chart
defined on $M \times N$ is obtained from charts on $M$ and $N$ as follows. Let $(U, \varphi)$ be a chart
on $M$ and $(V, \mu)$ one on $N$. Then a chart on $M \times N$ is $(U \times V, \varphi \times \mu)$ where

$$\varphi \times \mu(P, Q) = (\varphi(P), \mu(Q)) \in \mathbb{R}^m \times \mathbb{R}^n = \mathbb{R}^{m+n} \quad\text{for } P \in U,\ Q \in V.$$

submanifold

26.1.8. Definition. Let $M$ be a manifold. A subset $N$ of $M$ is called a submanifold
of $M$ if $N$ is a manifold in its own right.

open submanifolds

A trivial, but important, example of submanifolds is the so-called open
submanifold. If $M$ is a manifold and $U$ is an open subset$^1$ of $M$, then $U$ inherits
a manifold structure from $M$ by taking any chart $(U_\alpha, \varphi_\alpha)$ and restricting $\varphi_\alpha$
to $U \cap U_\alpha$. It is clear that $\dim U = \dim M$. Having gained familiarity with manifolds,
it is now appropriate to consider maps between them that are compatible
with their structure.

1 Recall that an open subset $U$ is one each of whose points is the center of an open ball lying entirely in $U$.










Figure 26.3 Stereographic projection of $S^2$ into $\mathbb{R}^2$. Note that the north pole has no
image under this map; another chart is needed to cover the whole sphere.


differentiable maps 
and their coordinate 
expressions 


function as a special 
kind of map 


26.1.9. Definition. Let $M$ and $N$ be manifolds of dimensions $m$ and $n$, respectively.
Let $f : M \to N$ be a map. We say that $f$ is $C^\infty$, or differentiable, if for every
chart $(U, \varphi)$ in $M$ and every chart $(V, \mu)$ in $N$, the composite map
$\mu \circ f \circ \varphi^{-1} : \mathbb{R}^m \to \mathbb{R}^n$, called the coordinate expression for $f$, is $C^\infty$ wherever it is defined.$^2$

The content of this definition is illustrated in Figure 26.4. A particularly important
special case occurs when $N = \mathbb{R}$; then we call $f$ a (real-valued) function.
The collection of all $C^\infty$ functions at a point $P \in M$ is denoted by $F^\infty(P)$. If
$f \in F^\infty(P)$, then $f : U_P \to \mathbb{R}$ is $C^\infty$ for some neighborhood $U_P$ of $P$. Let
$f : M \to N$ be a differentiable map. Then $f$ is automatically continuous. Now let
$V$ be an open subset of $N$. The set $f^{-1}(V)$ is an open subset of $M$ by Proposition
16.4.6.$^3$


26.1.10. Proposition. Let $M$ be an $m$-dimensional manifold, $f : M \to N$ a
differentiable map, and $V$ an open subset of $N$. Then $f^{-1}(V)$, the set of points of
$M$ mapped onto $V$, is an open $m$-dimensional submanifold of $M$.

Just as the concept of isomorphism identified all vector spaces, algebras, and 
groups that were equivalent to one another, it is desirable to introduce a notion that 
brings together those manifolds that “look alike.” 

diffeomorphism and local diffeomorphism defined

26.1.11. Definition. A bijective differentiable map whose inverse is also differentiable
is called a diffeomorphism. Two manifolds between which a diffeomorphism
exists are called diffeomorphic. Let $M$ and $N$ be manifolds. $M$ is said to be
diffeomorphic to $N$ at $P \in M$ if there is a neighborhood $U$ of $P$ and a diffeomorphism
$f : U \to f(U)$. Then $f$ is called a local diffeomorphism at $P$.


2 The domain of $\mu \circ f \circ \varphi^{-1}$ is not all of $\mathbb{R}^m$, but only its open subset $\varphi(U)$. However, we shall continue to abuse the notation
and write $\mathbb{R}^m$ instead of $\varphi(U)$. This way, we do not have to constantly change the domain as $U$ changes. The domain is always
clear from the context.

3 Although Proposition 16.4.6 was shown for normed linear spaces, it really holds for all “spaces” for which the concept of
open set is defined.






diffeomorphisms of a
manifold form a
group

$n$-sphere and its
stereographic
projection


In our discussion of groups, we saw that the set of linear isomorphisms of a
vector space $V$ onto itself forms a group $GL(V)$. The set of diffeomorphisms of a
manifold $M$ onto itself also forms a group, which is denoted by $\mathrm{Diff}(M)$.

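26.1.12. Example. The generalization of a sphere is the unit $n$-sphere, which is a subset
of $\mathbb{R}^{n+1}$ defined by

$$S^n = \left\{(x_1, \ldots, x_{n+1}) \in \mathbb{R}^{n+1} \mid x_1^2 + \cdots + x_{n+1}^2 = 1\right\}.$$

The stereographic projection defines an atlas for $S^n$ as follows. For all points of $S^n$ except
$(0, 0, \ldots, 1)$, the north pole, define the chart $\varphi_+ : S^n - \{1\} \equiv U^+ \to \mathbb{R}^n$ by

$$\varphi_+(x_1, \ldots, x_{n+1}) = \left(\frac{x_1}{1 - x_{n+1}}, \ldots, \frac{x_n}{1 - x_{n+1}}\right) \quad\text{for } (x_1, \ldots, x_{n+1}) \in U^+.$$

To include the north pole, consider a second chart $\varphi_- : S^n - \{-1\} \equiv U^- \to \mathbb{R}^n$ defined
by

$$\varphi_-(x_1, \ldots, x_{n+1}) = \left(\frac{x_1}{1 + x_{n+1}}, \ldots, \frac{x_n}{1 + x_{n+1}}\right) \quad\text{for } (x_1, \ldots, x_{n+1}) \in U^-.$$

Next, let us find the inverses of these maps. We find the inverse of $\varphi_+$; that of $\varphi_-$ can
be found similarly. Let $\xi_k = x_k/(1 - x_{n+1})$. Then one can readily show that

$$\sum_{k=1}^n \xi_k^2 = \frac{1 + x_{n+1}}{1 - x_{n+1}} \quad\Longrightarrow\quad x_{n+1} = \frac{\sum_{k=1}^n \xi_k^2 - 1}{\sum_{k=1}^n \xi_k^2 + 1}$$

and

$$x_i = \frac{2\xi_i}{1 + \sum_{k=1}^n \xi_k^2} \quad\text{for } i = 1, 2, \ldots, n.$$

From the definition of $\varphi_+$, we have

$$\varphi_+^{-1}(\xi_1, \ldots, \xi_n) = (x_1, \ldots, x_n, x_{n+1}) = \left(\frac{2\xi_1}{1 + \sum_{k=1}^n \xi_k^2}, \ldots, \frac{2\xi_n}{1 + \sum_{k=1}^n \xi_k^2}, \frac{\sum_{k=1}^n \xi_k^2 - 1}{\sum_{k=1}^n \xi_k^2 + 1}\right). \tag{26.1}$$

On the overlap of $U^+$ and $U^-$, i.e., on all points of $S^n$ except the north and the south
poles, $\varphi_- \circ \varphi_+^{-1} : \mathbb{R}^n \to \mathbb{R}^n$ can be calculated by noting that $\varphi_-$ has the following effect
on a typical entry of Equation (26.1):

$$x_j \mapsto \frac{x_j}{1 + x_{n+1}} = \frac{2\xi_j \big/ \left(1 + \sum_{k=1}^n \xi_k^2\right)}{1 + \dfrac{\sum_{k=1}^n \xi_k^2 - 1}{\sum_{k=1}^n \xi_k^2 + 1}} = \frac{\xi_j}{\sum_{k=1}^n \xi_k^2}.$$

Therefore,

$$\varphi_- \circ \varphi_+^{-1}(\xi_1, \ldots, \xi_n) = \left(\frac{\xi_1}{\sum_{k=1}^n \xi_k^2}, \ldots, \frac{\xi_n}{\sum_{k=1}^n \xi_k^2}\right).$$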

It is clear that $\varphi_- \circ \varphi_+^{-1}$ has derivatives of all orders except possibly at a point for which
$\xi_i = 0$ for all $i$. But this would correspond to $x_{n+1} = -1$, the south pole, which is excluded
from the region of overlap. ■
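
The formulas of Example 26.1.12 lend themselves to a quick numerical sanity check. The sketch below is illustrative only; the names `phi_plus`, `phi_minus`, and `phi_plus_inv` are hypothetical labels for $\varphi_+$, $\varphi_-$, and $\varphi_+^{-1}$, and the sample point $\xi$ is arbitrary.

```python
def phi_plus(x):
    """Stereographic projection from the north pole; x is a point of S^n in R^(n+1)."""
    return tuple(xi / (1.0 - x[-1]) for xi in x[:-1])

def phi_minus(x):
    """Stereographic projection from the south pole."""
    return tuple(xi / (1.0 + x[-1]) for xi in x[:-1])

def phi_plus_inv(xi):
    """Inverse of phi_plus, Equation (26.1)."""
    s = sum(v * v for v in xi)
    return tuple(2.0 * v / (1.0 + s) for v in xi) + ((s - 1.0) / (s + 1.0),)

xi = (0.5, -1.5, 2.0)                 # arbitrary nonzero point of R^3, so n = 3
point = phi_plus_inv(xi)              # should lie on S^3
norm_sq = sum(c * c for c in point)   # should equal 1
roundtrip = phi_plus(point)           # should reproduce xi

s = sum(v * v for v in xi)
transition = phi_minus(point)         # phi_- o phi_+^{-1} evaluated at xi
expected = tuple(v / s for v in xi)   # xi_j / sum_k xi_k^2
print(norm_sq, transition, expected)
```

The transition map is the inversion $\xi \mapsto \xi/|\xi|^2$, which is smooth everywhere except at $\xi = 0$, in agreement with the closing remark of the example.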






Figure 26.4 Corresponding to every map $f : M \to N$ there exists a coordinate map
$\mu \circ f \circ \varphi^{-1} : \mathbb{R}^m \to \mathbb{R}^n$.

26.2 Curves and Tangent Vectors 

We noted above that functions are special cases of Definition 26.1.9. Another
special case occurs when $M = \mathbb{R}$. This is important enough to warrant a separate
definition.

differentiable curve

26.2.1. Definition. A differentiable curve in the manifold $M$ is a $C^\infty$ map of an
interval of $\mathbb{R}$ to $M$.

initial and final points of a curve

This definition should be familiar from calculus, where $M = \mathbb{R}^3$ and a curve
is given by its parametric equation $(f_1(t), f_2(t), f_3(t))$, or simply by $\mathbf{r}(t)$. For a curve
$\gamma : [a, b] \to M$, the point $\gamma(a) \in M$ is called the initial point, and $\gamma(b) \in M$ is called
the final point of the curve $\gamma$. A curve is closed if $\gamma(a) = \gamma(b)$.

We are now ready to consider what a vector at a point is. All the familiar vectors 
in classical physics, such as displacement, velocity, momentum, and so forth, are 
based on the displacement vector. Let us see how we can generalize such a vector 
so that it is compatible with the concept of a manifold. 

In $\mathbb{R}^2$, we define the displacement vector from $P$ to $Q$ as a directed straight
line that starts at $P$ and ends at $Q$. Furthermore, the direction of the vector remains
the same if we connect $P$ to any other final point on the line $PQ$ located beyond
$Q$. This is because $\mathbb{R}^2$ is a flat space, a straight line is well-defined, and there is
no ambiguity in the direction of the vector from $P$ to $Q$. Things change, however,
if we move to a two-dimensional spherical surface such as the globe. How do
we define the straight line from New York to Beijing? There is no satisfactory
definition of the word “straight” on a curved surface. Let us say that “straight”
means shortest distance. Then our shortest path would lie on a great circle passing
through New York and Beijing. Define the “direction” of the trip as the “straight”
arrow, say 1 km in length, connecting our present position to the next point 1 km
away. As we move from New York to Beijing, going westward, the tip of the arrow
keeps changing direction. Its direction in New York is slightly different from its
direction in Chicago. In San Francisco the direction is changed even more, and
by the time we reach Beijing, the tip of the arrow will be almost opposite to its
original direction.

The reason for such a changing arrow is, of course, the curvature of the manifold.
We can minimize this curvature effect if we do not go too far from New York.
If we stay close to New York, the surface of the earth appears flat, and we can draw 
arrows between points. The closer the two points, the better the approximation 
to flatness. Clearly, the concept of a vector is a local concept, and the process of 
constructing a vector is a limiting process. 

The limiting process in the globe example entailed the notion of “closeness.”
Such a notion requires the concept of distance, which is natural for a globe but
not necessary for a general manifold. For most manifolds it is possible to define
a metric that gives the “distance” between two points of the manifold. However,
the concept of a vector is too general to require such an elaborate structure as a
metric. The abstract usefulness of a metric is a result of its real-valuedness: given
two points $P_1$ and $P_2$, the distance between them, $d(P_1, P_2)$, is a nonnegative real
number. Thus, distances between different points can be compared.

We have already defined two concepts for manifolds (more basic than the 
concept of a metric) that together can replace the concept of a metric in defining 
a vector as a limit. These are the concepts of (real-valued) functions and curves. 
Let us see how functions and curves can replace metrics. 

Let $\gamma : [a, b] \to M$ be a curve in the manifold $M$. Let $P \in M$ be a point
of $M$ that lies on $\gamma$ such that $\gamma(c) = P$ for some $c \in [a, b]$. Let $f \in F^\infty(P)$.
Restrict $f$ to the neighboring points of $P$ that lie on $\gamma$. Then the composite function
$f \circ \gamma : \mathbb{R} \to \mathbb{R}$ is a real-valued function on $\mathbb{R}$.

We can compare values of $f \circ \gamma$ for various real numbers close to $c$, as in
calculus. If $u \in [a, b]$ denotes$^4$ the variable, then $f \circ \gamma(u) = f(\gamma(u))$ gives the
value of $f \circ \gamma$ at various $u$'s. In particular, the difference
$\Delta(f \circ \gamma) = f(\gamma(u)) - f(\gamma(c))$ is a measure of how close the point $\gamma(u) \in M$ is to $P$.
Going one step further, we define

$$\frac{d(f \circ \gamma)}{du}\bigg|_{u=c} = \lim_{u \to c} \frac{f(\gamma(u)) - f(\gamma(c))}{u - c}, \tag{26.2}$$

the usual derivative of an ordinary function of one variable. However, this derivative
depends on $\gamma$ and on the point $P$. The function $f$ is merely a test function. We
could choose any other function to test how things change with movement along
$\gamma$. What is important is not which function we choose, but how the curve $\gamma$ causes
it to change with movement along $\gamma$ away from $P$. This change is determined by


4 We usually use $u$ or $t$ to denote the (real) argument of the map $\gamma : [a, b] \to M$.




illustration of the 
equality of vectors 
and directional 
derivatives 


tangent vector 
defined 


derivation property of 
tangent vectors 


the directional derivative along $\gamma$ at $P$, as given by (26.2). A directional derivative
determines a tangent which, in turn, suggests a tangent vector. That is why the
tangent vector at $P$ along $\gamma$ is defined to be the directional derivative itself!

The use of derivative as tangent vector may appear strange to the novice, 
especially physicists encountering it for the first time, but it has been familiar to 
mathematicians for a long time. It is hard for the beginner to imagine vectors 
being charged with the responsibility of measuring the rate of change of functions. 
It takes some mental adjustment to get used to this idea. The following simple 
illustration may help with establishing the vector-derivative connection. 

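26.2.2. Example. Let us take the familiar case of a plane and consider the vector
$\mathbf{a} = a_x \hat{\mathbf{e}}_x + a_y \hat{\mathbf{e}}_y$. What kind of a directional derivative can correspond to $\mathbf{a}$? First we need a
curve $\gamma : \mathbb{R} \to \mathbb{R}^2$ that is somehow associated with $\mathbf{a}$. It is not hard to convince oneself
that the most natural association is that of vectors to tangents. Thus, we seek a curve whose
tangent is (parallel to) $\mathbf{a}$. The easiest (but not the only) way is simply to take the straight line
along $\mathbf{a}$; that is, let $\gamma(u) = (a_x u, a_y u)$. The directional derivative at $u = 0$ for an arbitrary
function $f : \mathbb{R}^2 \to \mathbb{R}$ is given by

$$\frac{d(f \circ \gamma)}{du}\bigg|_{u=0} = \lim_{u \to 0} \frac{f(\gamma(u)) - f(\gamma(0))}{u} = \lim_{u \to 0} \frac{f(a_x u, a_y u) - f(0, 0)}{u}. \tag{26.3}$$

Taylor expansion in two dimensions yields

$$f(a_x u, a_y u) = f(0, 0) + a_x u\, \frac{\partial f}{\partial x}\bigg|_{u=0} + a_y u\, \frac{\partial f}{\partial y}\bigg|_{u=0} + \cdots.$$

Substituting in (26.3), we obtain

$$\frac{d(f \circ \gamma)}{du}\bigg|_{u=0} = \lim_{u \to 0} \frac{a_x u\,(\partial f/\partial x)_{u=0} + a_y u\,(\partial f/\partial y)_{u=0} + \cdots}{u} = a_x \frac{\partial f}{\partial x} + a_y \frac{\partial f}{\partial y} = \left(a_x \frac{\partial}{\partial x} + a_y \frac{\partial}{\partial y}\right) f.$$

This clearly shows the connection between directional derivatives and vectors. In fact, the
correspondences $\partial/\partial x \leftrightarrow \hat{\mathbf{e}}_x$ and $\partial/\partial y \leftrightarrow \hat{\mathbf{e}}_y$ establish this connection very naturally.

Note that the curve $\gamma$ chosen above is by no means unique. In fact, there are infinitely
many curves that have the same tangent at $u = 0$ and give the same directional derivative. ■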
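
The closing remark of Example 26.2.2, that many curves share the same tangent and hence the same directional derivative, can be illustrated numerically. This is a minimal sketch under stated assumptions: the test function `f`, the components `ax`, `ay`, and the second curve `bent` are hypothetical choices, and the derivative is approximated by a central difference.

```python
import math

def derivative_along(f, gamma, c=0.0, h=1e-5):
    """Central-difference approximation to d(f o gamma)/du at u = c."""
    return (f(*gamma(c + h)) - f(*gamma(c - h))) / (2.0 * h)

# Hypothetical test function with df/dx = 1 and df/dy = 0 at the origin.
f = lambda x, y: math.sin(x) + x * y + y ** 2

ax, ay = 2.0, -1.0                                    # components of the vector a
line = lambda u: (ax * u, ay * u)                     # straight line along a
bent = lambda u: (ax * u + u ** 2, ay * u - 3 * u ** 2)  # different curve, same tangent at u = 0

d_line = derivative_along(f, line)
d_bent = derivative_along(f, bent)
expected = ax * 1.0 + ay * 0.0   # a_x df/dx + a_y df/dy at the origin
print(d_line, d_bent, expected)
```

Both curves give (approximately) the same number because only the first-order behavior of the curve at $u = 0$ enters the derivative.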


Since vectors are the same as derivatives, we expect them to have the properties 
shared by derivatives: 

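26.2.3. Definition. Let $M$ be a differentiable manifold. A tangent vector at $P \in M$
is an operator $\mathbf{t} : F^\infty(P) \to \mathbb{R}$ such that for every $f, g \in F^\infty(P)$ and $\alpha, \beta \in \mathbb{R}$,

1. $\mathbf{t}$ is linear: $\mathbf{t}(\alpha f + \beta g) = \alpha \mathbf{t}(f) + \beta \mathbf{t}(g)$;

2. $\mathbf{t}$ satisfies the derivation property:

$$\mathbf{t}(fg) = g(P)\,\mathbf{t}(f) + f(P)\,\mathbf{t}(g).$$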




The operator $\mathbf{t}$ is an abstraction of the derivative operator. Note that $\mathbf{t}(f)$, $g(P)$,
$f(P)$, and $\mathbf{t}(g)$ are all real numbers.

tangent space defined

The reader may easily check that if addition and scalar multiplication of tangent
vectors are defined in an obvious way, the set of all tangent vectors at $P \in M$
becomes a vector space, called the tangent space at $P$ and denoted by $T_P(M)$. If
$U$ is an open subset of $M$ (therefore, an open submanifold of $M$), then it is clear
that

$$T_P(U) = T_P(M) \quad\text{for all } P \in U. \tag{26.4}$$

Definition 26.2.3 was motivated by Equations (26.2) and (26.3). Let us go backwards
and see if (26.2) is indeed a tangent, that is, if it satisfies the two conditions
of Definition 26.2.3.

vectors tangent to a curve

26.2.4. Proposition. Let $\gamma$ be a $C^\infty$ curve in $M$ such that $\gamma(c) = P$. Define $\dot\gamma(c)$
by

$$(\dot\gamma(c))(f) = \frac{d}{du}\, f \circ \gamma\,\bigg|_{u=c}$$

for every $f \in F^\infty(P)$. Then $\dot\gamma(c)$ is a tangent vector at $P$ called the vector tangent
to $\gamma$ at $c$.


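Proof. We have to show that the two conditions of Definition 26.2.3 are satisfied
for $f, g \in F^\infty(P)$ and $\alpha, \beta \in \mathbb{R}$. The first condition is trivial. For the second
condition, we use the product rule for ordinary differentiation as follows:

$$\begin{aligned}
(\dot\gamma(c))(fg) &= \frac{d}{du}\big[(fg) \circ \gamma\big]_{u=c} = \frac{d}{du}\big[(f \circ \gamma)(g \circ \gamma)\big]_{u=c} \\
&= \left[\frac{d}{du}(f \circ \gamma)\right]_{u=c} (g \circ \gamma)_{u=c} + (f \circ \gamma)_{u=c} \left[\frac{d}{du}(g \circ \gamma)\right]_{u=c} \\
&= [(\dot\gamma(c))(f)]\, g(\gamma(c)) + f(\gamma(c))\, [(\dot\gamma(c))(g)] \\
&= [(\dot\gamma(c))(f)]\, g(P) + f(P)\, [(\dot\gamma(c))(g)].
\end{aligned}$$

Note that in going from the first equality to the second, we used the fact that by
definition, the product of two functions evaluated at a point is the product of the
values of the two functions at that point. □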
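
As an informal check of Proposition 26.2.4, one can approximate $\dot\gamma(c)$ by a finite difference and test the two conditions of Definition 26.2.3 on sample functions. Everything in this sketch is an assumption made for illustration: the helper name `tangent`, the planar curve `gamma`, and the functions `f` and `g` are hypothetical, and $M = \mathbb{R}^2$ is used for simplicity.

```python
import math

def tangent(gamma, c, h=1e-5):
    """Return the operator f -> d/du f(gamma(u)) at u = c (central difference)."""
    def t(f):
        return (f(*gamma(c + h)) - f(*gamma(c - h))) / (2.0 * h)
    return t

gamma = lambda u: (math.cos(u), math.sin(2.0 * u))   # hypothetical curve in R^2
f = lambda x, y: x * y + x
g = lambda x, y: math.exp(x) - y * y
fg = lambda x, y: f(x, y) * g(x, y)

c = 0.7
P = gamma(c)
t = tangent(gamma, c)

# Derivation property: t(fg) = g(P) t(f) + f(P) t(g)
lhs = t(fg)
rhs = g(*P) * t(f) + f(*P) * t(g)

# Linearity: t(2f - 3g) = 2 t(f) - 3 t(g)
lin_lhs = t(lambda x, y: 2.0 * f(x, y) - 3.0 * g(x, y))
lin_rhs = 2.0 * t(f) - 3.0 * t(g)
print(lhs, rhs, lin_lhs, lin_rhs)
```

The agreement is only up to the discretization error of the finite difference; the exact statement is the one proved above.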


Let us now consider a special curve and corresponding tangent vector that is
of extreme importance in applications. Let $\varphi = (x^1, x^2, \ldots, x^m)$ be a coordinate
system at $P$, where $x^i : M \to \mathbb{R}$ is the $i$th coordinate function. Then $\varphi$ is a bijective
$C^\infty$ mapping from the manifold $M$ into $\mathbb{R}^m$. Its inverse, $\varphi^{-1} : \mathbb{R}^m \to M$, is also a
$C^\infty$ mapping. Now, the $i$th coordinate of $P$ is the real number $u = x^i(P)$. Suppose
that all coordinates of $P$ are held fixed except the $i$th one, which is allowed to vary
with $u$ describing this variation.




coordinate curve, 
coordinate vector 
field, and coordinate 
frames 


26.2.5. Definition. Let $(U_P, \varphi)$ be a chart at $P \in M$. Then the curve $\gamma^i : \mathbb{R} \to M$,
defined by

$$\gamma^i(u) = \varphi^{-1}\left(x^1(P), \ldots, x^{i-1}(P), u, x^{i+1}(P), \ldots, x^m(P)\right)$$

is called the $i$th coordinate curve through $P$. The tangent vector to this curve
at $P$ is denoted by $\partial_i|_P$ and is called the $i$th coordinate vector field at $P$. The
collection of all vector fields at $P$ is called a coordinate frame at $P$. The variable
$u$ is arbitrary in the sense that it can be replaced by any (good) function of $u$.


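Let $c = x^i(P)$. Then for $f \in F^\infty(P)$, we have

$$(\partial_i|_P) f = (\dot\gamma^i(c))(f) = \frac{d}{du}\, f \circ \gamma^i\,\bigg|_{u=c} = \frac{d}{du} f\left(\varphi^{-1}(x^1(P), \ldots, x^{i-1}(P), u, x^{i+1}(P), \ldots, x^m(P))\right)\bigg|_{u=c} \equiv \frac{\partial f}{\partial x^i}\bigg|_P \quad\Longrightarrow\quad \partial_i|_P = \frac{\partial}{\partial x^i}\bigg|_P, \tag{26.5}$$

where the last equality in the chain for $(\partial_i|_P) f$ is a (natural) definition of the partial
derivative of $f$ with respect to the $i$th coordinate evaluated at the point $P$. This partial
derivative is again a $C^\infty$ function at $P$. We therefore have the following: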

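26.2.6. Proposition. The coordinate frame $\{\partial_i|_P\}_{i=1}^m$ at $P$ is a set of operators
$\partial_i|_P : F^\infty(P) \to \mathbb{R}$ given by

$$(\partial_i|_P) f = \frac{\partial f}{\partial x^i}\bigg|_P \equiv \frac{d}{du} f\left(\varphi^{-1}(x^1(P), \ldots, x^{i-1}(P), u, x^{i+1}(P), \ldots, x^m(P))\right)\bigg|_{u=c}. \tag{26.6}$$
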

Another common notation for df/dx l is 

26.2.7. Example. Pick a point $P = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$ on the sphere $S^2$ in a
chart $(U_P, \mu)$ given by $\mu(\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta) = (\theta, \varphi)$. If $\theta$ is kept constant and
$\varphi$ is allowed to vary over values given by $u$, then the coordinate curve associated with $\varphi$ is
given by

$$\gamma_\varphi(u) = \mu^{-1}(\theta, u) = (\sin\theta\cos u, \sin\theta\sin u, \cos\theta).$$

As $u$ varies, $\gamma_\varphi(u)$ describes a curve on $S^2$. This curve is simply a circle of radius $\sin\theta$.
The tangent to this curve at any point is $\partial/\partial\varphi$, or simply $\partial_\varphi$, the derivative with respect to
the coordinate $\varphi$.

Similarly, the curve $\gamma_\theta(u)$ describes a great circle on $S^2$ with tangent $\partial_\theta \equiv \partial/\partial\theta$. ■
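
To make the coordinate curve $\gamma_\varphi$ of Example 26.2.7 concrete, the sketch below differentiates a test function along it. The chain rule applied to $\gamma_\varphi(u)$ gives $\partial_\varphi f = x\,\partial f/\partial y - y\,\partial f/\partial x$ for a function written in the ambient coordinates; that closed form, the test function `f`, and the sample angles are assumptions introduced here for illustration, not taken from the text.

```python
import math

def gamma_phi(theta, u):
    """Coordinate curve of the azimuthal angle: theta fixed, azimuth equal to u."""
    return (math.sin(theta) * math.cos(u),
            math.sin(theta) * math.sin(u),
            math.cos(theta))

# Hypothetical test function on R^3, restricted to the sphere by the curve.
f = lambda x, y, z: x * z + y
df_dx = lambda x, y, z: z      # partial derivatives of f, computed by hand
df_dy = lambda x, y, z: 1.0

theta, phi = 1.1, 0.6
x, y, z = gamma_phi(theta, phi)

h = 1e-6
# Tangent vector d_phi acting on f: derivative along the coordinate curve.
numeric = (f(*gamma_phi(theta, phi + h)) - f(*gamma_phi(theta, phi - h))) / (2.0 * h)
# Chain-rule expression x df/dy - y df/dx.
formula = x * df_dy(x, y, z) - y * df_dx(x, y, z)
radius = math.hypot(x, y)      # radius of the circle traced by the curve
print(numeric, formula, radius, math.sin(theta))
```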


The vector space $T_P(M)$ of all tangents at $P$ was mentioned earlier. In the
case of $S^2$ this tangent space is simply a plane tangent to the sphere at a point.
Also, the two vectors, $\partial_\theta$ and $\partial_\varphi$, encountered in Example 26.2.7 are clearly linearly
independent. Thus, they form a basis for the tangent plane. This argument can be
generalized to any manifold. The following theorem is such a generalization (for
a proof, see [Bish 80, pp. 51-53]):




Remember Einstein's summation convention!

26.2.8. Theorem. Let $M$ be an $m$-dimensional manifold and $P \in M$. Then the
set $\{\partial_i|_P\}_{i=1}^m$ forms a basis of $T_P(M)$. In particular, $T_P(M)$ is $m$-dimensional. An
arbitrary vector, $\mathbf{t} \in T_P(M)$, can be written as

$$\mathbf{t} = \alpha^i\, \partial_i|_P, \quad\text{where } \alpha^i = \mathbf{t}(x^i).$$

The last statement can be derived by letting both sides operate on $x^j$ and using
Equation (26.6). Let $M = V$, a vector space. Choose a basis $\{\mathbf{e}_i\}$ in $V$ with its
dual considered as coordinate functions. Then, at every $\mathbf{v} \in V$, there is a natural
isomorphism $\phi : V \to T_{\mathbf{v}}(V)$ mapping a vector $\mathbf{u} = \alpha^i \mathbf{e}_i \in V$ onto $\alpha^i\, \partial_i|_{\mathbf{v}} \in T_{\mathbf{v}}(V)$.
The reader may verify that this isomorphism is coordinate independent; i.e., if one
chooses any other basis of $V$ with its corresponding dual, then $\phi(\mathbf{u})$ will be the
same vector as before, expressed in the new coordinate basis. Thus,


26.2.9. Box. If $V$ is a vector space, then for all $\mathbf{v} \in V$, one can identify $T_{\mathbf{v}}(V)$
with $V$ itself.


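Suppose we have two coordinate systems at $P$, $\{x^i\}$ with tangents $\partial_i|_P$ and
$\{y^j\}$ with tangents $\nabla_j|_P$. Any $\mathbf{t} \in T_P(M)$ can be expressed either in terms of $\partial_i|_P$
or in terms of $\nabla_j|_P$: $\mathbf{t} = \alpha^i\, \partial_i|_P = \beta^j\, \nabla_j|_P$. We can use this relation to obtain $\alpha^i$
in terms of $\beta^j$: From Theorem 26.2.8, we have

$$\alpha^i = \mathbf{t}(x^i) = \left(\beta^j\, \nabla_j|_P\right)(x^i) = \beta^j\, \frac{\partial x^i}{\partial y^j}\bigg|_P. \tag{26.7}$$

In particular, if $\mathbf{t} = \nabla_k|_P$, then $\beta^j = \mathbf{t}(y^j) = [\nabla_k|_P](y^j) = \delta^j_k$, and (26.7) gives
$\alpha^i = \partial x^i/\partial y^k$. Thus, using Equation (26.5),

$$\nabla_j|_P = \frac{\partial x^i}{\partial y^j}\bigg|_P\, \partial_i|_P, \quad\text{or}\quad \frac{\partial}{\partial y^j}\bigg|_P = \frac{\partial x^i}{\partial y^j}\bigg|_P\, \frac{\partial}{\partial x^i}\bigg|_P. \tag{26.8}$$

For any function $f \in F^\infty(P)$, Equation (26.8) yields

$$\frac{\partial f}{\partial y^j}\bigg|_P = \frac{\partial x^i}{\partial y^j}\bigg|_P\, \frac{\partial f}{\partial x^i}\bigg|_P.$$

This is the chain rule for differentiation.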
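
Equation (26.8) can be tried out on a familiar pair of coordinate systems. In the sketch below, which is illustrative only, $\{x^i\}$ are Cartesian coordinates $(x, y)$ on the plane and $\{y^j\}$ are polar coordinates $(r, \theta)$; this choice, the test function `f`, and the helper `partial` are assumptions introduced for the example.

```python
import math

def partial(fn, args, k, h=1e-6):
    """Central-difference partial derivative of fn with respect to its k-th argument."""
    lo, hi = list(args), list(args)
    lo[k] -= h
    hi[k] += h
    return (fn(*hi) - fn(*lo)) / (2.0 * h)

f = lambda x, y: x * x * y + math.sin(y)                   # hypothetical test function
to_xy = lambda r, th: (r * math.cos(th), r * math.sin(th))  # x^i as functions of y^j
f_polar = lambda r, th: f(*to_xy(r, th))                   # f expressed in polar coordinates

r, th = 1.5, 0.8
x, y = to_xy(r, th)

# Matrix dx^i/dy^j: rows i = (x, y), columns j = (r, theta), computed by hand.
dx_dy = [[math.cos(th), -r * math.sin(th)],
         [math.sin(th),  r * math.cos(th)]]

grad_xy = [partial(f, (x, y), i) for i in range(2)]        # df/dx^i
lhs = [partial(f_polar, (r, th), j) for j in range(2)]     # df/dy^j computed directly
rhs = [sum(dx_dy[i][j] * grad_xy[i] for i in range(2))     # (dx^i/dy^j) df/dx^i
       for j in range(2)]
print(lhs, rhs)
```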


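26.2.10. Example. Let us find the coordinate curves and the coordinate frame at $P =
(x, y, z)$ on $S^2$. We use the coordinates of Example 26.1.3. In particular, consider $\varphi_3$, whose
inverse is given by

$$\varphi_3^{-1}(x, y) = \left(x, y, \sqrt{1 - x^2 - y^2}\right).$$

The coordinate curve along $y$ is obtained by letting $y$ be a function$^5$ of $u$:

$$\gamma^2(u) = \varphi_3^{-1}(x, h(u)) = \left(x, h(u), \sqrt{1 - x^2 - h^2(u)}\right),$$

where $h(0) = y$ and $h'(0) = \alpha$, a constant. To find the coordinate vector field at $P$, let
$f \in F^\infty(P)$, and note that

$$\begin{aligned}
\partial_2 f &= \frac{d}{du} f(\gamma^2(u))\bigg|_{u=0} = \frac{d}{du} f\left(x, h(u), \sqrt{1 - x^2 - h^2(u)}\right)\bigg|_{u=0} \\
&= \frac{\partial f}{\partial y}\frac{dh}{du}\bigg|_{u=0} + \frac{\partial f}{\partial z}\left(\frac{-h\,h'}{\sqrt{1 - x^2 - h^2}}\right)\bigg|_{u=0} \\
&= \alpha\left(\frac{\partial f}{\partial y} - \frac{y}{z}\frac{\partial f}{\partial z}\right) = \alpha\left(\frac{\partial}{\partial y} - \frac{y}{z}\frac{\partial}{\partial z}\right) f.
\end{aligned}$$

So, choosing the function $h$ in such a way that $\alpha = 1$,

$$\partial_2 = \partial_y - \frac{y}{z}\,\partial_z,$$

where $\partial_y$ and $\partial_z$ are the coordinate vector fields of $\mathbb{R}^3$. The coordinate vector field $\partial_1$ can
be obtained similarly. ■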
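
The result $\partial_2 = \partial_y - (y/z)\,\partial_z$ of Example 26.2.10 can be compared with a direct finite-difference derivative along the coordinate curve. This is a sketch under stated assumptions: the choice $h(u) = y + u$ (so that $\alpha = 1$), the test function `f`, and the sample point are hypothetical.

```python
import math

def gamma2(x, y, u):
    """Coordinate curve along y through (x, y) with h(u) = y + u, so h'(0) = 1."""
    h = y + u
    return (x, h, math.sqrt(1.0 - x * x - h * h))

# Hypothetical test function on R^3 and its partial derivatives, computed by hand.
f = lambda x, y, z: x * z + y * y * z
df_dy = lambda x, y, z: 2.0 * y * z
df_dz = lambda x, y, z: x + y * y

x, y = 0.3, 0.4
z = math.sqrt(1.0 - x * x - y * y)    # point on the upper hemisphere

eps = 1e-6
# d/du of f along the coordinate curve at u = 0.
numeric = (f(*gamma2(x, y, eps)) - f(*gamma2(x, y, -eps))) / (2.0 * eps)
# The operator d_y - (y/z) d_z applied to f at P.
formula = df_dy(x, y, z) - (y / z) * df_dz(x, y, z)
print(numeric, formula)
```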

26.3 Differential of a Map 

Now that we have constructed tangent spaces and defined bases for them, we 
are ready to consider the notion of the differential (derivative) of a map between 
manifolds. 

differential of a map at a point

26.3.1. Definition. Let $M$ and $N$ be manifolds of dimensions $m$ and $n$, respectively,
and let $\psi : M \to N$ be a $C^\infty$ map. Let $P \in M$, and let $Q = \psi(P) \in N$ be the
image of $P$. Then there is induced a map $\psi_{*P} : T_P(M) \to T_Q(N)$, called the
differential of $\psi$ at $P$ and given as follows. Let $\mathbf{t} \in T_P(M)$ and $f \in F^\infty(Q)$. The
action of $\psi_{*P}(\mathbf{t}) \in T_Q(N)$ on $f$ is defined as

$$\left(\psi_{*P}(\mathbf{t})\right)(f) \equiv \mathbf{t}(f \circ \psi). \tag{26.9}$$

The reader may check that the differential of a composite map is the composite
of the corresponding differentials, i.e.,

$$(\psi \circ \varphi)_{*P} = \psi_{*\varphi(P)} \circ \varphi_{*P}. \tag{26.10}$$

Furthermore, if $\psi$ is a local diffeomorphism at $P$, then $\psi_{*P}$ is a vector space
isomorphism. The converse of this statement, which is called the inverse mapping
theorem and is much harder to prove (see [Abra 88, pp. 116 and 196]), is also
true:




inverse mapping 
theorem 


Jacobian matrix of
a differentiable map


Differential of a
constant map is the
zero map.


26.3.2. Theorem. (inverse mapping theorem) If $\psi : M \to N$ is a map and
$\psi_{*P} : T_P(M) \to T_{\psi(P)}(N)$ is a vector space isomorphism, then $\psi$ is a local
diffeomorphism at $P$.


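Let us see how Equation (26.9) looks in terms of coordinate functions. Suppose
that $\{x^i\}_{i=1}^m$ are coordinates at $P$ and $\{y^a\}_{a=1}^n$ are coordinates at $Q = \psi(P)$. We
note that $y^a \circ \psi$ is a real-valued $C^\infty$ function on $M$. Thus, we may write (with the
function expressed in terms of coordinates)

$$y^a \circ \psi \equiv f^a(x^1, \ldots, x^m).$$

We also have $\mathbf{t} = \alpha^i\, \partial_i|_P$. Similarly, $\psi_{*P}(\mathbf{t}) = \beta^a (\partial/\partial y^a)|_Q$ because $\{(\partial/\partial y^a)|_Q\}$
form a basis. Theorem 26.2.8 and Definition 26.3.1 now give

$$\beta^a = \left[\psi_{*P}(\mathbf{t})\right](y^a) = \mathbf{t}(y^a \circ \psi) = \mathbf{t}(f^a) = \left[\alpha^i\, \partial_i|_P\right](f^a) = \sum_{i=1}^m \alpha^i\, \frac{\partial f^a}{\partial x^i}\bigg|_P.$$

This can be written in matrix form as

$$\begin{pmatrix} \beta^1 \\ \beta^2 \\ \vdots \\ \beta^n \end{pmatrix} =
\begin{pmatrix}
\partial f^1/\partial x^1 & \partial f^1/\partial x^2 & \cdots & \partial f^1/\partial x^m \\
\partial f^2/\partial x^1 & \partial f^2/\partial x^2 & \cdots & \partial f^2/\partial x^m \\
\vdots & \vdots & & \vdots \\
\partial f^n/\partial x^1 & \partial f^n/\partial x^2 & \cdots & \partial f^n/\partial x^m
\end{pmatrix}
\begin{pmatrix} \alpha^1 \\ \alpha^2 \\ \vdots \\ \alpha^m \end{pmatrix}. \tag{26.11}$$

The $n \times m$ matrix is denoted by $\mathsf{J}$ and is called the Jacobian matrix of $\psi$ with
respect to the coordinates $x^i$ and $y^a$. On numerous occasions the two manifolds
are simply Cartesian spaces, so that $\psi : \mathbb{R}^m \to \mathbb{R}^n$. In such a case, $f^a$ is naturally
written as $\psi^a$ and the Jacobian matrix will have elements of the form $\partial\psi^a/\partial x^i$.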
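
For maps between Cartesian spaces, Equation (26.11) is easy to try out numerically. The following sketch is only an illustration: the map `psi` from $\mathbb{R}^2$ to $\mathbb{R}^3$, the point `P`, the components `alpha`, and the helper `jacobian` (a central-difference approximation) are all hypothetical choices.

```python
import math

def psi(x1, x2):
    """Hypothetical smooth map R^2 -> R^3 used only for illustration."""
    return (x1 * x2, math.sin(x1), x2 * x2)

def jacobian(fn, p, h=1e-6):
    """n x m matrix of central-difference partial derivatives d(fn^a)/dx^i at p."""
    n, m = len(fn(*p)), len(p)
    J = [[0.0] * m for _ in range(n)]
    for i in range(m):
        hi, lo = list(p), list(p)
        hi[i] += h
        lo[i] -= h
        fhi, flo = fn(*hi), fn(*lo)
        for a in range(n):
            J[a][i] = (fhi[a] - flo[a]) / (2.0 * h)
    return J

P = (0.5, -1.2)
alpha = (2.0, 1.0)                     # components of t in the basis d_i at P
J = jacobian(psi, P)
# Components of the image vector in the basis d/dy^a at psi(P): beta = J alpha.
beta = [sum(J[a][i] * alpha[i] for i in range(2)) for a in range(3)]
print(beta)
```

With the analytic Jacobian of this particular `psi`, the components come out as $(-1.9,\ 2\cos 0.5,\ -2.4)$, which the numerical result reproduces up to discretization error.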

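An important special case of the differential of a map is that of a constant
map. Let $\psi : M \to \{Q\} \subset N$ be such a map; it maps all points of $M$ onto a single
point $Q$ of $N$. For any $f \in F^\infty(Q)$, the function $f \circ \psi \in F^\infty(P)$ is constant for
all $P \in M$. Let $\mathbf{t} \in T_P(M)$ be an arbitrary vector. Then

$$\left(\psi_{*P}(\mathbf{t})\right)(f) \equiv \mathbf{t}(f \circ \psi) = 0 \quad \forall f \quad\Longrightarrow\quad \psi_{*P}(\mathbf{t}) = \mathbf{0} \quad \forall\, \mathbf{t} \tag{26.12}$$

because $\mathbf{t}(c) = 0$ for any constant $c$. So,


26.3.3. Box. If $\psi : M \to \{Q\} \subset N$ is a constant map, so that it maps the
entire manifold $M$ onto a point $Q$ of $N$, then $\psi_{*P} : T_P(M) \to T_Q(N)$ is
the zero map.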


5 See the last statement of Definition 26.2.5. 






components of 
tangents to curves 


differential of a
real-valued function


Two other special cases merit closer attention: $M = \mathbb{R}$ for arbitrary $N$, and
$N = \mathbb{R}$ for arbitrary $M$. In either case $T_c(\mathbb{R})$ is one-dimensional with the basis
vector $(d/du)|_c$. When $M = \mathbb{R}$, the mapping becomes a curve, $\gamma : \mathbb{R} \to N$. The
only vector whose image we are interested in is $\mathbf{t} = (d/du)|_c$, with $\gamma(c) = P$.
From (26.9), using Proposition 26.2.4 in the last step, we have

$$\left(\gamma_{*c}\, \frac{d}{du}\bigg|_c\right)(f) = \frac{d}{du}\, f \circ \gamma\,\bigg|_{u=c} = (\dot\gamma(c))(f).$$

This tells us that the differential of a curve at $c$ is simply its tangent vector at $\gamma(c)$.
It is common to leave out the constant vector $(d/du)|_c$, and write $\gamma_{*c}$ for the LHS.


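26.3.4. Example. It is useful to have an expression for the components of the tangent to
a curve $\gamma$ at an arbitrary point on it. Since $\gamma$ maps the real line to $M$, with a coordinate patch
established on $M$, we can write $\gamma$ as $\gamma = (\gamma^1, \ldots, \gamma^m)$ where $\gamma^i = x^i \circ \gamma$ are ordinary
functions of one variable. Proposition 26.2.4 then yields

$$\gamma_{*t}\, f = \frac{d}{du}\, f \circ \gamma\,\bigg|_{u=t} = \frac{d}{du} f\left(\gamma^1(u), \ldots, \gamma^m(u)\right)\bigg|_{u=t} = \frac{\partial f}{\partial x^i}\frac{d\gamma^i}{du}\bigg|_{u=t} = \frac{\partial f}{\partial x^i}\frac{d\gamma^i}{dt} \equiv \dot\gamma^i(t)\, \frac{\partial f}{\partial x^i},$$

or

$$\gamma_{*t} = \dot\gamma^i(t)\, \partial_i. \tag{26.13}$$

For this reason, $\gamma_{*t}$ is sometimes denoted by $\dot\gamma$. ■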


When $N = \mathbb{R}$, we are dealing with a real-valued function $f : M \to \mathbb{R}$. The
differential of $f$ at $P$ is $f_{*P} : T_P(M) \to T_c(\mathbb{R})$, where $c = f(P)$. Since $T_c(\mathbb{R})$
is one-dimensional, for a tangent $\mathbf{t} \in T_P(M)$, we have $f_{*P}(\mathbf{t}) = a\,(d/du)|_c$. Let
$g : \mathbb{R} \to \mathbb{R}$ be an arbitrary function on $\mathbb{R}$. Then $[f_{*P}(\mathbf{t})](g) = a\,(dg/du)_c$, or, by
definition of the LHS, $\mathbf{t}(g \circ f) = a\,(dg/du)_c$. To find $a$ we choose the function
$g(u) = u$, i.e., the identity function; then $dg/du = 1$ and $\mathbf{t}(g \circ f) = \mathbf{t}(f) = a$. We
thus obtain $f_{*P}(\mathbf{t}) = \mathbf{t}(f)\,(d/du)|_c$. Since $T_c(\mathbb{R})$ is a flat one-dimensional vector
space, all vectors are the same and there is no need to write $(d/du)|_c$. Thus, we
define the differential of $f$, denoted by $df \equiv f_*$, as a map $df : T_P(M) \to \mathbb{R}$
given by

$$df(\mathbf{t}) = \mathbf{t}(f). \tag{26.14}$$

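In particular, if $f$ is the coordinate function $x^i$ and $\mathbf{t}$ is the tangent to the $j$th
coordinate curve $\partial_j|_P$, we obtain

$$dx^i|_P\left(\partial_j|_P\right) = \left[\partial_j|_P\right](x^i) = \frac{\partial x^i}{\partial x^j}\bigg|_P = \delta^i_j. \tag{26.15}$$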





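This shows that the differentials $\{dx^i|_P\}_{i=1}^m$ form the basis dual to the coordinate
frame $\{\partial_j|_P\}_{j=1}^m$ of $T_P(M)$.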



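Remember Einstein's summation convention!

26.3.6. Example. Let $f : M \to \mathbb{R}$ be a real-valued function on $M$. Let $x^i$ be coordinates
at $P$. We want to express $df$ in terms of coordinate functions. For $\mathbf{t} \in T_P(M)$ we can write
$\mathbf{t} = \alpha^i\, \partial_i|_P$ and

$$df(\mathbf{t}) = \mathbf{t}(f) = \alpha^i \left[\partial_i|_P\right](f) \equiv \alpha^i\, \partial_i(f),$$

where in the last step, we suppressed the $P$. Theorem 26.2.8 and Equation (26.14) yield
$\alpha^i = \mathbf{t}(x^i) = (dx^i)(\mathbf{t})$. We thus have

$$df(\mathbf{t}) = \partial_i(f)\left[(dx^i)(\mathbf{t})\right] = \left[\partial_i(f)(dx^i)\right](\mathbf{t}).$$

Since this is true for all $\mathbf{t}$, we get

$$df = \partial_i(f)(dx^i) \equiv \sum_{i=1}^m \partial_i(f)(dx^i) = \sum_{i=1}^m \frac{\partial f}{\partial x^i}\, dx^i. \tag{26.16}$$

This is the classical formula for the differential of a function $f$. If we choose $y^j$, the $j$th
member of a new coordinate system, for $f$, we obtain

$$dy^j = \sum_{i=1}^m \frac{\partial y^j}{\partial x^i}\, dx^i \equiv \frac{\partial y^j}{\partial x^i}\, dx^i, \tag{26.17}$$

which is the transformation dual to Equation (26.8). ■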
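
Equations (26.15) and (26.16) can be mirrored in a few lines of code for $M = \mathbb{R}^3$ with Cartesian coordinates. This is a sketch under stated assumptions: `differential` returns finite-difference approximations of the components $\partial_i f$, `pair` evaluates $df(\mathbf{t})$ for $\mathbf{t} = \alpha^i \partial_i$, and the function `f`, the point `P`, and the components are hypothetical.

```python
def differential(f, P, h=1e-6):
    """Components (d_1 f, ..., d_m f) of df at P in the basis dx^i (central differences)."""
    comps = []
    for i in range(len(P)):
        hi, lo = list(P), list(P)
        hi[i] += h
        lo[i] -= h
        comps.append((f(*hi) - f(*lo)) / (2.0 * h))
    return comps

def pair(df, alpha):
    """Value df(t) for t = alpha^i d_i, i.e. the sum of d_i(f) * alpha^i."""
    return sum(c * a for c, a in zip(df, alpha))

P = (1.0, 2.0, -0.5)
f = lambda x, y, z: x * y * y + z          # hypothetical test function

# Coordinate functions x^i and the coordinate frame d_j written in components.
coords = [lambda x, y, z: x, lambda x, y, z: y, lambda x, y, z: z]
frame = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# dx^i(d_j) should reproduce the Kronecker delta of Equation (26.15).
delta = [[pair(differential(xi, P), e) for e in frame] for xi in coords]

# df(t) for t with components (3, -1, 2): here d_i f = (y^2, 2xy, 1) = (4, 4, 1).
value = pair(differential(f, P), (3.0, -1.0, 2.0))
print(delta, value)
```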

The following is a powerful theorem that constructs a submanifold out of a
differentiable map (for a proof, see [Warn 83, p. 31]):

26.3.7. Theorem. Assume that $\psi : M \to N$ is a $C^\infty$ map, that $Q$ is a point in the
range of $\psi$, and that $\psi_{*P} : T_P(M) \to T_Q(N)$ is surjective for all $P \in \psi^{-1}(Q)$.
Then $\psi^{-1}(Q)$ is a submanifold of $M$ and $\dim \psi^{-1}(Q) = \dim M - \dim N$.

Compare this theorem with Proposition 26.1.10. There, $V$ was an open subset
of $N$, and since $f^{-1}(V)$ is open, it is automatically an open submanifold. The
difficulty in proving Theorem 26.3.7 lies in the fact that $\psi^{-1}(Q)$ is closed because
$\{Q\}$, a single point of $N$, is closed.

We can justify the last statement of the theorem as follows. From Equation
(26.12), we readily conclude that $T_P(\psi^{-1}(Q)) = \ker \psi_{*P}$. The dimension theorem,
applied to $\psi_{*P} : T_P(M) \to T_Q(N)$, now gives

$$\dim T_P(M) = \dim \ker \psi_{*P} + \operatorname{rank} \psi_{*P} \quad\Longrightarrow\quad \dim M = \dim \psi^{-1}(Q) + \dim N,$$

where the last equality follows from the surjectivity of $\psi_{*P}$.





tangent bundle 
defined 


vector field defined 


vector fields related 
by a map 


26.3.8. Example. Consider a $C^\infty$ map $f : \mathbb{R}^n \to \mathbb{R}$. Let $c \in \mathbb{R}$ such that the partial
derivatives of $f$ are defined and not all zero for all points of $f^{-1}(c)$. Then, according to
Equation (26.11), a vector $\alpha^i \partial_i \in T_P(\mathbb{R}^n)$ is mapped by $f_*$ to the vector $\alpha^i (\partial f/\partial x^i)_{f=c}\, d/dt$.
Since $\partial f/\partial x^i$ are not all zero, by properly choosing $\alpha^i$, we can make $\alpha^i (\partial f/\partial x^i)_{f=c}\, d/dt$
sweep over all real numbers. Therefore, $f_*$ is surjective, and by Theorem 26.3.7, $f^{-1}(c)$
is an $(n - 1)$-dimensional submanifold of $\mathbb{R}^n$. A noteworthy special case is the function
defined by

$$f(x^1, x^2, \ldots, x^n) = (x^1)^2 + (x^2)^2 + \cdots + (x^n)^2$$

and $c = r^2 > 0$. Then, $f^{-1}(c)$, an $(n - 1)$-sphere of radius $r$, is a submanifold of $\mathbb{R}^n$. ■
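
The surjectivity condition in Example 26.3.8 amounts to the gradient of $f$ being nonzero on the level set. The sketch below checks this for the sum-of-squares function on randomly sampled points of the sphere of radius $r$; it is illustrative only, and the helper `random_point_on_sphere`, the dimension `n`, the radius `r`, and the seed are hypothetical choices.

```python
import math
import random

def f(x):
    """Sum of squares (x^1)^2 + ... + (x^n)^2."""
    return sum(c * c for c in x)

def grad_f(x):
    """Partial derivatives df/dx^i = 2 x^i."""
    return [2.0 * c for c in x]

def random_point_on_sphere(n, r, rng):
    """Sample a point with f = r^2 by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [r * c / norm for c in v]

rng = random.Random(0)          # fixed seed so the run is reproducible
n, r = 4, 2.5
points = [random_point_on_sphere(n, r, rng) for _ in range(100)]

# On the level set f = r^2 the gradient has norm 2r, so it never vanishes
# and f_* is surjective at every point of the level set.
grad_norms = [math.sqrt(sum(g * g for g in grad_f(p))) for p in points]
print(min(grad_norms), max(grad_norms))
```

For $c = 0$ the level set is the single point at the origin, where the gradient vanishes, which is why the example requires $c = r^2 > 0$.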

26.4 Tensor Fields on Manifolds 

So far we have studied vector spaces, learned how to construct tensors out of vectors, touched on manifolds (the abstraction of spaces), seen how to construct vectors at a single point in a manifold by the use of the tangent-at-a-curve idea, and even found the dual vectors $dx^i|_P$ to the coordinate vectors $\partial_i|_P$ at a point $P$ of a manifold. We have everything we need to study the analysis of tensors.

26.4.1 Vector Fields

We are familiar with the concept of a vector field in 3D: Electric field, magnetic 
field, gravitational field, velocity field, and so forth are all familiar notions. We 
now want to generalize the concept so that it is applicable to a general manifold. 
To begin with, let us consider the following definition. 

26.4.1. Definition. The union of all tangent spaces at different points of a manifold $M$ is denoted by $T(M)$ and called the tangent bundle of $M$:

$$T(M) = \bigcup_{P \in M} \mathcal{T}_P(M).$$

It can be shown ([Bish 80, pp. 158-164]) that $T(M)$ is a manifold of dimension $2\dim M$.

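26.4.2. Definition. A vector field $\mathbf{X}$ on a subset $U$ of a manifold $M$ is a mapping $\mathbf{X} : U \to T(M)$ such that $\mathbf{X}(P) \equiv \mathbf{X}|_P \equiv \mathbf{X}_P \in \mathcal{T}_P(M)$. The set of vector fields on $M$ is denoted by $\mathfrak{X}(M)$. Let $M$ and $N$ be manifolds and $F : M \to N$ a differentiable map. We say that the two vector fields $\mathbf{X} \in \mathfrak{X}(M)$ and $\mathbf{Y} \in \mathfrak{X}(N)$ are $F$-related if $F_*(\mathbf{X}_P) = \mathbf{Y}_{F(P)}$ for all $P \in M$. This is sometimes written simply as $F_*\mathbf{X} = \mathbf{Y}$.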

It is worthwhile to point out that $F_*\mathbf{X}$ is not, in general, a vector field on $N$. To be a vector field, $F_*\mathbf{X}$ must be defined at all points of $N$. The natural way to define $F_*\mathbf{X}$ at $Q \in N$ is $[F_*\mathbf{X}(Q)](f) = [\mathbf{X}(f \circ F)](P)$, where $P$ is the preimage of $Q$, i.e., $F(P) = Q$. But there may not exist any such $P$ ($F$ may not be onto), or there may be more than one $P$ ($F$ may not be one-to-one) with such property. Therefore, this natural construction does not lead to a vector field on $N$. If $F_*\mathbf{X}$ happens to be a vector field on $N$, then it is clearly $F$-related to $\mathbf{X}$. In terms of the coordinates $x^i$, at each point $P \in M$,

$$\mathbf{X}_P \equiv \mathbf{X}|_P = X^i_P\,\partial_i|_P,$$

where the real numbers $X^i_P$ are components of $\mathbf{X}_P$ in the basis $\{\partial_i|_P\}$. As $P$ moves around in $U$, the real numbers $X^i_P$ keep changing. Thus, we can think of $X^i_P$ as a function of $P$ and define the real-valued function $X^i : M \to \mathbb{R}$ by $X^i(P) \equiv X^i_P$. Therefore, the components of a vector field are real-valued functions on $M$.

26.4.3. Example. Let $M = \mathbb{R}^3$. At each point $P = (x, y, z) \in \mathbb{R}^3$, let $(\hat{\mathbf{e}}_x, \hat{\mathbf{e}}_y, \hat{\mathbf{e}}_z)$ be a basis for $\mathbb{R}^3$. Let $V_P$ be the vector space at $P$. Then $T(\mathbb{R}^3)$ is the collection of all vector spaces $V_P$ for all $P$.

We can determine the value of an electric field at a point in $\mathbb{R}^3$ by first specifying the point, as $P_0 = (x_0, y_0, z_0)$, for example. This uniquely determines the tangent space $\mathcal{T}_{P_0}(\mathbb{R}^3)$. Once we have the vector space, we can ask what the components of the electric field are in that space. These components are given by three numbers: $E_x(x_0, y_0, z_0)$, $E_y(x_0, y_0, z_0)$, and $E_z(x_0, y_0, z_0)$. The argument is the same for any other vector field.

To specify a "point" in $T(\mathbb{R}^3)$, we need three numbers to determine the location in $\mathbb{R}^3$ and another three numbers to determine the components of a vector field at that point. Thus, a "point" in $T(\mathbb{R}^3)$ is given by six "coordinates" $(x, y, z, E_x, E_y, E_z)$, and $T(\mathbb{R}^3)$ is a six-dimensional manifold. ■

We know how a tangent vector $\mathbf{t}$ at a point $P \in M$ acts on a function $f \in F^\infty(P)$ to give a real number $\mathbf{t}(f)$. We can extend this, point by point, for a vector field $\mathbf{X}$ and define a function $\mathbf{X}(f)$ by

$$[\mathbf{X}(f)](P) \equiv \mathbf{X}_P(f), \qquad P \in U, \tag{26.18}$$

where $U$ is a subset of $M$ on which both $\mathbf{X}$ and $f$ are defined. The RHS is well-defined because we know how $\mathbf{X}_P$, the vector at $P$, acts on functions at $P$ to give the real number $[\mathbf{X}_P](f)$. On the LHS, we have $\mathbf{X}(f)$, which maps the point $P$ onto a real number. Thus, $\mathbf{X}(f)$ is indeed a real-valued function on $M$. We can therefore define vector fields directly as operators on $C^\infty$ functions satisfying

$$\mathbf{X}(\alpha f + \beta g) = \alpha\mathbf{X}(f) + \beta\mathbf{X}(g),$$
$$\mathbf{X}(fg) = [\mathbf{X}(f)]g + [\mathbf{X}(g)]f.$$

A prototypical vector field is the coordinate vector field $\partial_i$. In general, $\mathbf{X}(f)$ is not a $C^\infty$ function even if $f$ is. A vector field that produces a $C^\infty$ function $\mathbf{X}(f)$ for every $C^\infty$ function $f$ is called a $C^\infty$ vector field. Such a vector field has components that are $C^\infty$ functions on $M$.

The set of tangent vectors $\mathcal{T}_P(M)$ at a point $P \in M$ forms an $m$-dimensional vector space. The set of vector fields $\mathfrak{X}(M)$, which yield a vector at every point of the manifold, also constitutes a vector space. However, this vector space is (uncountably) infinite-dimensional. A property of $\mathfrak{X}(M)$ that is absent in $\mathcal{T}_P(M)$ is composition.⁶ This suggests the possibility of defining a "product" on $\mathfrak{X}(M)$ to turn it into an algebra. Let $\mathbf{X}$ and $\mathbf{Y}$ be vector fields. For $\mathbf{X} \circ \mathbf{Y}$ to be a vector field, it has to satisfy the derivation property. But

$$\begin{aligned}
\mathbf{X}\circ\mathbf{Y}(fg) &= \mathbf{X}(\mathbf{Y}(fg)) = \mathbf{X}(\mathbf{Y}(f)g + f\mathbf{Y}(g)) \\
&= (\mathbf{X}(\mathbf{Y}(f)))g + \mathbf{Y}(f)\mathbf{X}(g) + \mathbf{X}(f)\mathbf{Y}(g) + f(\mathbf{X}(\mathbf{Y}(g))) \\
&\neq (\mathbf{X}\circ\mathbf{Y}(f))g + f(\mathbf{X}\circ\mathbf{Y}(g)).
\end{aligned}$$

However, the reader may verify that $\mathbf{X}\circ\mathbf{Y} - \mathbf{Y}\circ\mathbf{X}$ does indeed satisfy the derivation property. Therefore, by defining the binary operation $\mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M)$ as

$$[\mathbf{X}, \mathbf{Y}] \equiv \mathbf{X}\circ\mathbf{Y} - \mathbf{Y}\circ\mathbf{X},$$

$\mathfrak{X}(M)$ becomes an algebra, called the Lie algebra of vector fields of $M$. The binary operation is called the Lie bracket. Although it was not mentioned at the time, we have encountered another example of a Lie algebra in Chapter 2, namely $\mathcal{L}(V)$ under the binary operation of the commutation relation. Lie brackets have the following two properties:

$$[\mathbf{X}, \mathbf{Y}] = -[\mathbf{Y}, \mathbf{X}],$$
$$[[\mathbf{X}, \mathbf{Y}], \mathbf{Z}] + [[\mathbf{Z}, \mathbf{X}], \mathbf{Y}] + [[\mathbf{Y}, \mathbf{Z}], \mathbf{X}] = 0.$$

These two relations are the defining properties of all Lie algebras. The last relation is called the Jacobi identity. $\mathfrak{X}(M)$ with Lie brackets is an example of an infinite-dimensional Lie algebra; $\mathcal{L}(V)$ with commutators is an example of a finite-dimensional Lie algebra.
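The failure of $\mathbf{X}\circ\mathbf{Y}$ to be a derivation, and the success of the bracket, can be checked exactly in a small sketch. We assume, purely for illustration, that $M = \mathbb{R}$ and that both the functions and the components of the vector fields are polynomials stored as coefficient lists; the helper names (`field`, `compose`, `bracket`, `is_derivation_on`) and the sample fields are hypothetical and do not come from the text.

```python
# A polynomial c0 + c1*x + c2*x**2 + ... is stored as the list [c0, c1, c2, ...].

def trim(p):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def sub(p, q):
    return add(p, [-c for c in q])

def mul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def diff(p):
    """Derivative of a polynomial with respect to x."""
    return trim([k * c for k, c in enumerate(p)][1:])

def field(a):
    """The vector field X = a(x) d/dx, viewed as the operator f -> a * f'."""
    return lambda f: mul(a, diff(f))

def compose(X, Y):
    return lambda f: X(Y(f))

def bracket(X, Y):
    """Lie bracket [X, Y] = X o Y - Y o X."""
    return lambda f: sub(X(Y(f)), Y(X(f)))

def is_derivation_on(D, f, g):
    """Test the Leibniz rule D(fg) = D(f) g + f D(g) on one pair (f, g)."""
    return D(mul(f, g)) == add(mul(D(f), g), mul(f, D(g)))

X = field([0, 1])          # X = x d/dx
Y = field([1, 0, 1])       # Y = (1 + x^2) d/dx
Z = field([0, 0, 0, 1])    # Z = x^3 d/dx
f = [2, 0, 1]              # f = 2 + x^2
g = [0, 1, 3]              # g = x + 3 x^2

# Left-hand side of the Jacobi identity applied to f; it should be the zero polynomial.
jacobi = add(add(bracket(bracket(X, Y), Z)(f),
                 bracket(bracket(Z, X), Y)(f)),
             bracket(bracket(Y, Z), X)(f))
```

With these sample fields, `bracket(X, Y)` acts on the coordinate function $x$ to give $x^2 - 1$, which is the component of $[\mathbf{X}, \mathbf{Y}] = (x^2 - 1)\,d/dx$ obtained by hand.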

We shall have occasion to use the following theorem in our treatment of Lie 
groups and algebras in the next chapter: 

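26.4.4. Theorem. Let $M$ and $N$ be manifolds and $F : M \to N$ a differentiable map. Assume that $\mathbf{X}_i \in \mathfrak{X}(M)$ is $F$-related to $\mathbf{Y}_i \in \mathfrak{X}(N)$ for $i = 1, 2$. Then $[\mathbf{X}_1, \mathbf{X}_2]$ is $F$-related to $[\mathbf{Y}_1, \mathbf{Y}_2]$, i.e.,

$$F_*[\mathbf{X}_1, \mathbf{X}_2] = [F_*\mathbf{X}_1, F_*\mathbf{X}_2].$$

Proof. Let $f$ be an arbitrary function on $N$. Then

$$\begin{aligned}
(F_*[\mathbf{X}_1, \mathbf{X}_2])f &\equiv [\mathbf{X}_1, \mathbf{X}_2](f \circ F) = \mathbf{X}_1(\mathbf{X}_2(f \circ F)) - \mathbf{X}_2(\mathbf{X}_1(f \circ F)) \\
&= \mathbf{X}_1([F_*\mathbf{X}_2(f)] \circ F) - \mathbf{X}_2([F_*\mathbf{X}_1(f)] \circ F) \\
&= F_*\mathbf{X}_1(F_*\mathbf{X}_2(f)) - F_*\mathbf{X}_2(F_*\mathbf{X}_1(f)) \\
&= [F_*\mathbf{X}_1, F_*\mathbf{X}_2]f,
\end{aligned}$$

where we used Equation (26.9) in the first, second, and third lines, and the result of Problem 26.8 in the second line. □

⁶ Recall that a typical element of $\mathcal{T}_P(M)$ is a map $\mathbf{t} : F^\infty(P) \to \mathbb{R}$ for which composition is meaningless.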

It is convenient to visualize vector fields as streamlines. In fact, most of the terminology used in three-dimensional vector analysis, such as flux, divergence, and curl, has its origins in the flow of fluids and the associated velocity vector fields. The streamlines are obtained, in nonturbulent flow, by starting at one point and drawing a curve whose tangent at all points is the velocity vector field. For a smooth flow this curve is unique. There is an exact analogy in manifold theory.

26.4.5. Definition. Let $\mathbf{X} \in \mathfrak{X}(M)$ be defined on an open subset $U$ of $M$. An integral curve of $\mathbf{X}$ is a curve $\gamma$ whose range lies in $U$ and for every $t$ in the domain of $\gamma$, the vector tangent to $\gamma$ satisfies $\gamma_{*t} = \mathbf{X}(\gamma(t))$. If $\gamma(0) = P$, we say that $\gamma$ starts at $P$.

Let us choose a coordinate system on $M$. Then $\mathbf{X} \equiv X^i\partial_i$, where $X^i$ are $C^\infty$ functions on $M$, and, by (26.13), $\gamma_* = \dot{\gamma}^i\partial_i$. The equation for the integral curve of $\mathbf{X}$ will therefore become

$$\dot{\gamma}^i\partial_i = X^i(\gamma(t))\,\partial_i, \quad\text{or}\quad \frac{d\gamma^i}{dt} = X^i(\gamma^1(t), \ldots, \gamma^m(t)), \qquad i = 1, 2, \ldots, m.$$

Since $\gamma^i$ are simply coordinates of points on $M$, we rewrite the equation above as

$$\frac{dx^i}{dt} = X^i(x^1(t), \ldots, x^m(t)), \qquad i = 1, 2, \ldots, m. \tag{26.19}$$

This is a system of first-order differential equations that has a unique (local) solution once the initial value $\gamma(0)$ of the curve, i.e., the coordinates of the starting point $P$, is given. The precise statement for existence and uniqueness of integral curves is contained in the following theorem.

26.4.6. Theorem. Let $\mathbf{X}$ be a $C^\infty$ vector field defined on an open subset $U$ of $M$. Suppose $P \in U$, and $c \in \mathbb{R}$. Then there is a positive number $\epsilon$ and a unique integral curve $\gamma$ of $\mathbf{X}$ defined on $|t - c| < \epsilon$ such that $\gamma(c) = P$.

26.4.7. Example. EXAMPLES OF INTEGRAL CURVES

(a) Let $M = \mathbb{R}$ with coordinate function $x$. The vector field $\mathbf{X} = x\,\partial_x$ has an integral curve with initial point $x_0$ given by the DE $dx/dt = x(t)$, which has the solution $x(t) = e^t x_0$.

(b) Let $M = \mathbb{R}^n$ with coordinate functions $x^i$. The vector field $\mathbf{X} = \sum a^i\partial_i$ has an integral curve, with initial point $\mathbf{r}_0$, given by the system of DEs $dx^i/dt = a^i$, which has the solution $x^i(t) = a^i t + x_0^i$, or $\mathbf{r} = \mathbf{a}t + \mathbf{r}_0$. The curve is therefore a straight line parallel to $\mathbf{a}$ going through $\mathbf{r}_0$.

(c) Let $M = \mathbb{R}^n$ with coordinate functions $x^i$. Consider the vector field

$$\mathbf{X} = \sum_{i,j=1}^{n} A_{ij}\,x^j\,\partial_i.$$

The integral curve of this vector field, with initial point $\mathbf{r}_0$, is given by the system of DEs $dx^i/dt = \sum_{j=1}^{n} A_{ij}x^j$, which can be written in vector form as $d\mathbf{r}/dt = \mathsf{A}\mathbf{r}$, where $\mathsf{A}$ is a constant matrix. By differentiating this equation several times, one can convince oneself that $d^k\mathbf{r}/dt^k = \mathsf{A}^k\mathbf{r}$. The Taylor expansion of $\mathbf{r}(t)$ then yields

$$\mathbf{r}(t) = \sum_{k=0}^{\infty} \frac{t^k}{k!}\,\frac{d^k\mathbf{r}}{dt^k}\bigg|_{t=0} = \sum_{k=0}^{\infty} \frac{t^k}{k!}\,\mathsf{A}^k\mathbf{r}_0 = e^{t\mathsf{A}}\,\mathbf{r}_0.$$

(d) Let $M = \mathbb{R}^2$ with coordinates $x$, $y$. The reader may verify that the vector field $\mathbf{X} = -y\,\partial_x + x\,\partial_y$ has an integral curve through $(x_0, y_0)$ given by

$$x = x_0\cos t - y_0\sin t,$$
$$y = x_0\sin t + y_0\cos t,$$

i.e., a circle centered at the origin passing through $(x_0, y_0)$. ■
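Parts (a) and (d) can be checked numerically by integrating the system (26.19) directly. The sketch below uses a classical fourth-order Runge-Kutta step, which is our own choice of integrator rather than anything prescribed by the text; the function name `rk4_curve`, the step count, and the starting points are likewise illustrative assumptions.

```python
import math

def rk4_curve(X, p0, t_end, steps=1000):
    """Integrate dx^i/dt = X^i(x) from p0 at t = 0 up to t_end with classical RK4."""
    h = t_end / steps
    p = list(p0)
    for _ in range(steps):
        k1 = X(p)
        k2 = X([pi + 0.5 * h * ki for pi, ki in zip(p, k1)])
        k3 = X([pi + 0.5 * h * ki for pi, ki in zip(p, k2)])
        k4 = X([pi + h * ki for pi, ki in zip(p, k3)])
        p = [pi + h * (a + 2 * b + 2 * c + d) / 6.0
             for pi, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p

# Part (a): X = x d/dx on R; the exact curve is x(t) = e^t x0.
def dilation(p):
    return [p[0]]

# Part (d): X = -y d/dx + x d/dy on R^2; the exact curve is a rotation of (x0, y0).
def rotation(p):
    return [-p[1], p[0]]

t = 1.3
xa = rk4_curve(dilation, [2.0], t)
xd = rk4_curve(rotation, [1.0, 0.5], t)

exact_a = [math.exp(t) * 2.0]
exact_d = [1.0 * math.cos(t) - 0.5 * math.sin(t),
           1.0 * math.sin(t) + 0.5 * math.cos(t)]
```

The numerical end points agree with the closed-form curves to well within the accuracy of the integrator, and in case (d) the distance from the origin is preserved along the curve, as expected for a circle.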

Going back to the velocity vector field analogy, we can think of integral curves as the path of particles flowing with the fluid. If we think of the entire fluid as a manifold $M$, the flow of particles can be thought of as a transformation of $M$. To be precise, let $M$ be an arbitrary manifold, and $\mathbf{X} \in \mathfrak{X}(M)$. At each point $P$ of $M$, there is a unique local integral curve $\gamma_P$ of $\mathbf{X}$ starting at $P$ defined on an open subset $U$ of $M$. The map $F_t : U \to M$ defined by $F_t(P) = \gamma_P(t)$ is a (local) transformation of $M$. The collection of such maps with different $t$'s is called the flow of the vector field $\mathbf{X}$. The uniqueness of the integral curve $\gamma_P$ implies that $F_t$ is a local diffeomorphism. In fact, the collection of maps $\{F_t\}$ forms a (local) one-parameter group of transformations in the sense that

$$F_t \circ F_s = F_{t+s}, \qquad F_0 = \mathrm{id}, \qquad (F_t)^{-1} = F_{-t}. \tag{26.20}$$

One has to keep in mind that $F_t$ at a point $P \in M$ is, in general, defined only locally in $t$, i.e., only for $t$ in some open interval that depends on $P$. For some special, but important, cases this interval can be taken to be the entire $\mathbb{R}$ for all $P$, in which case we speak of a global one-parameter group of transformations, and $\mathbf{X}$ is called a complete vector field on $M$.
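The difference between a global and a merely local one-parameter group can be seen in a short sketch. The rotation field of Example 26.4.7(d) is complete, and its flow is known in closed form. As a contrasting case we take $\mathbf{X} = x^2\,\partial_x$ on $\mathbb{R}$, which is our own illustrative choice and not an example from the text: its integral curve through $x$ is $F_t(x) = x/(1 - tx)$, which exists only while $tx < 1$. The function names below are hypothetical.

```python
import math

def rot_flow(t, p):
    """Flow of X = -y d/dx + x d/dy (Example 26.4.7(d)): rotation by the angle t."""
    x, y = p
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

def quad_flow(t, x):
    """Flow of X = x^2 d/dx on R (illustrative choice): F_t(x) = x / (1 - t x).

    Raises ValueError when t lies outside the interval on which the
    integral curve through x exists."""
    if t * x >= 1.0:
        raise ValueError("integral curve through x is not defined at this t")
    return x / (1.0 - t * x)

def close(a, b, tol=1e-12):
    return all(abs(u - v) <= tol for u, v in zip(a, b))

P = (0.3, -1.2)
t, s = 0.7, -2.1

# The three relations of Equation (26.20) for the complete field.
group_law_rot = close(rot_flow(t, rot_flow(s, P)), rot_flow(t + s, P))
identity_rot = close(rot_flow(0.0, P), P)
inverse_rot = close(rot_flow(-t, rot_flow(t, P)), P)

# For x = 2 the curve of x^2 d/dx exists only for t < 1/2; the group law
# still holds as long as every map involved is defined.
group_law_quad = abs(quad_flow(0.1, quad_flow(0.2, 2.0)) - quad_flow(0.3, 2.0)) < 1e-12
try:
    quad_flow(0.5, 2.0)
    blows_up = False
except ValueError:
    blows_up = True
```

In the second case the allowed interval of $t$ shrinks as $|x|$ grows, so no single interval works for all points, and the field is not complete.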

The symbol $F_t$ used for the flow of the vector field $\mathbf{X}$ does not contain its connection to $\mathbf{X}$. In order to make this connection, it is common to define

$$F_t \equiv \exp(t\mathbf{X}). \tag{26.21}$$

This definition, with no significance attached to "exp" at this point, converts Equation (26.20) into

$$\begin{aligned}
\exp(t\mathbf{X}) \circ \exp(s\mathbf{X}) &= \exp[(t + s)\mathbf{X}], \\
\exp(0\mathbf{X}) &= \mathrm{id}, \\
[\exp(t\mathbf{X})]^{-1} &= \exp(-t\mathbf{X}),
\end{aligned} \tag{26.22}$$

which notationally justifies the use of "exp." We shall see in our discussion of Lie groups that this choice of notation is not accidental.

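Using this notation, we can write

$$\mathbf{X}_P(f) = \frac{d}{dt} f(F_t(P))\bigg|_{t=0} \equiv \frac{d}{dt} f(\exp(t\mathbf{X})(P))\bigg|_{t=0}.$$

One usually leaves out the function $f$ and writes

$$\mathbf{X}_P = \frac{d}{dt}\exp(t\mathbf{X})(P)\bigg|_{t=0}, \tag{26.23}$$

where it is understood that the LHS acts on some $f$ that must compose on the RHS to the left of the exponential. Similarly, we have

$$\begin{aligned}
(F_*\mathbf{X})_{F(P)} &= \frac{d}{dt}F(\exp t\mathbf{X})(P)\bigg|_{t=0}, \\
(G_*F_*\mathbf{X})_{G(F(P))} &= \frac{d}{dt}G(F(\exp t\mathbf{X})(P))\bigg|_{t=0} = \frac{d}{dt}\,G\circ F(\exp t\mathbf{X})(P)\bigg|_{t=0},
\end{aligned} \tag{26.24}$$

where $F : M \to N$ and $G : N \to K$ are maps between manifolds.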

26.4.8. Example. In this example, we derive a useful formula that gives the value of a function at a neighboring point of $P \in M$ located on the integral curve of $\mathbf{X} \in \mathfrak{X}(M)$ going through $P$. We first note that since $\mathbf{X}_P$ is tangent to $\gamma_P$ at $P = \gamma(0)$, by Proposition 26.2.4 we have

$$\mathbf{X}_P(f) = \frac{d}{dt} f(\gamma_P(t))\bigg|_{t=0} = \frac{d}{dt} f(F_t(P))\bigg|_{t=0}.$$

Next we use the definition of derivative and the fact that $F_0(P) = P$ to write

$$\lim_{t \to 0} \frac{1}{t}\,[f(F_t(P)) - f(P)] = \mathbf{X}_P(f).$$

Now, if we assume that $t$ is very small, we have

$$f(F_t(P)) = f(P) + t\,\mathbf{X}_P(f) + \cdots, \tag{26.25}$$

which is a Taylor series with only the first two terms kept. ■
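Equation (26.25) says that the error of the two-term approximation is of second order in $t$. A minimal numerical sketch makes this visible: we reuse the rotation field of Example 26.4.7(d), whose flow is known exactly, and pick the sample function $f(x, y) = x^2 y$ and the point $P = (1, 2)$ ourselves for illustration.

```python
import math

def flow(t, p):
    """Flow F_t of X = -y d/dx + x d/dy: rotation of the plane by the angle t."""
    x, y = p
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

def f(p):
    """Sample function f(x, y) = x^2 y (illustrative choice)."""
    return p[0] ** 2 * p[1]

def Xf(p):
    """X(f) = -y df/dx + x df/dy = -2 x y^2 + x^3, differentiated by hand."""
    x, y = p
    return -2 * x * y ** 2 + x ** 3

def remainder(t, p):
    """f(F_t(P)) minus the two-term approximation f(P) + t X_P(f)."""
    return f(flow(t, p)) - (f(p) + t * Xf(p))

P = (1.0, 2.0)
r1 = remainder(1e-3, P)   # error at t
r2 = remainder(5e-4, P)   # error at t/2
```

Halving $t$ reduces the remainder by roughly a factor of four, which is the signature of the neglected $t^2$ term.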


26.4.2 Tensor Fields

We have defined vector spaces $\mathcal{T}_P(M)$ at each point of $M$. We have also constructed coordinate bases, $\{\partial_i|_P\}_{i=1}^{m}$, for these vector spaces. At the end of Section 26.2, we showed that the differentials $\{dx^i|_P\}_{i=1}^{m}$ form a basis that is dual to $\{\partial_i|_P\}_{i=1}^{m}$. Let us concentrate on this dual space, which we will denote by $\mathcal{T}_P^*(M)$.

Taking the union of all $\mathcal{T}_P^*(M)$ at all points of $M$, we obtain the cotangent bundle of $M$:

$$T^*(M) = \bigcup_{P \in M} \mathcal{T}_P^*(M). \tag{26.26}$$

This is the dual space of $T(M)$ at each point of $M$. We can now define the analogue of the vector field for the cotangent bundle.

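26.4.9. Definition. A differential one-form $\boldsymbol{\theta}$ on a subset $U$ of a manifold $M$ is a mapping $\boldsymbol{\theta} : U \to T^*(M)$ such that $\boldsymbol{\theta}(P) \equiv \boldsymbol{\theta}_P \in \mathcal{T}_P^*(M)$. The collection of all one-forms on $M$ is denoted by $\mathfrak{X}^*(M)$.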

If $\boldsymbol{\theta}$ is a one-form and $\mathbf{X}$ is a vector field on $M$, then $\boldsymbol{\theta}(\mathbf{X})$ is a real-valued function on $M$ defined naturally by $[\boldsymbol{\theta}(\mathbf{X})](P) \equiv (\boldsymbol{\theta}_P)(\mathbf{X}_P)$. The first factor on the RHS is a linear functional at $P$, and the second factor is a vector at $P$. So, the pairing of the two factors produces a real number. A prototypical one-form is the coordinate differential, $dx^i$.

Associated with a differentiable map $\psi : M \to N$, we defined a differential $\psi_*$ that mapped a tangent space of $M$ to a tangent space of $N$. The dual of $\psi_*$ (Definition 1.3.17) is denoted by $\psi^*$ and is called the pullback of $\psi$. It takes a one-form on $N$ to a one-form on $M$. In complete analogy to the case of vector fields, $\boldsymbol{\theta}$ can be written in terms of the basis $\{dx^i\}$: $\boldsymbol{\theta} = \theta_i\,dx^i$. Here $\theta_i$, the components of $\boldsymbol{\theta}$, are real-valued functions on $M$.

With the vector spaces $\mathcal{T}_P(M)$ and $\mathcal{T}_P^*(M)$ at our disposal, we can construct various kinds of tensors at each point $P$. The union of all these tensors is a manifold, and a tensor field can be defined as usual. Thus, we have the following definition.

26.4.10. Definition. Let $\mathcal{T}_P(M)$ and $\mathcal{T}_P^*(M)$ be the tangent and cotangent spaces at $P \in M$. Then the set of tensors of type $(r, s)$ on $\mathcal{T}_P(M)$ is denoted by $\mathcal{T}^r_{s,P}(M)$. The bundle of tensors of type $(r, s)$ over $M$, denoted by $T^r_s(M)$, is

$$T^r_s(M) = \bigcup_{P \in M} \mathcal{T}^r_{s,P}(M).$$

A tensor field $\mathbf{T}$ of type $(r, s)$ over a subset $U$ of $M$ is a mapping $\mathbf{T} : U \to T^r_s(M)$ such that $\mathbf{T}(P) \equiv \mathbf{T}_P \equiv \mathbf{T}|_P \in \mathcal{T}^r_{s,P}(M)$.

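In particular, $T^0_0(M)$ is the set of real-valued functions on $M$, $T^1_0(M) = T(M)$, and $T^0_1(M) = T^*(M)$. Furthermore, since $\mathbf{T}$ is a multilinear map, the parentheses are normally reserved for vectors and their duals, and as indicated in Definition 26.4.10, the value of $\mathbf{T}$ at $P \in M$ is written as $\mathbf{T}_P$ or $\mathbf{T}|_P$. The reader may check that the map

$$\mathbf{T} : \underbrace{\mathfrak{X}^*(M) \times \cdots \times \mathfrak{X}^*(M)}_{r\ \text{times}} \times \underbrace{\mathfrak{X}(M) \times \cdots \times \mathfrak{X}(M)}_{s\ \text{times}} \to T^0_0(M)$$

defined by

$$[\mathbf{T}(\boldsymbol{\omega}^1, \ldots, \boldsymbol{\omega}^r, \mathbf{v}_1, \ldots, \mathbf{v}_s)](P) = \mathbf{T}_P(\boldsymbol{\omega}^1|_P, \ldots, \boldsymbol{\omega}^r|_P, \mathbf{v}_1|_P, \ldots, \mathbf{v}_s|_P)$$

has the property that

$$\begin{aligned}
\mathbf{T}(\ldots, f\boldsymbol{\omega}^j + g\boldsymbol{\theta}^j, \ldots) &= f\,\mathbf{T}(\ldots, \boldsymbol{\omega}^j, \ldots) + g\,\mathbf{T}(\ldots, \boldsymbol{\theta}^j, \ldots), \\
\mathbf{T}(\ldots, f\mathbf{v}_k + g\mathbf{u}_k, \ldots) &= f\,\mathbf{T}(\ldots, \mathbf{v}_k, \ldots) + g\,\mathbf{T}(\ldots, \mathbf{u}_k, \ldots)
\end{aligned} \tag{26.27}$$

for any two functions $f$ and $g$ on $M$. Thus,⁷

26.4.11. Box. A tensor is linear in vector fields and 1-forms, even when the coefficients of linear expansion are functions.

The components of $\mathbf{T}$ with respect to coordinates $x^i$ are the $m^{r+s}$ real-valued functions

$$T^{i_1 i_2 \ldots i_r}_{j_1 j_2 \ldots j_s} = \mathbf{T}(dx^{i_1}, dx^{i_2}, \ldots, dx^{i_r}, \partial_{j_1}, \partial_{j_2}, \ldots, \partial_{j_s}).$$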
If tensor fields are to be of any use, we must be able to differentiate them. We
shall consider three types of derivatives with different applications. We study one
of them here, another in the next section, and the third in Chapter 28.

Derivatives can be defined only for objects that can be added (really, sub¬ 
tracted). For functions of a single (real or complex) variable, this is done almost 
subconsciously: We take the difference between the values of the function at two 
nearby points and divide by the length of the interval between the two points. 
We extended this definition to operators in Chapter 2 with practically no change. 
For functions of more than one variable, one chooses a direction (a vector) and 
considers change in the function along that direction. This leads to the concept of 
directional derivative, or partial derivative when the vector happens to be along 
one of the axes. 

In all the above cases, the objects being differentiated reside in the same space: $f(t)$ and $f(t + \Delta t)$ are both real (complex) numbers; $\mathbf{H}(t)$ and $\mathbf{H}(t + \Delta t)$ both belong to $\mathcal{L}(V)$. When we try to define derivatives of tensor fields, however, we run immediately into trouble: $\mathbf{T}_P$ and $\mathbf{T}_{P'}$ cannot be compared because they belong to two different spaces, one to $\mathcal{T}^r_{s,P}(M)$ and the other to $\mathcal{T}^r_{s,P'}(M)$. To make comparisons, we need first to establish a "connection" between the two spaces. This connection has to be a vector space isomorphism so that there is one and only one vector in the second space that is to be compared with a given vector in the first space. The problem is that there are infinitely many isomorphisms between any given two vector spaces. No "natural" isomorphism exists between $\mathcal{T}^r_{s,P}(M)$ and $\mathcal{T}^r_{s,P'}(M)$; thus the diversity of tensor "derivatives." We narrow down this diversity by choosing a specific vector in $\mathcal{T}_P(M)$ and seeking a natural way of defining the derivative along that vector by associating a "natural" isomorphism corresponding to the vector. There are a few methods of doing this. We describe one of them here.

⁷ In mathematical jargon, $\mathfrak{X}(M)$ and $\mathfrak{X}^*(M)$ are called modules over the (ring of) real-valued functions on $M$. Rings are a generalization of the real numbers (field of real numbers) whose elements have all the properties of a field except that they may have no inverse. A module over a field is a vector space.
First, let us see what happens to tensor fields under a diffeomorphism of $M$ onto itself. Let $F : M \to M$ be such a diffeomorphism. The differential $F_{*P}$ of this diffeomorphism is an isomorphism of $\mathcal{T}_P(M)$ and $\mathcal{T}_{F(P)}(M)$. This isomorphism induces an isomorphism of the vector spaces $\mathcal{T}^r_{s,P}(M)$ and $\mathcal{T}^r_{s,F(P)}(M)$, also denoted by $F_{*P}$, by Equation (25.7). Let us denote by $F_*$ a map of $T(M)$ onto $T(M)$ whose restriction to $\mathcal{T}_P(M)$ is $F_{*P}$. If $\mathbf{T}$ is a tensor field on $M$, then $F_*(\mathbf{T})$ is also a tensor field, whose value at $F(Q)$ is obtained by letting $F_{*Q}$ act on $\mathbf{T}(Q)$:

$$[F_*(\mathbf{T})](F(Q)) = F_{*Q}(\mathbf{T}(Q)),$$

or, letting $P = F(Q)$ or $Q = F^{-1}(P)$,

$$[F_*(\mathbf{T})](P) = F_{*F^{-1}(P)}(\mathbf{T}(F^{-1}(P))). \tag{26.28}$$

Now, let $\mathbf{X}$ be a vector field and $P \in M$. The flow of $\mathbf{X}$ at $P$ defines a local diffeomorphism $F_t : U \to F_t(U)$ with $P \in U$. The differential $F_{t*}$ of this diffeomorphism is an isomorphism of $\mathcal{T}_P(M)$ and $\mathcal{T}_{F_t(P)}(M)$. As discussed above, this isomorphism induces an isomorphism of the vector space $T^r_s(M)$ onto itself. The derivative we are after is defined by comparing a tensor field evaluated at $P$ with the image of the same tensor field under the isomorphism $F_{t*}^{-1}$. The following definition makes this procedure more precise.

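26.4.12. Definition. Let $P \in M$, $\mathbf{X} \in \mathfrak{X}(M)$, and $F_t$ the flow of $\mathbf{X}$ defined in a neighborhood of $P$. The Lie derivative of a tensor field $\mathbf{T}$ at $P$ with respect to $\mathbf{X}$ is denoted by $(L_{\mathbf{X}}\mathbf{T})_P$ and defined by

$$(L_{\mathbf{X}}\mathbf{T})_P = \lim_{t \to 0}\frac{1}{t}\left[F_{t*}^{-1}\mathbf{T}_{F_t(P)} - \mathbf{T}_P\right] = \frac{d}{dt}F_{t*}^{-1}\mathbf{T}_{F_t(P)}\bigg|_{t=0}. \tag{26.29}$$

Let us calculate the derivative in Equation (26.29) at an arbitrary value of $t$. For this purpose, let $Q \equiv F_t(P)$. Then

$$\begin{aligned}
\frac{d}{dt}F_{t*}^{-1}\mathbf{T}_{F_t(P)} &= \lim_{\Delta t \to 0}\frac{1}{\Delta t}\left[F_{(t+\Delta t)*}^{-1}\mathbf{T}_{F_{t+\Delta t}(P)} - F_{t*}^{-1}\mathbf{T}_{F_t(P)}\right] \\
&= F_{t*}^{-1}\lim_{\Delta t \to 0}\frac{1}{\Delta t}\left[F_{\Delta t*}^{-1}\mathbf{T}_{F_{t+\Delta t}(P)} - \mathbf{T}_{F_t(P)}\right] \\
&= F_{t*}^{-1}\lim_{\Delta t \to 0}\frac{1}{\Delta t}\left[F_{\Delta t*}^{-1}\mathbf{T}_{F_{\Delta t}(Q)} - \mathbf{T}_Q\right] \equiv F_{t*}^{-1}(L_{\mathbf{X}}\mathbf{T})_Q.
\end{aligned}$$

Since $Q$ is arbitrary, we can remove it from the equation and write, as the generalization of Equation (26.29),

$$\frac{d}{dt}F_{t*}^{-1}\mathbf{T} = F_{t*}^{-1}L_{\mathbf{X}}\mathbf{T}. \tag{26.30}$$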




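An important special case of the definition above is the Lie derivative of a vector field with respect to another. Let $\mathbf{X}, \mathbf{Y} \in \mathfrak{X}(M)$. To evaluate the RHS of (26.29), we apply the first term in the brackets to an arbitrary function $f$,

$$\begin{aligned}
(F_{t*}^{-1}\mathbf{Y}_{F_t(P)})(f) &= \mathbf{Y}_{F_t(P)}(f \circ F_t^{-1}) = \mathbf{Y}(f \circ F_t^{-1})|_{F_t(P)} \\
&= \mathbf{Y}(f \circ F_{-t})|_{F_t(P)} = \mathbf{Y}(f - t\mathbf{X}(f))|_{F_t(P)} \\
&= (\mathbf{Y}f)_{F_t(P)} - t\,\mathbf{Y}(\mathbf{X}(f))|_{F_t(P)} \\
&= (\mathbf{Y}f)_P + t[\mathbf{X}(\mathbf{Y}f)]_P - t\{[\mathbf{Y}(\mathbf{X}f)]_P + t[\mathbf{X}(\mathbf{Y}(\mathbf{X}f))]_P\} \\
&= \mathbf{Y}_P(f) + t\,\mathbf{X}_P \circ \mathbf{Y}_P(f) - t\,\mathbf{Y}_P \circ \mathbf{X}_P(f) \\
&= \mathbf{Y}_P(f) + t[\mathbf{X}_P, \mathbf{Y}_P](f) = \mathbf{Y}_P(f) + t[\mathbf{X}, \mathbf{Y}]_P(f).
\end{aligned}$$

The first equality on the first line follows from (26.9), the second equality from the meaning of $\mathbf{Y}_{F_t(P)}$; the second equality on the second line and the fourth line follow from (26.25). Finally, the fifth line follows if we ignore the $t^2$ term. Therefore,

$$\begin{aligned}
(L_{\mathbf{X}}\mathbf{Y})_P(f) &= \lim_{t \to 0}\frac{1}{t}\left[F_{t*}^{-1}\mathbf{Y}_{F_t(P)} - \mathbf{Y}_P\right](f) \\
&= \lim_{t \to 0}\frac{1}{t}\{t[\mathbf{X}, \mathbf{Y}]_P\}(f) = [\mathbf{X}, \mathbf{Y}]_P(f).
\end{aligned}$$

Since this is true for all $P$ and $f$, we get

$$L_{\mathbf{X}}\mathbf{Y} = [\mathbf{X}, \mathbf{Y}]. \tag{26.31}$$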
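Equation (26.31) can be tested numerically straight from the definition (26.29). In the sketch below, which rests on our own choices, $\mathbf{X}$ is the rotation field of Example 26.4.7(d), so that its flow is a linear rotation and the differential of $F_{-t}$ is the same rotation matrix; $\mathbf{Y} = x^2\,\partial_x + xy\,\partial_y$ is an arbitrary sample field, and the limit $t \to 0$ is replaced by a central difference. The bracket components are obtained by hand from $[\mathbf{X}, \mathbf{Y}]^i = X^j\partial_j Y^i - Y^j\partial_j X^i$.

```python
import math

def X(p):
    """X = -y d/dx + x d/dy, whose flow is a rotation of the plane."""
    return (-p[1], p[0])

def Y(p):
    """Y = x^2 d/dx + x y d/dy (illustrative choice)."""
    return (p[0] ** 2, p[0] * p[1])

def rotate(t, v):
    """Rotation by the angle t. It is the flow F_t of X and, being linear,
    it also serves as the differential of F_t acting on tangent vectors."""
    c, s = math.cos(t), math.sin(t)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def pulled_back_Y(t, p):
    """Evaluate Y at F_t(P), then carry the vector back to P with the
    differential of F_{-t}; this is the first term in the brackets of (26.29)."""
    return rotate(-t, Y(rotate(t, p)))

def lie_derivative(p, h=1e-5):
    """Central-difference estimate of the t-derivative in (26.29) at t = 0."""
    a, b = pulled_back_Y(h, p), pulled_back_Y(-h, p)
    return tuple((u - v) / (2 * h) for u, v in zip(a, b))

def bracket(p):
    """Components of [X, Y] with the partial derivatives worked out by hand:
    dY = [[2x, 0], [y, x]] and dX = [[0, -1], [1, 0]]."""
    x, y = p
    Xx, Xy = X(p)
    Yx, Yy = Y(p)
    return (Xx * 2 * x + Xy * 0 - (Yx * 0 + Yy * (-1)),
            Xx * y + Xy * x - (Yx * 1 + Yy * 0))

P = (0.8, -0.4)
lhs = lie_derivative(P)
rhs = bracket(P)
```

For this pair of fields the bracket simplifies to $[\mathbf{X}, \mathbf{Y}] = -xy\,\partial_x - y^2\,\partial_y$, and the finite-difference Lie derivative reproduces it to within the discretization error.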

This and other properties of the Lie derivative are summarized in the following 
proposition. 

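26.4.13. Proposition. Let $\mathbf{T} \in T^r_s(M)$ and $\mathbf{T}'$ be arbitrary tensor fields and $\mathbf{X}$ a given vector field. Then

1. $L_{\mathbf{X}}$ satisfies a derivation property in the algebra of tensors, i.e.,

$$L_{\mathbf{X}}(\mathbf{T} \otimes \mathbf{T}') = (L_{\mathbf{X}}\mathbf{T}) \otimes \mathbf{T}' + \mathbf{T} \otimes (L_{\mathbf{X}}\mathbf{T}').$$

2. $L_{\mathbf{X}}$ is type-preserving, i.e., $L_{\mathbf{X}}\mathbf{T}$ is a tensor field of type $(r, s)$.

3. $L_{\mathbf{X}}$ commutes with the operation of contraction of tensor fields; in particular, in combination with property 1, we have

$$L_{\mathbf{X}}\langle\boldsymbol{\theta}, \mathbf{Y}\rangle = \langle L_{\mathbf{X}}\boldsymbol{\theta}, \mathbf{Y}\rangle + \langle\boldsymbol{\theta}, L_{\mathbf{X}}\mathbf{Y}\rangle.$$

4. $L_{\mathbf{X}}f = \mathbf{X}f$ for every function $f$.

5. $L_{\mathbf{X}}\mathbf{Y} = [\mathbf{X}, \mathbf{Y}]$ for every vector field $\mathbf{Y}$.

Proof. Except for the last property, which we demonstrated above, the rest follow directly from definitions and simple manipulations. The details are left as exercises. □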

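Although the Lie derivative of a vector field is nicely given in terms of commutators, no such simple relation exists for the Lie derivative of a 1-form. However, if we work in a given coordinate frame, then a useful expression for the Lie derivative of a 1-form can be obtained. Applying $L_{\mathbf{X}}$ to $\langle\boldsymbol{\theta}, \mathbf{Y}\rangle$, we obtain

$$\underbrace{L_{\mathbf{X}}\langle\boldsymbol{\theta}, \mathbf{Y}\rangle}_{=\mathbf{X}(\langle\boldsymbol{\theta}, \mathbf{Y}\rangle)} = \langle L_{\mathbf{X}}\boldsymbol{\theta}, \mathbf{Y}\rangle + \langle\boldsymbol{\theta}, L_{\mathbf{X}}\mathbf{Y}\rangle = \langle L_{\mathbf{X}}\boldsymbol{\theta}, \mathbf{Y}\rangle + \langle\boldsymbol{\theta}, [\mathbf{X}, \mathbf{Y}]\rangle. \tag{26.32}$$

In particular, if $\mathbf{Y} = \partial_i$ and we write $\mathbf{X} = X^j\partial_j$, $\boldsymbol{\theta} = \theta_j\,dx^j$, then the LHS becomes $\mathbf{X}(\theta_i) = X^j\partial_j\theta_i$, and the RHS can be written as

$$(L_{\mathbf{X}}\boldsymbol{\theta})_i + \langle\boldsymbol{\theta}, \underbrace{[X^j\partial_j, \partial_i]}_{=-(\partial_i X^j)\partial_j}\rangle.$$

It follows that

$$L_{\mathbf{X}}\boldsymbol{\theta} \equiv (L_{\mathbf{X}}\boldsymbol{\theta})_i\,dx^i = (X^j\partial_j\theta_i + \theta_j\partial_i X^j)\,dx^i. \tag{26.33}$$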
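As a quick check of Equation (26.33), consider the following worked case on $\mathbb{R}^2$. The particular $\mathbf{X}$ and $\boldsymbol{\theta}$ are our own choices, made only for illustration; in each component the first term is $X^j\partial_j\theta_i$ and the second is $\theta_j\partial_i X^j$.

```latex
% Illustrative choice: X = -y \partial_x + x \partial_y, \theta = x\,dx on R^2.
\begin{align*}
X^x &= -y, \quad X^y = x, \qquad \theta_x = x, \quad \theta_y = 0, \\
(L_{\mathbf X}\boldsymbol\theta)_x
  &= \bigl(X^x\partial_x\theta_x + X^y\partial_y\theta_x\bigr)
   + \bigl(\theta_x\partial_x X^x + \theta_y\partial_x X^y\bigr)
   = (-y) + 0 = -y, \\
(L_{\mathbf X}\boldsymbol\theta)_y
  &= \bigl(X^x\partial_x\theta_y + X^y\partial_y\theta_y\bigr)
   + \bigl(\theta_x\partial_y X^x + \theta_y\partial_y X^y\bigr)
   = 0 + x\cdot(-1) = -x, \\
L_{\mathbf X}\boldsymbol\theta &= -y\,dx - x\,dy = -d(xy).
\end{align*}
```

The result can be cross-checked without components: here $\boldsymbol{\theta} = d(x^2/2)$ and $\mathbf{X}(x^2/2) = -xy$, so the answer $-d(xy)$ is what one expects if the Lie derivative commutes with $d$, a property stated in Theorem 26.5.7.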

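We give two other useful properties of the Lie derivative applicable to all tensors. From the Jacobi identity one can readily deduce that

$$L_{[\mathbf{X}, \mathbf{Y}]}\mathbf{Z} = L_{\mathbf{X}}L_{\mathbf{Y}}\mathbf{Z} - L_{\mathbf{Y}}L_{\mathbf{X}}\mathbf{Z}.$$

Similarly, Equation (26.32) yields

$$L_{[\mathbf{X}, \mathbf{Y}]}\boldsymbol{\theta} = L_{\mathbf{X}}L_{\mathbf{Y}}\boldsymbol{\theta} - L_{\mathbf{Y}}L_{\mathbf{X}}\boldsymbol{\theta}.$$

Putting these two equations together, recalling that a general tensor is a linear combination of tensor products of vectors and 1-forms, and that the Lie derivative obeys the product rule of differentiation, we obtain

$$L_{[\mathbf{X}, \mathbf{Y}]}\mathbf{T} = L_{\mathbf{X}}L_{\mathbf{Y}}\mathbf{T} - L_{\mathbf{Y}}L_{\mathbf{X}}\mathbf{T} \tag{26.34}$$

for any tensor field $\mathbf{T}$. Furthermore, Equation (26.33) and the linearity of the Lie bracket imply that $L_{\alpha\mathbf{X} + \beta\mathbf{Y}} = \alpha L_{\mathbf{X}} + \beta L_{\mathbf{Y}}$ when acting on vectors and 1-forms. It follows by the same argument as above that

$$L_{\alpha\mathbf{X} + \beta\mathbf{Y}}\mathbf{T} = \alpha L_{\mathbf{X}}\mathbf{T} + \beta L_{\mathbf{Y}}\mathbf{T} \qquad \forall\ \mathbf{T} \in T^r_s(M). \tag{26.35}$$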




26.5 Exterior Calculus

Skew-symmetric tensors are of special importance to applications. We studied 
these tensors in their algebraic format in the last chapter. Let us now investigate 
them as they reside on manifolds. 

26.5.1. Definition. Let $M$ be a manifold and $Q$ a point of $M$. Let $\Lambda^p_Q(M)$ denote the space of all antisymmetric tensors of rank $p$ over the tangent space at $Q$. Let $\Lambda^p(M)$ be the union of all $\Lambda^p_Q(M)$ for all $Q \in M$. A differential $p$-form $\boldsymbol{\omega}$ is a mapping $\boldsymbol{\omega} : U \to \Lambda^p(M)$ such that $\boldsymbol{\omega}(Q) \in \Lambda^p_Q(M)$, where $U$ is, as usual, a subset of $M$. To emphasize their domain of definition, we sometimes use the notation $\Lambda^p(U)$.

Since $\{dx^i\}_{i=1}^{m}$ is a basis for $\mathcal{T}_Q^*(M)$ at every $Q \in M$, $\{dx^{i_1} \wedge \cdots \wedge dx^{i_p}\}$ is a basis for the $p$-forms. All the algebraic properties established in the last chapter apply to these $p$-forms at every point $Q \in M$.

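The concept of a pullback has been mentioned a number of times in connection with linear maps. The most frequent use of pullbacks takes place in conjunction with the $p$-forms.

26.5.2. Definition. Let $M$ and $N$ be manifolds and $\psi : M \to N$ a differentiable map. The pullback map on $p$-forms is the map $\psi^* : \Lambda^p(N) \to \Lambda^p(M)$ defined by

$$\psi^*\boldsymbol{\rho}(\mathbf{X}_1, \ldots, \mathbf{X}_p) = \boldsymbol{\rho}(\psi_*\mathbf{X}_1, \ldots, \psi_*\mathbf{X}_p) \qquad \text{for } \boldsymbol{\rho} \in \Lambda^p(N).$$

For $p = 0$, i.e., for functions on $N$, $\psi^*\omega \equiv \omega \circ \psi$.

It can be shown that

$$\psi^*(\boldsymbol{\omega} \wedge \boldsymbol{\eta}) = \psi^*\boldsymbol{\omega} \wedge \psi^*\boldsymbol{\eta}, \qquad (\psi \circ \phi)^* = \phi^* \circ \psi^*. \tag{26.36}$$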

Since $\boldsymbol{\omega}$ varies from point to point, we can define its derivatives. Recall that $T^0_0(M)$ is the collection of real-valued functions on $M$. Since the dual of $\mathbb{R}$ is $\mathbb{R}$, we conclude that $\Lambda^0(M)$, the collection of zero-forms, is the union of all real-valued functions on $M$. Also recall that if $f$ is a zero-form, then $df$, the differential of $f$, is a one-form. Thus, the differential operator $d$ creates a one-form from a zero-form. The fact that this can be generalized to $p$-forms is the subject of the next theorem (for a proof, see [Abra 88, pp. 111-112]).

26.5.3. Theorem. For each point $Q$ of $M$, there exists a neighborhood $U$ and a unique operator $d : \Lambda^p(U) \to \Lambda^{p+1}(U)$, called the exterior derivative operator, such that for any $\boldsymbol{\omega} \in \Lambda^p(U)$ and $\boldsymbol{\eta} \in \Lambda^q(U)$,

1. $d(\boldsymbol{\omega} + \boldsymbol{\eta}) = d\boldsymbol{\omega} + d\boldsymbol{\eta}$ if $q = p$; otherwise the sum is not defined.

2. $d(\boldsymbol{\omega} \wedge \boldsymbol{\eta}) = (d\boldsymbol{\omega}) \wedge \boldsymbol{\eta} + (-1)^p\,\boldsymbol{\omega} \wedge (d\boldsymbol{\eta})$; this is called the antiderivation property of $d$ with respect to the wedge product.

3. $d(d\boldsymbol{\omega}) = 0$ for any differential form $\boldsymbol{\omega}$; stated differently, $d \circ d = 0$.

4. $df = (\partial_i f)\,dx^i$ for any real-valued function $f$.

5. $d$ is natural with respect to pullback; that is, $d_M \circ \psi^* = \psi^* \circ d_N$ for any differentiable map $\psi : M \to N$. Here $d_M$ ($d_N$) is the exterior derivative operating on differential forms of $M$ ($N$).

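26.5.4. Example. Let $M = \mathbb{R}^3$ and $\boldsymbol{\omega} = a_i\,dx^i$ a 1-form on $M$. The exterior derivative of $\boldsymbol{\omega}$ is

$$d\boldsymbol{\omega} = (da_i) \wedge dx^i = (\partial_j a_i\,dx^j) \wedge dx^i = \sum_{j<i}(\partial_j a_i - \partial_i a_j)\,dx^j \wedge dx^i.$$

We see that the components of $d\boldsymbol{\omega}$ are the components of $\nabla \times \mathbf{A}$ where $\mathbf{A} = (a_1, a_2, a_3)$. It follows that the curl of a vector in $\mathbb{R}^3$ is the exterior derivative of the 1-form constructed out of the components of the vector. ■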
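The component formula of this example is easy to try out numerically. The sketch below approximates the partial derivatives by central differences (an assumption of ours, chosen to keep the code free of symbolic algebra) and compares $d\boldsymbol{\omega}$ with a curl computed by hand; the sample 1-form, the sample function, and all names are illustrative. It also applies the same routine to the components of $df$ to illustrate property 3 of Theorem 26.5.3 for a zero-form.

```python
def partial(f, j, p, h=1e-5):
    """Central-difference estimate of the partial derivative of f along x^j at p."""
    up = list(p)
    up[j] += h
    dn = list(p)
    dn[j] -= h
    return (f(up) - f(dn)) / (2 * h)

def d_one_form(a, p):
    """Components d_j a_i - d_i a_j (j < i) of d(a_i dx^i) at the point p,
    keyed by the index pair (j, i) of the basis 2-form dx^j ^ dx^i."""
    return {(j, i): partial(a[i], j, p) - partial(a[j], i, p)
            for j in range(3) for i in range(j + 1, 3)}

# Illustrative 1-form: omega = -y dx + x dy + x y z dz, i.e. A = (-y, x, x y z).
a = [lambda p: -p[1], lambda p: p[0], lambda p: p[0] * p[1] * p[2]]

P = [0.5, -1.0, 2.0]
dw = d_one_form(a, P)

# Curl of A worked out by hand: (x z, -y z, 2).
curl = (P[0] * P[2], -P[1] * P[2], 2.0)

# A zero-form f and the components of df, to illustrate d(df) = 0.
f = lambda p: p[0] ** 2 * p[1] + p[2] ** 3
df = [lambda p, j=j: partial(f, j, p) for j in range(3)]
ddf = d_one_form(df, P)
```

With the ordering $j < i$ used in the example, the $dx \wedge dy$ and $dy \wedge dz$ components are the $z$ and $x$ components of the curl, while the $dx \wedge dz$ component is minus the $y$ component.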

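26.5.5. Example. In relativistic electromagnetic theory the electric and magnetic fields are combined to form the electromagnetic field tensor. This is a skew-symmetric tensor field of rank 2, which can be written as⁸

$$\mathbf{F} = -E_x\,dt \wedge dx - E_y\,dt \wedge dy - E_z\,dt \wedge dz + B_z\,dx \wedge dy - B_y\,dx \wedge dz + B_x\,dy \wedge dz, \tag{26.37}$$

where $t$ is the time coordinate and the units are such that $c$, the velocity of light, is equal to 1.

Let us take the exterior derivative of $\mathbf{F}$. In the process, we use $df = (\partial_i f)\,dx^i$, $d(dx^i \wedge dx^j) = 0$, and in $dE_i$ or $dB_i$ we include only the terms that give a nonzero contribution:

$$\begin{aligned}
d\mathbf{F} = {} & -\left(\frac{\partial E_x}{\partial y}dy + \frac{\partial E_x}{\partial z}dz\right) \wedge dt \wedge dx - \left(\frac{\partial E_y}{\partial x}dx + \frac{\partial E_y}{\partial z}dz\right) \wedge dt \wedge dy \\
& - \left(\frac{\partial E_z}{\partial x}dx + \frac{\partial E_z}{\partial y}dy\right) \wedge dt \wedge dz + \left(\frac{\partial B_z}{\partial t}dt + \frac{\partial B_z}{\partial z}dz\right) \wedge dx \wedge dy \\
& - \left(\frac{\partial B_y}{\partial t}dt + \frac{\partial B_y}{\partial y}dy\right) \wedge dx \wedge dz + \left(\frac{\partial B_x}{\partial t}dt + \frac{\partial B_x}{\partial x}dx\right) \wedge dy \wedge dz.
\end{aligned}$$

Collecting all similar terms and taking into account changes of sign due to the antisymmetry of the exterior products gives

$$\begin{aligned}
d\mathbf{F} = {} & \left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y} + \frac{\partial B_z}{\partial t}\right) dt \wedge dx \wedge dy + \left(\frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z} - \frac{\partial B_y}{\partial t}\right) dt \wedge dx \wedge dz \\
& + \left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z} + \frac{\partial B_x}{\partial t}\right) dt \wedge dy \wedge dz + \left(\frac{\partial B_x}{\partial x} + \frac{\partial B_y}{\partial y} + \frac{\partial B_z}{\partial z}\right) dx \wedge dy \wedge dz \\
= {} & \left[\left(\nabla \times \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t}\right)_z\right] dt \wedge dx \wedge dy - \left[\left(\nabla \times \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t}\right)_y\right] dt \wedge dx \wedge dz \\
& + \left[\left(\nabla \times \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t}\right)_x\right] dt \wedge dy \wedge dz + (\nabla \cdot \mathbf{B})\,dx \wedge dy \wedge dz.
\end{aligned}$$

Each component of $d\mathbf{F}$ vanishes because of Maxwell's equations. ■

⁸ Note how in the wedge product, the first factor has a lower index (is an "earlier" coordinate) than the second factor. If this restriction is to be removed, we need to introduce a factor of $\frac{1}{2}$ for each component (see Example 26.5.11).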

The example above shows that 


26.5.6. Box. The two homogeneous Maxwell's equations can be written as $d\mathbf{F} = 0$, where $\mathbf{F}$ is defined by Equation (26.37).


The exterior derivative is a very useful concept in the theory of differential forms, as illustrated in the preceding example. However, that is not the only differentiation available to the differential forms. We have already defined the Lie derivative for arbitrary tensors. Since differential forms are (antisymmetrized) linear combinations of covariant tensors, Lie differentiation is defined for them as well. In fact, since differential forms have no contravariant parts, one uses the pullback map $F_t^*$ in the definition of the Lie derivative instead of $F_{t*}^{-1}$:

$$L_{\mathbf{X}}\boldsymbol{\omega} = (F_t^*)^{-1}\frac{d}{dt}F_t^*\boldsymbol{\omega}. \tag{26.38}$$

The two derivatives defined so far have the following convenient property, 
whose proof is left as an exercise for the reader: 

26.5.7. Theorem. The exterior derivative $d$ is natural with respect to $L_{\mathbf{X}}$ (or commutes with $L_{\mathbf{X}}$) for $\mathbf{X} \in \mathfrak{X}(M)$; that is, $d \circ L_{\mathbf{X}} = L_{\mathbf{X}} \circ d$.

In our definition of exterior product in the previous chapter, we assumed that the number of vectors was equal to the number of linear functionals taken from the dual space [see Equation (25.9)]. As a result of this complete pairing, we always ended up with a number. It is useful, however, to define an "incomplete" pairing in which the number of vectors and dual vectors are not the same. In particular, if we have a $p$-form and a single vector field, then we can pair the vector field with one of the factors of the $p$-form to get a $(p - 1)$-form. This process is important enough to warrant the following:

26.5.8. Definition. Let $\mathbf{X}$ be a vector field and $\boldsymbol{\omega}$ a $p$-form on a manifold $M$. Then define $i_{\mathbf{X}} : \Lambda^p(M) \to \Lambda^{p-1}(M)$ by

$$i_{\mathbf{X}}\boldsymbol{\omega}(\mathbf{X}_1, \ldots, \mathbf{X}_{p-1}) \equiv \boldsymbol{\omega}(\mathbf{X}, \mathbf{X}_1, \ldots, \mathbf{X}_{p-1}).$$

If $\boldsymbol{\omega} \in \Lambda^0(M)$, i.e., if $\boldsymbol{\omega}$ is just a function, we set $i_{\mathbf{X}}\boldsymbol{\omega} = 0$. $i_{\mathbf{X}}\boldsymbol{\omega}$ is called the interior product or contraction of $\mathbf{X}$ and $\boldsymbol{\omega}$. Another notation commonly used for $i_{\mathbf{X}}\boldsymbol{\omega}$ is $\mathbf{X} \lrcorner\, \boldsymbol{\omega}$.

Although no signature of "differentiation" appears on $i_{\mathbf{X}}$, it does have such a property:



26.5.9. Theorem. Let $\boldsymbol{\omega}$ be a $p$-form and $\boldsymbol{\eta}$ a $q$-form on a manifold $M$. Then $i_{\mathbf{X}}$ is an antiderivation with respect to the wedge product:

$$i_{\mathbf{X}}(\boldsymbol{\omega} \wedge \boldsymbol{\eta}) = (i_{\mathbf{X}}\boldsymbol{\omega}) \wedge \boldsymbol{\eta} + (-1)^p\,\boldsymbol{\omega} \wedge (i_{\mathbf{X}}\boldsymbol{\eta}).$$

Proof. The proof follows directly from Definitions 25.3.3 and 26.5.8, and is left for the reader. □

We have introduced three types of derivation on the algebra of differential forms: the exterior derivative, the Lie derivative, and the interior product. The following theorem connects all three derivations in a most useful way (see [Abra 88, pp. 115-116]):

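26.5.10. Theorem. Let $\boldsymbol{\omega} \in \Lambda^p(M)$, $f \in \Lambda^0(M)$, and $\mathbf{X} \in \mathfrak{X}(M)$. Let $i_{\mathbf{X}} : \Lambda^p(M) \to \Lambda^{p-1}(M)$, $d : \Lambda^p(M) \to \Lambda^{p+1}(M)$, and $L_{\mathbf{X}} : \Lambda^p(M) \to \Lambda^p(M)$ be the interior product, the exterior derivative, and the Lie derivative, respectively. Then

1. $i_{\mathbf{X}}\,df = L_{\mathbf{X}}f$.

2. $L_{\mathbf{X}} = i_{\mathbf{X}} \circ d + d \circ i_{\mathbf{X}}$.

3. $L_{f\mathbf{X}}\boldsymbol{\omega} = f L_{\mathbf{X}}\boldsymbol{\omega} + df \wedge i_{\mathbf{X}}\boldsymbol{\omega}$.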
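Part 2 of the theorem can be verified by hand on a small case. The following worked computation on $\mathbb{R}^2$ uses a vector field and a 1-form of our own choosing, and compares $i_{\mathbf{X}}\,d\boldsymbol{\omega} + d\,i_{\mathbf{X}}\boldsymbol{\omega}$ with the coordinate expression (26.33) for the Lie derivative of a 1-form.

```latex
% Illustrative choice: X = -y \partial_x + x \partial_y, \omega = x\,dy on R^2.
\begin{align*}
d\boldsymbol\omega &= dx \wedge dy, &
i_{\mathbf X}\,d\boldsymbol\omega &= X^x\,dy - X^y\,dx = -y\,dy - x\,dx, \\
i_{\mathbf X}\boldsymbol\omega &= x\,X^y = x^2, &
d\,(i_{\mathbf X}\boldsymbol\omega) &= 2x\,dx, \\
& & i_{\mathbf X}\,d\boldsymbol\omega + d\,i_{\mathbf X}\boldsymbol\omega &= x\,dx - y\,dy. \\[4pt]
\text{From (26.33):}\quad
(L_{\mathbf X}\boldsymbol\omega)_x &= X^j\partial_j\omega_x + \omega_j\partial_x X^j = 0 + x\cdot 1 = x, \\
(L_{\mathbf X}\boldsymbol\omega)_y &= X^j\partial_j\omega_y + \omega_j\partial_y X^j = -y + 0 = -y,
& L_{\mathbf X}\boldsymbol\omega &= x\,dx - y\,dy.
\end{align*}
```

Both routes give the same 1-form, as the theorem requires.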

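If $\mathbf{X} = X^j\partial_j$ and $\boldsymbol{\omega} = \omega_{i_1 i_2 \ldots i_{p+1}}\,dx^{i_1} \wedge dx^{i_2} \wedge \cdots \wedge dx^{i_{p+1}}$, then the reader may verify that $i_{\mathbf{X}}\boldsymbol{\omega} = X^j\omega_{j i_1 \ldots i_p}\,dx^{i_1} \wedge \cdots \wedge dx^{i_p}$. In particular, we have the useful formula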

ix(dx 11 A dx 12 A • • • A dx lp+l ) 

=X j d ^ ji A 心力 A … A dx^P 

J … Jp 

= xj (E … i p u P )) dxJlAdxj2 


(26.39) 


A … /\dxi p . 
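The antiderivation property of Theorem 26.5.9 can be checked mechanically on forms with constant coefficients. The following is only a minimal sketch, not taken from the text: a form is stored as a dictionary from increasing index tuples to coefficients, and the helper names (`sort_sign`, `wedge`, `interior`, `add`) and the sample data are assumptions made for illustration.

```python
def sort_sign(indices):
    """Sort an index tuple; return (sign of the sorting permutation, sorted tuple).

    The sign is 0 when an index repeats, since dx^i ^ dx^i = 0.
    """
    idx = list(indices)
    if len(set(idx)) < len(idx):
        return 0, ()
    sign = 1
    for i in range(len(idx)):                 # bubble sort: one sign flip per swap
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)


def prune(form):
    """Drop terms whose coefficient is zero."""
    return {k: c for k, c in form.items() if c != 0}


def wedge(a, b):
    """Wedge product of two constant-coefficient forms."""
    out = {}
    for ka, ca in a.items():
        for kb, cb in b.items():
            sign, key = sort_sign(ka + kb)
            if sign:
                out[key] = out.get(key, 0) + sign * ca * cb
    return prune(out)


def interior(X, a):
    """i_X a: contract X with each factor in turn, with alternating sign."""
    out = {}
    for key, c in a.items():
        for pos, i in enumerate(key):
            rest = key[:pos] + key[pos + 1:]
            out[rest] = out.get(rest, 0) + (-1) ** pos * X[i] * c
    return prune(out)


def add(a, b, sign=1):
    """Return a + sign * b."""
    out = dict(a)
    for key, c in b.items():
        out[key] = out.get(key, 0) + sign * c
    return prune(out)


# Sample data (assumed): a vector, a 2-form, and a 1-form on a 4-dimensional space.
X = [1, -2, 3, 4]
omega = {(0, 1): 2, (1, 3): -1, (2, 3): 5}    # p = 2
eta = {(0,): 1, (2,): 3, (3,): -2}            # q = 1
p = 2

lhs = interior(X, wedge(omega, eta))
rhs = add(wedge(interior(X, omega), eta),
          wedge(omega, interior(X, eta)), sign=(-1) ** p)
print(lhs == rhs)
```

A 0-form is stored under the empty tuple `()`, so `interior` returns the empty dictionary (zero) on functions, in line with Definition 26.5.8.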


26.5.11. Example. Let $p = p_\alpha\,dx^\alpha$ be the momentum one-form and write the electromagnetic
field tensor as$^9$ $F = \tfrac{1}{2} F_{\alpha\beta}\,dx^\alpha \wedge dx^\beta$, where $\alpha$ and $\beta$ run over the values 0, 1, 2,
and 3 with 0 being the time index. Let

$$\frac{dp}{d\tau} = \left(\frac{dp_\alpha}{d\tau}\right) dx^\alpha$$

be the derivative of momentum with respect to the proper time, $\tau$. Also, let $u = u^\alpha \partial_\alpha$ be
the velocity four-vector of a charged particle. Then the Lorentz force law can be written
simply as $dp/d\tau = -qF(u) \equiv -q\,i_u F$, where $q$ is the electric charge of the particle whose
4-velocity is $u$. Note that $F$, a two-form, contracts with $u$, a vector, to give a one-form on


$^9$The factor $\tfrac{1}{2}$ is introduced here to avoid restricting the sum over $\alpha$ and $\beta$.






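the RHS. Thus, both sides are of the same type. Let us write this equation in component
form:

$$\begin{aligned}
\frac{dp_\alpha}{d\tau}\,dx^\alpha
&= -q\,\tfrac{1}{2} F_{\alpha\beta}\, i_u(dx^\alpha \wedge dx^\beta)
 = -\tfrac{1}{2} q F_{\alpha\beta}\bigl(u^\gamma \delta^{\alpha\beta}_{\gamma\mu}\, dx^\mu\bigr) \\
&= -\tfrac{1}{2} q F_{\alpha\beta}\, u^\gamma \bigl(\delta^\alpha_\gamma \delta^\beta_\mu - \delta^\alpha_\mu \delta^\beta_\gamma\bigr)\, dx^\mu \\
&= -\tfrac{1}{2} q F_{\alpha\beta}\bigl(u^\alpha dx^\beta - u^\beta dx^\alpha\bigr) \\
&= \tfrac{1}{2} q \bigl(F_{\alpha\beta} - F_{\beta\alpha}\bigr) u^\beta dx^\alpha = \bigl(q F_{\alpha\beta} u^\beta\bigr)\, dx^\alpha.
\end{aligned} \tag{26.40}$$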


Equating the components on both sides, we get $dp_\alpha/d\tau = qF_{\alpha\beta}u^\beta$, which may be familiar
to the reader. To make the equation even more familiar, consider the component $\alpha = 1$,

$$\frac{dp_1}{d\tau} = qF_{1\beta}u^\beta = q\bigl[F_{10}u^0 + F_{12}u^2 + F_{13}u^3\bigr], \tag{26.41}$$


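and recall that $u^\alpha = dx^\alpha/d\tau$, where

$$(d\tau)^2 = (dt)^2 - (dx^1)^2 - (dx^2)^2 - (dx^3)^2 = (dt)^2(1 - v^2)$$

and $\mathbf{v} = (dx^1/dt,\ dx^2/dt,\ dx^3/dt)$ is the 3-velocity of the particle. Since $x^0 = t$, we get

$$\frac{dx^0}{d\tau} = \frac{1}{\sqrt{1 - v^2}}, \qquad \frac{dx^i}{d\tau} = \frac{v_i}{\sqrt{1 - v^2}} \quad \text{for } i = 1, 2, 3.$$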


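Substituting this in (26.41) and remembering that $F_{10} = -F_{01} = E_1$, $F_{12} = B_3$, and
$F_{13} = -F_{31} = -B_2$, we obtain

$$\frac{dp_1}{dt\sqrt{1 - v^2}} = q\left[E_1 \frac{1}{\sqrt{1 - v^2}} + B_3 \frac{v_2}{\sqrt{1 - v^2}} - B_2 \frac{v_3}{\sqrt{1 - v^2}}\right],$$

or

$$\frac{dp_1}{dt} = q\bigl[E_1 + v_2 B_3 - v_3 B_2\bigr] = \bigl[q(\mathbf{E} + \mathbf{v} \times \mathbf{B})\bigr]_1.$$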

The other components are obtained similarly. Thus, in vector form we have

$$\frac{d\mathbf{p}}{dt} = q(\mathbf{E} + \mathbf{v} \times \mathbf{B}),$$

where $\mathbf{p}$ now represents the 3-momentum of the particle. This is the Lorentz force law for
electromagnetism in its familiar form. Again, note the simplification offered by the language
of forms. ■
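The passage from the covariant equation $dp_\alpha/d\tau = qF_{\alpha\beta}u^\beta$ to the three-dimensional force law can also be confirmed numerically. The sketch below is illustrative only: it assumes units with $c = 1$, the component assignments $F_{i0} = E_i$, $F_{12} = B_3$, $F_{13} = -B_2$, $F_{23} = B_1$ used in the example, and arbitrary sample values for $q$, $\mathbf{E}$, $\mathbf{B}$, and $\mathbf{v}$; the function names are hypothetical.

```python
import math


def covariant_force(q, E, B, v):
    """Spatial part of q F_{alpha beta} u^beta, converted from d/dtau to d/dt."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    u = [gamma] + [gamma * c for c in v]      # u^alpha = dx^alpha / dtau
    # Antisymmetric F with F_{i0} = E_i, F_12 = B_3, F_13 = -B_2, F_23 = B_1.
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[i + 1][0] = E[i]
        F[0][i + 1] = -E[i]
    F[1][2], F[2][1] = B[2], -B[2]
    F[1][3], F[3][1] = -B[1], B[1]
    F[2][3], F[3][2] = B[0], -B[0]
    dp_dtau = [q * sum(F[a][b] * u[b] for b in range(4)) for a in (1, 2, 3)]
    return [x / gamma for x in dp_dtau]       # dtau = dt / gamma


def lorentz_force(q, E, B, v):
    """The familiar q (E + v x B)."""
    cross = [v[1] * B[2] - v[2] * B[1],
             v[2] * B[0] - v[0] * B[2],
             v[0] * B[1] - v[1] * B[0]]
    return [q * (E[i] + cross[i]) for i in range(3)]


# Sample values (assumed for illustration).
q, E, B, v = 2.0, [1.0, 2.0, 3.0], [0.5, -1.0, 2.0], [0.1, 0.2, -0.3]
lhs = covariant_force(q, E, B, v)
rhs = lorentz_force(q, E, B, v)
print(max(abs(a - b) for a, b in zip(lhs, rhs)))
```

The printed number is the largest componentwise difference between the two expressions; it should be at the level of floating-point rounding.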

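A combination that is very useful is that of the exterior derivative and the
Hodge star operator. Recall that the latter is defined by

$$*(dx^{i_1} \wedge \cdots \wedge dx^{i_p}) = \frac{1}{(m - p)!}\, \epsilon^{i_1 \dots i_p}{}_{j_{p+1} \dots j_m}\, dx^{j_{p+1}} \wedge \cdots \wedge dx^{j_m}, \tag{26.42}$$

where $m$ is the dimension of the manifold.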





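26.5.12. Example. Let us calculate $*F$ and $d(*F)$ where $F = \tfrac{1}{2}F_{\alpha\beta}\,dx^\alpha \wedge dx^\beta$ is the
electromagnetic field tensor. We have

$$*F = *\bigl(\tfrac{1}{2} F_{\alpha\beta}\, dx^\alpha \wedge dx^\beta\bigr) = \tfrac{1}{2} F_{\alpha\beta} * (dx^\alpha \wedge dx^\beta) = \tfrac{1}{4} F_{\alpha\beta}\, \epsilon^{\alpha\beta}{}_{\mu\nu}\, dx^\mu \wedge dx^\nu$$

and

$$d(*F) = d\bigl(\tfrac{1}{4} F_{\alpha\beta}\, \epsilon^{\alpha\beta}{}_{\mu\nu}\bigr) \wedge dx^\mu \wedge dx^\nu = \tfrac{1}{4}\, \epsilon^{\alpha\beta}{}_{\mu\nu}\, F_{\alpha\beta,\gamma}\, dx^\gamma \wedge dx^\mu \wedge dx^\nu,$$

where $F_{\alpha\beta,\gamma} \equiv \partial F_{\alpha\beta}/\partial x^\gamma$. We can now use the components $F_{i0} = E_i$, $F_{12} = B_3$,
$F_{13} = -B_2$, and $F_{23} = B_1$ to write $d(*F)$ in terms of $\mathbf{E}$ and $\mathbf{B}$. After a long but straightforward
calculation, we obtain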


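$$\begin{aligned}
d(*F) ={}& \left[\left(\frac{\partial \mathbf{E}}{\partial t} - \nabla \times \mathbf{B}\right)_z\right] dt \wedge dx \wedge dy
 + \left[\left(\frac{\partial \mathbf{E}}{\partial t} - \nabla \times \mathbf{B}\right)_y\right] dt \wedge dz \wedge dx \\
&+ \left[\left(\frac{\partial \mathbf{E}}{\partial t} - \nabla \times \mathbf{B}\right)_x\right] dt \wedge dy \wedge dz
 + (\nabla \cdot \mathbf{E})\, dx \wedge dy \wedge dz.
\end{aligned} \tag{26.43}$$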


The inhomogeneous pair of Maxwell's equations is 


Maxwell’s 
inhomogeneous 
equations in the 
language of forms 


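$$\nabla \times \mathbf{B} = \frac{\partial \mathbf{E}}{\partial t} + 4\pi \mathbf{J}, \qquad \nabla \cdot \mathbf{E} = 4\pi\rho, \tag{26.44}$$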

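where $\rho$ and $\mathbf{J}$ are charge and current densities, respectively. We can put these two densities
together to form a four-current one-form with $\rho$ as the zeroth component: $J = J_\alpha\,dx^\alpha$.
Thus,

$$\begin{aligned}
*J &= J_\alpha * (dx^\alpha) = J_\alpha\, \tfrac{1}{3!}\, \epsilon^{\alpha}{}_{\mu\nu\rho}\, dx^\mu \wedge dx^\nu \wedge dx^\rho \\
&= J_0\, dx \wedge dy \wedge dz + J_x\, dt \wedge dy \wedge dz + J_y\, dt \wedge dz \wedge dx + J_z\, dt \wedge dx \wedge dy \\
&= \rho\, dx \wedge dy \wedge dz - J^x\, dt \wedge dy \wedge dz - J^y\, dt \wedge dz \wedge dx - J^z\, dt \wedge dx \wedge dy,
\end{aligned} \tag{26.45}$$

where we have used the facts that $\rho = J^0 = J_0$ and $\mathbf{J} = (J^x, J^y, J^z) = -(J_x, J_y, J_z)$. ■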

Comparing Equations (26.43), (26.44), and (26.45), we note that 


26.5.13. Box. In the language of forms, the inhomogeneous pair of
Maxwell's equations has the simple appearance $d(*F) = 4\pi(*J)$.


Problem 26.15 shows that the relation $d^2\omega = 0$ is equivalent, at least in $\mathbb{R}^3$,
to the vanishing of the curl of the gradient and the divergence of the curl. It is
customary in physics to try to go backwards as well, that is, given that $\nabla \times \mathbf{E} = 0$,
to assume that $\mathbf{E} = \nabla f$ for some function $f$. Similarly, we want to believe that
$\nabla \cdot \mathbf{B} = 0$ implies that $\mathbf{B} = \nabla \times \mathbf{A}$.







closed and exact 
forms 


regions that are 
contractable to a 
point 


converse of the
Poincaré lemma


gauge invariance in 
the language of 
forms 


What is the analogue of the above statement for a general $p$-form? A form
$\omega$ that satisfies $d\omega = 0$ is called a closed form. An exact form is one that can
be written as the exterior derivative of another form. Thus, every exact form is
automatically closed. This is the Poincaré lemma. The converse of this lemma is
true only if the region of definition of the form is topologically simple, as explained
in the following.

Consider a $p$-form $\omega$ defined on a region $U$ of a manifold $M$. If all closed
curves in $U$ can be shrunk to a point in $U$, we say that $U$ is contractable to a
point. If $\omega$ is not defined for a point $P$ on $M$, then any $U$ that contains $P$ is not
contractable to a point. We can now state the converse of the Poincaré lemma (for
a proof, see [Bish 80, p. 175]):

26.5.14. Theorem. (converse of the Poincaré lemma) Let $U$ be a region in a
manifold $M$, such that $U$ is contractable to a point. Let $\omega$ be a $p$-form on $U$ such
that $d\omega = 0$. Then there exists a $(p-1)$-form $\eta$ on $U$ such that $\omega = d\eta$.

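26.5.15. Example. The electromagnetic field tensor $F = \tfrac{1}{2}F_{\alpha\beta}\,dx^\alpha \wedge dx^\beta$ is a two-form
that satisfies $dF = 0$. The converse of the Poincaré lemma says that if $F$ is well behaved in
a region $U$ of $\mathbb{R}^4$, then there must exist a one-form $\eta$ such that $F = d\eta$.

Let us write this one-form in terms of coordinates as $\eta = A_\alpha\,dx^\alpha$. Then $d\eta =
A_{\alpha,\beta}\,dx^\beta \wedge dx^\alpha$, and we have

$$\tfrac{1}{2}F_{\alpha\beta}\,dx^\alpha \wedge dx^\beta = A_{\alpha,\beta}\,dx^\beta \wedge dx^\alpha \quad\Rightarrow\quad \bigl(\tfrac{1}{2}F_{\alpha\beta} + A_{\alpha,\beta}\bigr)\,dx^\alpha \wedge dx^\beta = 0.$$

Since the $dx^\alpha \wedge dx^\beta$ with $\alpha < \beta$ are linearly independent, the antisymmetric part of each
coefficient, $\tfrac{1}{2}F_{\alpha\beta} + \tfrac{1}{2}(A_{\alpha,\beta} - A_{\beta,\alpha})$, must vanish. Thus,

$$F_{\alpha\beta} = \frac{\partial A_\beta}{\partial x^\alpha} - \frac{\partial A_\alpha}{\partial x^\beta}.$$

The four-vector $A_\alpha$ is simply the four-potential of relativistic electromagnetic theory. ■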

Note that the $(p-1)$-form of Theorem 26.5.14 is not unique. In fact, if $\alpha$ is
any $(p-2)$-form, then $\omega$ can be written as

$$\omega = d(\eta + d\alpha)$$

because $d^2\alpha$ is identically zero. This freedom of choice in selecting $\eta$ is called
gauge invariance, and its generalization plays an important role in the physics of
fundamental interactions.


Jules Henri Poincaré (1854-1912): The development of mathematics in the nineteenth
century began under the shadow of a giant, Carl Friedrich Gauss; it ended with the domination
by a genius of similar magnitude, Henri Poincaré. Both were universal mathematicians
in the supreme sense, and both made important contributions to astronomy and mathematical
physics. If Poincaré's discoveries in number theory do not equal those of Gauss, his
achievements in the theory of functions are at least on the same level, even when one takes
into account the theory of elliptic and modular functions, which must be credited to Gauss
and which represents in that field his most important discovery, although it was not published
during his lifetime. If Gauss was the initiator in the theory of differentiable manifolds,
Poincaré played the same role in algebraic topology. Finally, Poincaré remains the most
important figure in the theory of differential equations and the mathematician who after
Newton did the most remarkable work in celestial mechanics. Both Gauss and Poincaré had
very few students and liked to work alone; but the similarity ends there. Where Gauss was
very reluctant to publish his discoveries, Poincaré's list of papers approaches five hundred,
which does not include the many books and lecture notes he published as a result of his
teaching at the Sorbonne.

Poincaré's parents both belonged to the upper middle
class, and both their families had lived in Lorraine for several
generations. His paternal grandfather had two sons: Léon,
Henri's father, was a physician and a professor of medicine at
the University of Nancy; Antoine had studied at the École Polytechnique
and rose to high rank in the engineering corps. One
of Antoine's sons, Raymond, was several times prime minister
and was president of the French Republic during World War I;
the other son, Lucien, occupied high administrative functions
in the university. Poincaré's mathematical ability became apparent
while he was still a student in the lycée. He won first
prizes in the concours général (a competition among students from all French lycées) and
in 1873 entered the École Polytechnique at the top of his class; his professor at Nancy is
said to have referred to him as a “monster of mathematics.” After graduation, he followed
courses in engineering at the École des Mines and worked briefly as an engineer while
writing his thesis for the doctorate in mathematics which he obtained in 1879. Shortly afterward
he started teaching at the University of Caen, and in 1881 he became a professor at
the University of Paris, where he taught until his untimely death in 1912. At the early age
of thirty-three he was elected to the Académie des Sciences and in 1908 to the Académie
Française. He was also the recipient of innumerable prizes and honors both in France and
abroad.

Before he was thirty years of age, Poincaré became world famous with his epoch-making
discovery of the “automorphic functions” of one complex variable (or, as he called them,
the “fuchsian” and “kleinean” functions). Much has been written on the “competition”
between C. F. Klein and Poincaré in the discovery of automorphic functions. However,
Poincaré's ignorance of the mathematical literature when he started his researches is almost
unbelievable. He hardly knew anything on the subject beyond Hermite's work on the modular
functions; he certainly had never read Riemann, and by his own account had not even heard
of the “Dirichlet principle,” which he was to use in such imaginative fashion a few years
later. Nevertheless, Poincaré's idea of associating a fundamental domain to any fuchsian
group does not seem to have occurred to Klein, nor did the idea of “using” non-Euclidean
geometry, which is never mentioned in his papers on modular functions up to 1880.

Poincaré was one of the few mathematicians of his time who understood and admired
the work of Lie and his continuators on “continuous groups,” and in particular the only
mathematician who in the early 1900s realized the depth and scope of E. Cartan's papers.
In 1899 Poincaré proved what is now called the Poincaré-Birkhoff-Witt theorem which
has become fundamental in the modern theory of Lie algebras. The theory of differential
equations and its applications to dynamics was clearly at the center of Poincaré's mathematical
thought; from his first (1878) to his last (1912) paper, he attacked the theory from all
possible angles and very seldom let a year pass without publishing a paper on the subject.
The most extraordinary production of Poincaré, also dating from his prodigious period of
creativity (1880-1883) (reminding us of Gauss's Tagebuch of 1797-1801), is the qualitative
theory of differential equations. It is one of the few examples of a mathematical theory that
sprang apparently out of nowhere and that almost immediately reached perfection in the
hands of its creator. Everything was new in the first two of the four big papers that Poincaré
published on the subject between 1880 and 1886.

For more than twenty years Poincaré lectured at the Sorbonne on mathematical physics;
he gave himself to that task with his characteristic thoroughness and energy, with the result
that he became an expert in practically all parts of theoretical physics, and published more
than seventy papers and books on the most varied subjects, with a predilection for the
theories of light and of electromagnetic waves. On two occasions he played an important
part in the development of the new ideas and discoveries that revolutionized physics at the
end of the nineteenth century. His remark on the possible connection between X-rays and the
phenomenon of phosphorescence was the starting point of H. Becquerel's experiments that
led him to the discovery of radioactivity. On the other hand, Poincaré was active from 1899
on in the discussions concerning Lorentz's theory of the electron; Poincaré was the first
to observe that the Lorentz transformations form a group; and many physicists consider
that Poincaré shares with Lorentz and Einstein the credit for the invention of the special
theory of relativity. The main leitmotiv of Poincaré's mathematical work is clearly the
idea of “continuity”: Whenever he attacks a problem in analysis, we almost immediately
see him investigating what happens when the conditions of the problem are allowed to
vary continuously. He was therefore bound to encounter at every turn what we now call
topological problems. He himself said in 1901, “Every problem I had attacked led me to
Analysis situs,” particularly the researches on differential equations and on the periods of
multiple integrals. Starting in 1894 he inaugurated, in a remarkable series of six papers
written during a period of ten years, the modern methods of algebraic topology.

Whereas Poincaré has been accused of being too conservative in physics, he certainly
was very open-minded regarding new mathematical ideas. The quotations in his papers show
that he read extensively, if not systematically, and was aware of all the latest developments
in practically every branch of mathematics. He was probably the first mathematician to
use Cantor's theory of sets in analysis. Up to a certain point, he also looked with favor on
the axiomatic trend in mathematics, as it was developing toward the end of the nineteenth
century, and he praised Hilbert's Grundlagen der Geometrie. However, he obviously had
a blind spot regarding the formalization of mathematics, and poked fun repeatedly at the
efforts of the disciples of Peano and Russell in that direction; but, somewhat paradoxically,
his criticism of the early attempts of Hilbert was probably the starting point of some of the
most fruitful of the later developments of metamathematics. Poincaré stressed that Hilbert's
point of view of defining objects by a system of axioms was admissible only if one could
prove a priori that such a system did not imply contradiction, and it is well known that the
proof of noncontradiction was the main goal of the theory that Hilbert founded after 1920.
Poincaré seems to have been convinced that such attempts were hopeless, and K. Gödel's
theorem proved him right.





integration of
differential forms in
$\mathbb{R}^n$


This discussion is 
analogous to our 
discussion of 
orientation in vector 
spaces (see Section 
25.3.2). 


orientable manifolds 


26.5.1 Integration on Manifolds 

We mentioned in Chapter 25 that certain exterior products are interpreted as volume
elements. We now exploit this notion and define integration on manifolds. Starting
with $\mathbb{R}^n$, considered as a manifold, we define the integral of an $n$-form $\omega$ as follows.
Choose a coordinate system $\{x^i\}_{i=1}^n$ in $\mathbb{R}^n$, write $\omega = f\,dx^1 \wedge \cdots \wedge dx^n$, and define
the integral of the $n$-form as

$$\int_{\mathbb{R}^n,\,x} \omega \equiv \int_{\mathbb{R}^n} f(x^1, \dots, x^n)\, dx^1 \cdots dx^n,$$

where to avoid dealing with infinities, one assumes that $f$ vanishes outside a finite
region. The second symbol in the lower part of the integral sign indicates the
variables of integration. Let us now change the coordinates, say to $\{y^i\}_{i=1}^n$. Using
Equation (26.17), which gives the transformation rule for 1-forms when changing
coordinates, and Equation (25.13), which defines the determinant in terms of
$n$-forms, we obtain

$$\omega = f \det\left(\frac{\partial x^i}{\partial y^j}\right) dy^1 \wedge \cdots \wedge dy^n,$$

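where $f$ is now understood to be a function of the $y$'s through the $x$'s. So, in terms
of the new coordinates, the integral becomes

$$\int_{\mathbb{R}^n,\,y} \omega = \int_{\mathbb{R}^n} f\bigl(x^1(y), \dots, x^n(y)\bigr) \det\left(\frac{\partial x^i}{\partial y^j}\right) dy^1 \cdots dy^n.$$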


If we had the absolute value of the Jacobian in the integral, the two sides would
be equal. So, all we can say at this point is

$$\int_{\mathbb{R}^n,\,x} \omega = \pm \int_{\mathbb{R}^n,\,y} \omega.$$


We therefore distinguish between two kinds of coordinate transformations: If the
Jacobian determinant is positive, we say that the coordinate transformation is
orientation preserving. Otherwise, the transformation is called orientation
reversing.
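The sign of the Jacobian determinant is easy to inspect numerically. The following is a small sketch for two dimensions only; the two sample coordinate changes (swapping the coordinates, and polar coordinates) and the helper name `jacobian_det` are assumptions made for illustration.

```python
import math


def jacobian_det(x_of_y, y, h=1e-6):
    """det(dx^i/dy^j) at the point y in two dimensions, by central differences."""
    cols = []
    for j in range(2):
        yp, ym = list(y), list(y)
        yp[j] += h
        ym[j] -= h
        xp, xm = x_of_y(yp), x_of_y(ym)
        cols.append([(xp[i] - xm[i]) / (2 * h) for i in range(2)])
    # cols[j][i] holds dx^i/dy^j.
    return cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]


def swap(y):
    """x^1 = y^2, x^2 = y^1."""
    return [y[1], y[0]]


def polar(y):
    """x = r cos(theta), y = r sin(theta), with (r, theta) as the new coordinates."""
    r, theta = y
    return [r * math.cos(theta), r * math.sin(theta)]


det_swap = jacobian_det(swap, [0.4, 1.7])
det_polar = jacobian_det(polar, [2.0, 0.3])
print(det_swap, det_polar)
```

Swapping two coordinates gives a negative determinant and is therefore orientation reversing, while the change to polar coordinates has determinant $r > 0$ and is orientation preserving wherever it is defined.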

Our ability to integrate functions on $\mathbb{R}^n$ depends crucially on the fact that
volume elements do not change sign at any point of $\mathbb{R}^n$. If this were not so, we
could find a finite (albeit small) region of space, in the vicinity of the point at
which the volume element changes sign, whose volume would be zero. This
property of $\mathbb{R}^n$ is the content of the following:

26.5.16. Definition. A manifold $M$ of dimension $n$ is called orientable if it has a
nowhere vanishing $n$-form.






Any two nonvanishing $n$-forms $\omega$ and $\omega'$ on an orientable manifold are related
by a nowhere-vanishing function: $\omega' = f\omega$. Clearly, $f$ has to be either positive or
negative everywhere. $\omega$ and $\omega'$ are said to be equivalent if $f$ is positive. Thus, the
nonvanishing $n$-forms on an orientable manifold fall into two classes, all members
of each class being equivalent to one another, and a member of one class being
related to a member of the other class via a negative function. Each class is called
an orientation on $M$.

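Given an orientation, an $n$-form $\omega$, and a chart $\{U_\alpha, \phi_\alpha\}$ on $M$, we define

$$\int_M \omega = \sum_\alpha \int_{\mathbb{R}^n} (\phi_\alpha^{-1})^* \bigl(\omega|_{U_\alpha}\bigr), \tag{26.46}$$

where $(\phi_\alpha^{-1})^*$ is the pullback of $\phi_\alpha^{-1} : \mathbb{R}^n \to M$, so that it maps $n$-forms on $M$ to
$n$-forms on $\mathbb{R}^n$, $\omega|_{U_\alpha}$ is the restriction of $\omega$ to $U_\alpha$, and the sum over $\alpha$ is assumed to
exist. This amounts to saying that the region in $M$ on which $\omega$ is defined is finite,
or that $\omega$ has compact support.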

We note that the RHS of Equation (26.46) is an integration on $\mathbb{R}^n$ that appears
to depend on the choice of coordinate functions. However, it can be shown that
the integral is independent of such choice. In practice, one chooses a coordinate
patch and transfers the integration to $\mathbb{R}^n$, where the process is familiar.

“volume” of a
manifold

If we choose a coordinate patch $\{x^i\}_{i=1}^n$ and integrate $dx^1 \wedge \cdots \wedge dx^n$ according
to Equation (26.46), we obtain the “volume” of the manifold $M$. If $M$ is compact,
this volume will be finite.$^{10}$


26.6 Symplectic Geometry 

Mechanics stimulated a great deal of dialogue between physics and mathematics 
in the latter part of the nineteenth century and the beginning of the twentieth. 
The branch of mathematics that benefited the most from this dialogue is the theory
of differentiable manifolds, whose tribute back to mechanics has been the most 
beautiful language in which the latter can express itself, the language of symplectic 
geometry. All the discussion of symplectic vector spaces of the last chapter can 
be carried over to the tangent spaces of a manifold and patched together by the 
differentiable structure of the manifold. 


symplectic form, 
symplectic structure, 
and symplectic 
manifold defined 


26.6.1. Definition. A symplectic form (or a symplectic structure) on a manifold
$M$ is a nondegenerate, closed 2-form $\omega$ on $M$. A symplectic manifold $(M, \omega)$
is a manifold $M$ together with a symplectic form $\omega$ on $M$. We define the map
$\flat : \mathfrak{X}(M) \to \mathfrak{X}^*(M)$ by

$$\flat(X) \equiv X^\flat = i_X\omega \equiv \omega^\flat(X)$$


$^{10}$Recall from Chapter 16 that a subset of $\mathbb{R}^n$ is compact iff it is closed and bounded. It is a good idea to keep this in mind as a
paradigm of compact spaces.




Darboux theorem 


symplectic charts, 
canonical 
coordinates, and 
canonical 
transformations 

Coordinate 
representation of 
sharp and flat maps 


and the map $\sharp : \mathfrak{X}^*(M) \to \mathfrak{X}(M)$ as the inverse of $\flat$.

Chapter 25 identified some special basis, the canonical basis, in which the 
symplectic form of a symplectic vector space took on a simple expression. The 
analogue of such a basis exists in a symplectic manifold. The reader should keep 
in mind that this existence is not automatic, because although one can find such 
bases at every point of the manifold, the smooth patching up of all such bases to 
cover the entire manifold is not trivial and is the content of the following important 
theorem, which we state without proof (see [Abra 85, p. 175]): 

26.6.2. Theorem. (Darboux) Suppose $\omega$ is a nondegenerate 2-form on a $2n$-dimensional manifold
$M$. Then $d\omega = 0$ if and only if there is a chart $(U, \varphi)$ at each $P \in M$ such that
$\varphi(P) = 0$ and

$$\omega = \sum_{i=1}^n dx^i \wedge dy^i,$$

where $x^1, \dots, x^n, y^1, \dots, y^n$ are coordinates on $U$. Furthermore, on such a chart,
the volume element is

$$\mu_\omega = dx^1 \wedge \cdots \wedge dx^n \wedge dy^1 \wedge \cdots \wedge dy^n.$$


26.6.3. Definition. The charts guaranteed by Darboux's theorem are called symplectic
charts, and the coordinates $x^i, y^i$ are called canonical coordinates. If
$(M, \omega)$ and $(N, \rho)$ are symplectic manifolds, then a $C^\infty$ map $f : M \to N$ is
called symplectic, or a canonical transformation, if $f^*\rho = \omega$.

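26.6.4. Example. In this example, we derive a formula that gives the action of $\omega^\flat$ and
$\omega^\sharp$ in terms of components of vectors and 1-forms in canonical coordinates. Let

$$Z = X^i \frac{\partial}{\partial x^i} + Y^i \frac{\partial}{\partial y^i}$$

be a vector field. When $\omega^\flat$ acts on $Z$, it gives a 1-form, which we write as $\omega^\flat(Z) =
U_k\,dx^k + W_k\,dy^k$. To find the unknowns $U_k$ and $W_k$, we let both sides act on coordinate
basis vectors. For the RHS, we get

$$\bigl(U_k\,dx^k + W_k\,dy^k\bigr)\left(\frac{\partial}{\partial x^j}\right) = U_k \underbrace{dx^k\left(\frac{\partial}{\partial x^j}\right)}_{=\delta^k_j} + W_k \underbrace{dy^k\left(\frac{\partial}{\partial x^j}\right)}_{=0} = U_j,$$

$$\bigl(U_k\,dx^k + W_k\,dy^k\bigr)\left(\frac{\partial}{\partial y^j}\right) = W_j.$$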



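For the LHS, we obtain

$$\begin{aligned}
\left[\omega^\flat\left(X^i \frac{\partial}{\partial x^i} + Y^i \frac{\partial}{\partial y^i}\right)\right]\left(\frac{\partial}{\partial x^j}\right)
&= X^i \left[\omega^\flat\left(\frac{\partial}{\partial x^i}\right)\right]\left(\frac{\partial}{\partial x^j}\right) + Y^i \left[\omega^\flat\left(\frac{\partial}{\partial y^i}\right)\right]\left(\frac{\partial}{\partial x^j}\right) \\
&= X^i\, \omega\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right) + Y^i\, \omega\left(\frac{\partial}{\partial y^i}, \frac{\partial}{\partial x^j}\right).
\end{aligned}$$

With $\omega = \sum_k dx^k \wedge dy^k$, the two terms are

$$\omega\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right) = \sum_{k=1}^n \left[ dx^k\left(\frac{\partial}{\partial x^i}\right) \underbrace{dy^k\left(\frac{\partial}{\partial x^j}\right)}_{=0} - dx^k\left(\frac{\partial}{\partial x^j}\right) \underbrace{dy^k\left(\frac{\partial}{\partial x^i}\right)}_{=0} \right] = 0,$$

$$\omega\left(\frac{\partial}{\partial y^i}, \frac{\partial}{\partial x^j}\right) = \sum_{k=1}^n \left[ \underbrace{dx^k\left(\frac{\partial}{\partial y^i}\right)}_{=0} dy^k\left(\frac{\partial}{\partial x^j}\right) - \underbrace{dx^k\left(\frac{\partial}{\partial x^j}\right)}_{=\delta^k_j} \underbrace{dy^k\left(\frac{\partial}{\partial y^i}\right)}_{=\delta^k_i} \right] = -\delta_{ij}.$$

It follows that

$$\left[\omega^\flat\left(X^i \frac{\partial}{\partial x^i} + Y^i \frac{\partial}{\partial y^i}\right)\right]\left(\frac{\partial}{\partial x^j}\right) = -Y^j.$$

Similarly,

$$\left[\omega^\flat\left(X^i \frac{\partial}{\partial x^i} + Y^i \frac{\partial}{\partial y^i}\right)\right]\left(\frac{\partial}{\partial y^j}\right) = X^j.$$

Therefore,

$$\omega^\flat\left(X^i \frac{\partial}{\partial x^i} + Y^i \frac{\partial}{\partial y^i}\right) = -Y^j\,dx^j + X^j\,dy^j, \tag{26.47}$$

where a summation over repeated indices is understood.

If we multiply both sides of this equation by $\omega^\sharp$ on the left and recall that $\omega^\sharp\omega^\flat = 1$,
we obtain the following equation for the action of $\omega^\sharp$:

$$\omega^\sharp\bigl(X^i\,dx^i + Y^i\,dy^i\bigr) = Y^i \frac{\partial}{\partial x^i} - X^i \frac{\partial}{\partial y^i}. \tag{26.48}$$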


Equations (26.47) and (26.48) are very useful in Hamiltonian mechanics. 
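Equations (26.47) and (26.48) amount to a simple rule on component lists, which the following sketch spells out. It is an illustration under stated assumptions: a vector field at a point is represented by the pair of lists `(X, Y)`, a 1-form by the pair `(U, W)` of its $dx$ and $dy$ components, and the function names are hypothetical.

```python
def omega(Z1, Z2):
    """omega(Z1, Z2) for omega = sum_k dx^k ^ dy^k in canonical coordinates."""
    (X1, Y1), (X2, Y2) = Z1, Z2
    return sum(a * d - b * c for a, b, c, d in zip(X1, Y1, X2, Y2))


def flat(Z):
    """Equation (26.47): (X, Y) -> 1-form with components (U, W) = (-Y, X)."""
    X, Y = Z
    return [-y for y in Y], list(X)


def sharp(form):
    """Equation (26.48): (U, W) -> vector with components (W, -U)."""
    U, W = form
    return list(W), [-u for u in U]


def pair(form, Z):
    """Evaluate the 1-form U_k dx^k + W_k dy^k on the vector Z."""
    U, W = form
    X, Y = Z
    return sum(u * x for u, x in zip(U, X)) + sum(w * y for w, y in zip(W, Y))


# Sample vectors for n = 2 (assumed for illustration).
Z = ([1, -2], [3, 5])
Z2 = ([4, 0], [-1, 2])

print(pair(flat(Z), Z2) == omega(Z, Z2))   # flat(Z) is the 1-form omega(Z, .)
print(sharp(flat(Z)) == Z)                 # sharp undoes flat
```

The first check is the defining property $\omega^\flat(Z) = i_Z\omega$, and the second is the statement that $\omega^\sharp$ is the inverse of $\omega^\flat$.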






from Lagrangian to 
Hamiltonian in the 
language of 
differential forms 


Our discussion of symplectic transformations of symplectic vector spaces
showed that such maps are necessarily isomorphisms. Applied to the present situation,
this means that if $f : M \to N$ is symplectic, then $f_* : T_P(M) \to T_{f(P)}(N)$
is an isomorphism. Theorem 26.3.2, the inverse mapping theorem, now gives the
following theorem.

26.6.5. Theorem. If $f : M \to N$ is symplectic, then it is a local diffeomorphism.

26.6.6. Example. Hamiltonian mechanics takes place in the phase space of a system.
The phase space is derived from the configuration space as follows. Let $(q^1, \dots, q^n)$ be the
generalized coordinates of a mechanical system. They describe an $n$-dimensional manifold
$N$. The dynamics of the system is described by the (time-independent) Lagrangian $L$, which
is a function of $(q^i, \dot q^i)$. But $\dot q^i$ are the components of a vector at $(q^1, \dots, q^n)$ [see Equation
(26.13) and replace $y^i$ with $x^i$]. Thus, in the language of manifold theory, a Lagrangian is
a function on the tangent bundle, $L : T(N) \to \mathbb{R}$.

The Hamiltonian is obtained from the Lagrangian by a Legendre transformation: $H =
\sum_{i=1}^n p_i \dot q^i - L$. The first term can be thought of as a pairing of an element of the tangent
space with its dual. In fact, if $P$ has coordinates $(q^1, \dots, q^n)$, then $\dot q = \dot q^i \partial_i \in T_P(N)$
(with the Einstein summation convention enforced), and if we pair this with the dual vector
$p_j\,dx^j \in T_P^*(N)$, we obtain the first term in the definition of the Hamiltonian. The effect
of the Legendre transformation is to replace $\dot q^i$ by $p_i$ as the second set of independent
variables. This has the effect of replacing $T(N)$ with $T^*(N)$. Thus


26.6.7. Box. The manifold of Hamiltonian dynamics, or the phase space, is $T^*(N)$,
with coordinates $(q^i, p_i)$ on which the Hamiltonian $H : T^*(N) \to \mathbb{R}$ is defined.


symplectic 2-form of
$T^*(N)$


$T^*(N)$ is $2n$-dimensional; so it has the potential of becoming a symplectic manifold.
In fact, it can be shown that$^{11}$ the 2-form suggested by Darboux's theorem,

$$\omega = \sum_{i=1}^n dq^i \wedge dp_i, \tag{26.49}$$

is nondegenerate, and therefore a symplectic form for $T^*(N)$. ■

The phase space, equipped with a symplectic form, turns into a geometric
arena in which Hamiltonian mechanics unfolds. We saw in the above example that
a Hamiltonian is a function on the phase space. More generally, if $(M, \omega)$ is a
symplectic manifold, a Hamiltonian $H$ is a real-valued function, $H : M \to \mathbb{R}$.
Given a Hamiltonian, one can define a vector field as follows. Consider $dH \in
T^*(M)$. For a symplectic manifold, there is a natural isomorphism between $T^*(M)$


$^{11}$Here, we are assuming that the mechanical system in question is nonsingular, by which is meant that there are precisely
$n$ independent $p_i$'s. There are systems of considerable importance that happen to be singular. Such systems, among which are
included all gauge theories such as the general theory of relativity, are called constrained systems and are characterized by the
fact that $\omega$ is degenerate. Although of great interest and currently under intense study, we shall not discuss constrained systems
in this book.







Hamiltonian vector 
field and Hamiltonian 
systems defined 


conservation of 
energy in the 
language of 
symplectic geometry 


and $T(M)$, namely, $\omega^\sharp$. The unique vector field $X_H$ associated with $dH$ is the vector
field we are after.


26.6.8. Definition. Let $(M, \omega)$ be a symplectic manifold and $H : M \to \mathbb{R}$ a
real-valued function. The vector field

$$X_H = \omega^\sharp(dH) \equiv (dH)^\sharp$$

is called the Hamiltonian vector field with energy function $H$. The triplet
$(M, \omega, X_H)$ is called a Hamiltonian system.

The significance of the Hamiltonian vector field lies in its integral curve which 
turns out to be the path of evolution of the system in the phase space. This is shown 
in the following proposition. 

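26.6.9. Proposition. If $(q^1, \dots, q^n, p_1, \dots, p_n)$ are canonical coordinates for
$\omega$, so $\omega = \sum_i dq^i \wedge dp_i$, then, in these coordinates

$$X_H = \sum_{i=1}^n \left( \frac{\partial H}{\partial p_i} \frac{\partial}{\partial q^i} - \frac{\partial H}{\partial q^i} \frac{\partial}{\partial p_i} \right) \equiv \left( \frac{\partial H}{\partial p_i},\ -\frac{\partial H}{\partial q^i} \right). \tag{26.50}$$

Therefore, $(q(t), p(t))$ is an integral curve of $X_H$ iff Hamilton's equations hold:

$$\frac{dq^i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q^i}, \qquad i = 1, \dots, n. \tag{26.51}$$

Proof. The first part of the proposition follows from

$$dH = \frac{\partial H}{\partial q^i}\,dq^i + \frac{\partial H}{\partial p_i}\,dp_i,$$

from the definition of $X_H$ in terms of $dH$, and from Equation (26.48). The second
part follows from the definition of integral curve and Equation (26.19). □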


We called H the energy function; this is for good reason: 

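26.6.10. Theorem. Let $(M, \omega, X_H)$ be a Hamiltonian system and $\gamma(t)$ an integral
curve of $X_H$. Then $H(\gamma(t))$ is constant in $t$.

Proof. We show that the time-derivative of $H(\gamma(t))$ is zero:

$$\begin{aligned}
\frac{d}{dt} H(\gamma(t)) &= \left(\gamma_* \frac{d}{dt}\right) H && \text{by Proposition 26.2.4} \\
&= dH\left(\gamma_* \frac{d}{dt}\right) && \text{by Equation (26.14)} \\
&= dH\bigl(X_H(\gamma(t))\bigr) && \text{by definition of integral curve} \\
&= \bigl[\omega^\flat\bigl(X_H(\gamma(t))\bigr)\bigr]\bigl(X_H(\gamma(t))\bigr) && \text{by definition of } X_H(\gamma(t)) \\
&= \omega\bigl(X_H(\gamma(t)),\, X_H(\gamma(t))\bigr) && \text{by the definition of } \omega^\flat \\
&= 0 && \text{because } \omega \text{ is skew-symmetric.}
\end{aligned}$$

□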



Theorem 26.6.10 is the statement of the conservation of energy. 
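Hamilton's equations (26.51) and the conservation of $H$ along an integral curve can be illustrated numerically for one degree of freedom. The sketch below is not from the text: the pendulum Hamiltonian $H = p^2/2 - \cos q$, the initial condition, the step size, and the use of a fourth-order Runge-Kutta integrator with central-difference derivatives are all assumptions made for the illustration. Because the integrator is only approximate, the energy is conserved up to a small numerical drift rather than exactly.

```python
import math


def hamiltonian_vector_field(H, q, p, h=1e-5):
    """Components (dH/dp, -dH/dq) of X_H, estimated by central differences."""
    dH_dq = (H(q + h, p) - H(q - h, p)) / (2 * h)
    dH_dp = (H(q, p + h) - H(q, p - h)) / (2 * h)
    return dH_dp, -dH_dq


def rk4_step(H, q, p, dt):
    """One Runge-Kutta step along the integral curve of X_H."""
    k1 = hamiltonian_vector_field(H, q, p)
    k2 = hamiltonian_vector_field(H, q + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1])
    k3 = hamiltonian_vector_field(H, q + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1])
    k4 = hamiltonian_vector_field(H, q + dt * k3[0], p + dt * k3[1])
    q_new = q + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    p_new = p + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return q_new, p_new


def pendulum(q, p):
    """Assumed sample Hamiltonian: unit mass, unit length, unit gravity."""
    return 0.5 * p * p - math.cos(q)


q, p = 1.0, 0.0
energy_start = pendulum(q, p)
for _ in range(1000):                 # integrate to t = 10 with dt = 0.01
    q, p = rk4_step(pendulum, q, p, 0.01)
drift = pendulum(q, p) - energy_start
print(drift)
```

The printed drift measures how far the numerical curve has strayed from the level set of $H$ on which the exact integral curve lies.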

In the theoretical development of mechanics, canonical transformations play 
a central role. The following proposition shows that the flows of a Hamiltonian 
system are such transformations: 

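flow of Hamiltonian
vector field is
canonical
transformation of
mechanics

26.6.11. Proposition. Let $(M, \omega, X_H)$ be a Hamiltonian system, and $F_t$ the flow
of $X_H$. Then for each $t$, $F_t^*\omega = \omega$, i.e., $F_t$ is symplectic.

Proof. We have

$$\begin{aligned}
\frac{d}{dt} F_t^*\omega &= F_t^* L_{X_H}\omega && \text{by Equation (26.38)} \\
&= F_t^*\bigl(i_{X_H}\,d\omega + d\,i_{X_H}\omega\bigr) && \text{by Theorem 26.5.10} \\
&= F_t^*(0 + d\,dH) && \text{because } d\omega = 0 \text{ and } i_X\omega = \omega^\flat(X) \\
&= 0 && \text{because } d^2 = 0.
\end{aligned}$$

Thus, $F_t^*\omega$ is constant in $t$. But $F_0 = \mathrm{id}$. Therefore, $F_t^*\omega = \omega$. □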


Sir William Rowan Hamilton (1805-1865), the fourth of 
nine children, was mostly raised by an uncle, who quickly 
realized the extraordinary nature of his young nephew. By the 
age of five, Hamilton spoke Latin, Greek, and Hebrew, and by 
the age of nine had added more than a half dozen languages 
to that list. He was also quite famous for his skill at rapid 
calculation. Hamilton’s introduction to mathematics came at 
the age of 13, when he studied Clairaut’s Algebra, a task made
somewhat easier as Hamilton was fluent in French by this 
time. At age 15 he started studying Newton, whose Principia 
spawned an interest in astronomy that would provide a great 
influence in Hamilton’s early career. 

In 1822, at the age of 18, Hamilton entered Trinity College, Dublin, and in his first 
year he obtained the top mark in classics. He divided his studies equally between classics 
and mathematics and in his second year he received the top award in mathematical physics. 
Hamilton discovered an error in Laplace’s Mécanique céleste, and as a result, he came to
the attention of John Brinkley, the Astronomer Royal of Ireland, who said: “This young
man, I do not say will be, but is, the first mathematician of his age.” While in his final year 
as an undergraduate, he presented a memoir entitled Theory of Systems of Rays to the Royal 
Irish Academy in which he planted the seeds of symplectic geometry. 

Hamilton’s personal life was marked at first by despondency. Rejected by a college 
friend’s sister, he became ill and nearly suicidal. He was rejected a few years later by another 
friend’s sister and wound up marrying a very timid woman prone to ill health. Hamilton’s 
own personality was much more energetic and humorous, and he easily acquired friends 
among the literati. His own attempts at poetry, which he himself fended, were generally 
considered quite poor. No less an authority than Wordsworth attempted to convince him 







Liouville’s theorem 


that his true calling was mathematics, not poetry. Nevertheless, Hamilton maintained close 
connection with the worlds of literature and philosophy, insisting that the ideas to be gleaned 
from them were integral parts of his life’s work. While Hamilton is best known in physics 
for his work in dynamics, more of his time was spent on studies in optics and the theory of 
quaternions. In optics, he derived a function of the initial and final coordinates of a ray and 
termed it the “characteristic function,” claiming that it contained “the whole of mathematical 
optics.” Interestingly, his approach shed no new light on the wave/corpuscular debate (being 
independent of which view was taken), another appearance of Hamilton’s quest for ultimate 
generality. 

In 1833 Hamilton published a study of vectors as ordered pairs. He used algebra to
study dynamics in On a General Method in Dynamics in 1834. The theory of quaternions,
on which he spent most of his time, grew from his dissatisfaction with the current state of 
the theoretical foundation of algebra. He was aware of the description of complex numbers 
as points in a plane and wondered if any other geometrical representation was possible or if 
there existed some hypercomplex number that could be represented by three-dimensional 
points in space. If the latter supposition were true, it would entail a natural algebraic
representation of ordinary space. To his surprise, Hamilton found that in order to create a
hypercomplex number algebra for which the modulus of a product equaled the product of
the two moduli, four components were required; hence, quaternions.

Hamilton felt that this discovery would revolutionize mathematical physics, and he 
spent the rest of his life working on quaternions, including publication of a book entitled 
Elements of Quaternions, which he estimated would be 400 pages long and take two years 
to write. The title suggests that Hamilton modeled his work on Euclid’s Elements and indeed
this was the case. The book ended up double its intended length and took seven years to 
write. In fact, the final chapter was incomplete when Hamilton died, and the book was 
finally published with a preface by his son, William Edwin Hamilton. While quaternions 
themselves turned out to be of no such monumental importance, their appearance as the 
first noncommutative algebra opened the door for much research in this field, including 
much of vector and matrix analysis. (As a side note, the “del” operator, named later by 
Gibbs, was introduced by Hamilton in his papers on quaternions.) 

In dynamics, Hamilton extended his characteristic function from optics to the classical 
action for a system moving between two points in configuration space. A simple trans¬ 
formation of this function gives the quantity (the time integral of the Lagrangian) whose 
variation equals zero in what we now call Hamilton’s principle. Jacobi later simplified the 
application of Hamilton's idea to mechanics, and it is the Hamilton-Jacobi equation that 
is most often used in such problems. Hamiltonian dynamics was rescued from what could 
have become historical obscurity with the advent of quantum mechanics, in which its close 
association with ideas in optics found fertile application in the wave mechanics of de Broglie 
and Schrödinger. Hamilton's later life was unhappy, and he became addicted to alcohol. He 
died from a severe attack of gout shortly after receiving the news that he had been elected 
the first foreign member of the National Academy of Sciences of the USA. 


The celebrated Liouville’s theorem of mechanics, concerning the preservation 
of volume of the phase space, is a consequence of the proposition above: 

26.6.12. Corollary. (Liouville's Theorem) $F_t$ preserves the phase volume. 





Poisson brackets in 
the language of 
symplectic geometry 


26.6.13. Definition. Let $(M, \omega)$ be a symplectic manifold. Let $f, g : M \to \mathbb{R}$ 
with $X_f = (df)^\sharp$ and $X_g = (dg)^\sharp$ their corresponding Hamiltonian vector fields. 
The Poisson bracket of $f$ and $g$ is the function 

$$\{f, g\} = \omega(X_f, X_g) = i_{X_g} i_{X_f}\omega = -i_{X_f} i_{X_g}\omega.$$

We can immediately obtain the familiar expression for the Poisson bracket of 
two functions. 


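26.6.14. Proposition. In canonical coordinates $(q^1, \dots, q^n, p_1, \dots, p_n)$, we 
have 

$$\{f, g\} = \sum_{i=1}^{n}\left(\frac{\partial f}{\partial q^i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q^i}\right).$$

In particular, 

$$\{q^i, q^j\} = 0, \qquad \{p_i, p_j\} = 0, \qquad \{q^i, p_j\} = \delta_{ij}.$$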

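Proof. From Equation (26.50), we have (summation over repeated indices understood) 

$$\begin{aligned}
\omega(X_f, X_g) &= \omega\left(\frac{\partial f}{\partial p_i}\frac{\partial}{\partial q^i} - \frac{\partial f}{\partial q^i}\frac{\partial}{\partial p_i},\ \frac{\partial g}{\partial p_j}\frac{\partial}{\partial q^j} - \frac{\partial g}{\partial q^j}\frac{\partial}{\partial p_j}\right) \\
&= \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial p_j}\underbrace{\omega\left(\frac{\partial}{\partial q^i}, \frac{\partial}{\partial q^j}\right)}_{=0} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q^j}\underbrace{\omega\left(\frac{\partial}{\partial q^i}, \frac{\partial}{\partial p_j}\right)}_{=\delta_{ij}} \\
&\quad - \frac{\partial f}{\partial q^i}\frac{\partial g}{\partial p_j}\underbrace{\omega\left(\frac{\partial}{\partial p_i}, \frac{\partial}{\partial q^j}\right)}_{=-\delta_{ij}} + \frac{\partial f}{\partial q^i}\frac{\partial g}{\partial q^j}\underbrace{\omega\left(\frac{\partial}{\partial p_i}, \frac{\partial}{\partial p_j}\right)}_{=0} \\
&= \sum_{i=1}^{n}\left(\frac{\partial f}{\partial q^i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q^i}\right),
\end{aligned}$$

where we have assumed that $\omega = \sum_{k=1}^{n} dq^k \wedge dp_k$. The other formulas follow 
immediately once we substitute $p_i$ or $q^i$ for $f$ or $g$. □ 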
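The coordinate formula of Proposition 26.6.14 can be checked numerically. The following is only a sketch: it estimates the partial derivatives by central differences, and the helper names (`partial`, `poisson_bracket`, `q_coord`, `p_coord`) and the test function `angular_momentum` are illustrative choices, not part of the text.

```python
from typing import Callable, Sequence

# A phase-space function takes the coordinate lists (q, p) and returns a number.
PhaseFn = Callable[[Sequence[float], Sequence[float]], float]


def partial(f: PhaseFn, q: Sequence[float], p: Sequence[float],
            var: str, i: int, h: float = 1e-5) -> float:
    """Central-difference estimate of df/dq^i (var='q') or df/dp_i (var='p')."""
    qp, qm, pp, pm = list(q), list(q), list(p), list(p)
    if var == "q":
        qp[i] += h
        qm[i] -= h
    else:
        pp[i] += h
        pm[i] -= h
    return (f(qp, pp) - f(qm, pm)) / (2 * h)


def poisson_bracket(f: PhaseFn, g: PhaseFn,
                    q: Sequence[float], p: Sequence[float]) -> float:
    """{f, g} = sum_i (df/dq^i dg/dp_i - df/dp_i dg/dq^i) at the point (q, p)."""
    return sum(
        partial(f, q, p, "q", i) * partial(g, q, p, "p", i)
        - partial(f, q, p, "p", i) * partial(g, q, p, "q", i)
        for i in range(len(q))
    )


def q_coord(i: int) -> PhaseFn:
    return lambda q, p: q[i]


def p_coord(i: int) -> PhaseFn:
    return lambda q, p: p[i]


def angular_momentum(q: Sequence[float], p: Sequence[float]) -> float:
    # Hypothetical test function: L = q^1 p_2 - q^2 p_1.
    return q[0] * p[1] - q[1] * p[0]


q0, p0 = [0.3, -1.2], [0.7, 2.0]
print(round(poisson_bracket(q_coord(0), p_coord(0), q0, p0), 6))  # 1.0
```

For functions that are linear or bilinear in the coordinates, as here, the central differences are exact up to rounding, so the canonical brackets are reproduced to high accuracy.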


26.7 Problems 


26.1. Provide the details of the fact that a finite-dimensional vector space V is a 
manifold of dimension dim V. 






26.2. Choose a different curve $\gamma : \mathbb{R} \to \mathbb{R}^2$ whose tangent at $u = 0$ is still $(a_x, a_y)$ 
of Example 26.2.2. For instance, you may choose 

$$\gamma(u) = \left(\tfrac{1}{3}a_x(u + 1)^3,\ \tfrac{1}{3}a_y(u - 1)^3\right).$$

Show that this curve gives the same relation between partials and unit vectors as 
obtained in that example. Can you find another curve doing the same job? 

26.3. For every $t \in T_P(M)$ and every constant function $c \in F^\infty(P)$, show that 
$t(c) = 0$. Hint: Use both parts of Definition 26.2.3 on the two functions $f = c$ 
and $g = 1$. 

26.4. Find the coordinate vector field of Example 26.2.10. 

26.5. Use the procedure of Example 26.2.10 to find a coordinate frame for $S^2$ 
corresponding to the stereographic projection charts (see Example 26.1.12). 

26.6. Let $(x^i)$ and $(y^j)$ be coordinate systems on a subset $U$ of a manifold $M$. Let 
$X^i$ and $Y^j$ be the components of a vector field with respect to the two coordinate 
systems. Show that $Y^j = X^i\,\partial y^j/\partial x^i$. 

26.7. Show that if $\psi : M \to N$ is a local diffeomorphism at $P \in M$, then 
$\psi_{*P} : T_P(M) \to T_{\psi(P)}(N)$ is a vector space isomorphism. 

26.8. Let $X$ be a vector field on $M$ and $\psi : M \to N$ a differentiable map. Then 
for any function $f$ on $N$, $[\psi_*X](f)$ is a function on $N$. Show that 

$$X(f \circ \psi) = \{[\psi_*X](f)\} \circ \psi.$$

26.9. Verify that the vector field $X = -y\,\partial_x + x\,\partial_y$ has an integral curve through 
$(x_0, y_0)$ given by 

$$x = x_0\cos t - y_0\sin t, \qquad y = x_0\sin t + y_0\cos t.$$

26.10. Show that the vector field $X = x^2\,\partial_x + xy\,\partial_y$ has an integral curve through 
$(x_0, y_0)$ given by 

$$x(t) = \frac{x_0}{1 - x_0 t}, \qquad y(t) = \frac{y_0}{1 - x_0 t}.$$

26.11. Let $X$ and $Y$ be vector fields. Show that $X \circ Y - Y \circ X$ is also a vector 
field, i.e., it satisfies the derivation property. 

26.12. Prove the remaining parts of Proposition 26.4.13. 

26.13. Suppose that $x^i$ are coordinate functions on a subset of $M$ and $\omega$ and $X$ are 
a 1-form and a vector field there. Express $\omega(X)$ in terms of component functions 
of $\omega$ and $X$. 





26.14. Show that $d \circ L_X = L_X \circ d$. Hint: Use the definition of the Lie derivative 
for $p$-forms and the fact that $d$ commutes with the pullback. 

26.15. Let $M = \mathbb{R}^3$ and let $f$ be a real-valued function. Let $\omega = a_i\,dx^i$ be a 
one-form and $\eta = b_1\,dx^2 \wedge dx^3 + b_2\,dx^3 \wedge dx^1 + b_3\,dx^1 \wedge dx^2$ be a two-form on 
$\mathbb{R}^3$. Show that 

(a) $df$ gives the gradient of $f$, 

(b) $d\eta$ gives the divergence of the vector $\mathbf{B} = (b_1, b_2, b_3)$, and that 

(c) $\nabla \times (\nabla f) = 0$ and $\nabla \cdot (\nabla \times \mathbf{A}) = 0$ are consequences of $d^2 = 0$. 

26.16. Show that $i_X$ is an antiderivation with respect to the wedge product. 

26.17. Given that $F = \tfrac{1}{2}F_{\alpha\beta}\,dx^\alpha \wedge dx^\beta$, show that $F \wedge (*F) = |\mathbf{B}|^2 - |\mathbf{E}|^2$. 

26.18. Use Equation (26.40) to show that the zeroth component of the relativistic 
Lorentz force law gives the rate of change of energy due to the electric field, and 
that the magnetic field does not change the energy. 


26.19. Derive Equation (26.43). 

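26.20. Given that $F = \tfrac{1}{2}F_{\alpha\beta}\,dx^\alpha \wedge dx^\beta$, write the two homogeneous Maxwell's 
equations, $\nabla \cdot \mathbf{B} = 0$ and $\nabla \times \mathbf{E} + \partial\mathbf{B}/\partial t = 0$, in terms of $F_{\alpha\beta}$. 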

26.21. Write the equation 


in terms of E, B, and vector and scalar potentials. 


26.22. With $F = \tfrac{1}{2}F_{\alpha\beta}\,dx^\alpha \wedge dx^\beta$ and $J = J_\gamma\,dx^\gamma$, show that $d * F = 4\pi(*J)$ 
takes the following form in components: 

$$\frac{\partial F^{\alpha\beta}}{\partial x^\beta} = 4\pi J^\alpha,$$

where indices are raised and lowered by diag$(-1, -1, -1, 1)$. 

26.23. Interpret Theorem 26.5.14 for $p = 1$ and $p = 2$ on $\mathbb{R}^3$. 

26.24. Let $f$ be a function on $\mathbb{R}^3$. Calculate $d * df$. 

26.25. Show that current conservation is an automatic consequence of Maxwell's 
inhomogeneous equation $d * F = 4\pi(*J)$. 


Additional Reading 

1. Abraham, R. and Marsden, J. Foundations of Mechanics, 2nd ed., Addison-Wesley, 
1985. The definitive textbook on classical mechanics presented in 
the language of differentiable manifolds. Contains a thorough treatment of 
exterior calculus and symplectic geometry. 

2. Bishop, R. and Goldberg, S. Tensor Analysis on Manifolds, Dover, 1980. 

3. Bott, R. and Tu, L. Differential Forms in Algebraic Topology, Springer-Verlag, 
1982. Despite its frightening title, the first chapter of this book is 
actually a very good introduction to tensors. 

4. Choquet-Bruhat, Y., DeWitt-Morette, C., and Dillard-Bleick, M. Analysis, 
Manifolds, and Physics, 2nd ed., North-Holland, 1982. A two-volume reference 
written by mathematical physicists. Excellent for readers already 
familiar with the subject and seeking detailed applications in physics. 

5. Warner, F. Foundations of Differentiable Manifolds and Lie Groups, 
Springer-Verlag, 1983. A formal but readable introduction to differentiable 
manifolds. 




Part VIII 

Lie Groups and Their 
Applications 



















27 

Lie Groups and Lie Algebras 


The theory of differential equations had flourished to such a level by the 1860s that 
a systematic study of their solutions became possible. Sophus Lie, a Norwegian 
mathematician, undertook such a study using the same tool that was developed by 
Galois and others to study algebraic equations: group theory. The groups associ¬ 
ated with the study of differential equations, now called Lie groups, unlike their 
algebraic counterparts, are uncountably infinite, and, as such, are both intricate 
and full of far-reaching structures. It was beyond the wildest dream of any 19th- 
century mathematician to imagine that a concept as abstract as Lie groups would 
someday find application in the study of the heart of matter. Yet, three of the four 
fundamental interactions are described by Lie groups, and the fourth one, gravity, 
is described in a language very akin to the other three. 

27.1 Lie Groups and Their Algebras 

Lie groups are infinite groups that have the extra property that their multiplication 
law is differentiable. We have seen that the natural setting for differentiation is the 
structure of a manifold. Thus, Lie groups must have manifold properties as well 
as group properties. 

Lie groups defined 

27.1.1. Definition. A Lie group $G$ is a differentiable manifold endowed with a 
group structure such that the group operation $G \times G \to G$ and the map $G \to G$ 
given by $g \mapsto g^{-1}$ are differentiable. If the dimension of the underlying manifold 
is $r$, we say that $G$ is an $r$-parameter Lie group. 


Because of the dual nature of Lie groups, most of their mapping properties 
combine those of groups and manifolds. For instance, a Lie group homomorphism 
is a group homomorphism that is also $C^\infty$, and a Lie group isomorphism 
is a group isomorphism that is also a diffeomorphism. 

GL(V) is a Lie group 

SL(V) is a Lie group 

27.1.2. Example. GL(V) is a Lie group 

As the paradigm of Lie groups, we consider $GL(V)$, the set of invertible operators on an 
$n$-dimensional real vector space $V$, and show that it is indeed a Lie group. The set $\mathcal{L}(V)$ 
is a vector space of dimension $n^2$ (Proposition 25.1.1), and therefore, by Example 26.1.7, 
a manifold of the same dimension. The map $\det : \mathcal{L}(V) \to \mathbb{R}$ is a $C^\infty$ map because 
the determinant, when expressed in terms of a matrix, is a polynomial. In particular, it is 
continuous. Now note that 

$$GL(V) = \det{}^{-1}(\mathbb{R} - \{0\})$$

and that $\mathbb{R} - \{0\}$ is open. It follows that $GL(V)$ is an open submanifold of $\mathcal{L}(V)$. Thus, 
$GL(V)$ is an $n^2$-dimensional manifold. Choosing a basis $B$ for $V$ and representing operators 
(points) $A$ of $GL(V)$ as matrices $(a_{ij})$ in that basis provides a coordinate patch for $GL(V)$. 
We denote this coordinate patch by $\{x^{ij}\}$, where $x^{ij}(A) = a_{ij}$. 

To show that $GL(V)$ is a Lie group, we need to prove that if $A, B \in GL(V)$, then 

$$AB : GL(V) \times GL(V) \to GL(V) \quad\text{and}\quad A^{-1} : GL(V) \to GL(V)$$

are $C^\infty$ maps of manifolds. This is done by showing that the coordinate representations of 
these maps are $C^\infty$. These representations are simply the matrix representations of operators. 
Since each element of $AB$ is a polynomial (linear in the elements of each of the two matrices 
separately), it has derivatives of all orders. It follows that $AB$ is $C^\infty$. The case of $A^{-1}$ 
is only slightly more complicated. We note that 

$$A^{-1} = \frac{P(a_{ij})}{\det A}.$$

Thus, since $\det A$ is also a polynomial in $a_{ij}$, the $k$th derivative of $A^{-1}$ is of the form 
$Q(a_{ij})/(\det A)^{k+1}$, where $Q$ is another polynomial. The fact that $\det A \neq 0$ establishes the 
$C^\infty$ property of $A^{-1}$. 

One can similarly show that if $V$ is a complex vector space, then $GL(V)$ is a manifold 
of dimension $2n^2$. ■ 

27.1.3. Example. SL(V) is a Lie group 

Recall that $SL(V)$ is the subgroup of $GL(V)$ whose elements have unit determinant. Since 
$\det : GL(V) \to \mathbb{R}$ is $C^\infty$, Theorem 26.3.7 and the example after it show that $SL(V) = 
\det^{-1}(1)$ is a submanifold of $GL(V)$ of dimension $\dim GL(V) - \dim \mathbb{R} = n^2 - 1$. Since it is 
already a subgroup, we conclude that $SL(V)$ is also a Lie group (Problem 27.5). Similarly, 
when $V$ is a complex vector space, one can show that $\dim SL(V) = 2n^2 - 2$. ■ 

27.1.4. Example. Other examples of Lie groups 
The reader may verify the following: 

(a) Any finite-dimensional vector space is a Lie group under vector addition. 

(b) The unit circle $S^1$, as a subset of nonzero multiplicative complex numbers, is a Lie 
group under multiplication. 

(c) The product $G \times H$ of two Lie groups is itself a Lie group with the product manifold 
structure and the direct product group structure. 

(d) $GL(n, \mathbb{R})$, the set of invertible $n \times n$ matrices, is a Lie group under matrix multiplication. 


Here $P(a_{ij})$ denotes a polynomial in $a_{ij}$. 




group of affine 
motions of $\mathbb{R}^n$ 


local Lie groups 


local group of 
transformations 


(e) Let $G = GL(n, \mathbb{R}) \times \mathbb{R}^n$ be the product manifold. Define the group operation by 
$(A, \mathbf{u})(B, \mathbf{v}) = (AB, A\mathbf{v} + \mathbf{u})$. The reader may verify that this operation indeed defines a 
group structure on $G$. In fact, $G$ becomes a Lie group, called the group of affine motions 
of $\mathbb{R}^n$, for if we identify $(A, \mathbf{u})$ with the affine motion¹ $\mathbf{x} \mapsto A\mathbf{x} + \mathbf{u}$ of $\mathbb{R}^n$, then the 
group operation in $G$ is composition of affine motions. We shall study in some detail the 
Poincaré group, a subgroup of the group of affine motions, in which the matrices are (pseudo) 
orthogonal. ■ 
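The claim in part (e), that the product $(A, \mathbf{u})(B, \mathbf{v}) = (AB, A\mathbf{v} + \mathbf{u})$ is composition of affine motions, can be checked on a concrete case. This is a minimal sketch with integer $2 \times 2$ matrices; the function names and the sample elements `g`, `h`, `k` are assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[int]]
Vector = List[int]
Affine = Tuple[Matrix, Vector]  # the pair (A, u), standing for x -> Ax + u


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_vec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def affine_mul(g: Affine, h: Affine) -> Affine:
    """Group operation (A, u)(B, v) = (AB, Av + u)."""
    (A, u), (B, v) = g, h
    return mat_mul(A, B), [s + t for s, t in zip(mat_vec(A, v), u)]


def affine_act(g: Affine, x: Vector) -> Vector:
    """The affine motion x -> Ax + u."""
    A, u = g
    return [s + t for s, t in zip(mat_vec(A, x), u)]


g: Affine = ([[2, 1], [1, 1]], [1, -3])
h: Affine = ([[0, -1], [1, 3]], [4, 5])
k: Affine = ([[1, 2], [0, 1]], [0, 7])
x: Vector = [7, -2]

# Acting with the product equals acting with h first and then with g.
print(affine_act(affine_mul(g, h), x) == affine_act(g, affine_act(h, x)))  # True
```

Integer entries keep the comparison exact; the same check with floating-point matrices would need a tolerance.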

In calculations, one translates all group operations to the corresponding operations 
of charts. This is particularly useful when the group multiplication can 
be defined only locally. One then speaks of an $r$-parameter local Lie group. To 
be precise, one considers a neighborhood $U$ of the origin of $\mathbb{R}^r$ and defines an 
associative “multiplication” $m : U \times U \to \mathbb{R}^r$ and an inversion $i : U_0 \to U$, 
where $U_0$ is a subset of $U$. We therefore write the multiplication as 

$$m(\mathbf{a}, \mathbf{b}) = \mathbf{c}, \qquad \mathbf{a}, \mathbf{b}, \mathbf{c} \in \mathbb{R}^r,$$

where $\mathbf{a} = (a^1, a^2, \dots, a^r)$, etc. are coordinates of elements of $G$. The coordinates 
of the identity element of $G$ are taken to be all zero. Thus, $m(\mathbf{a}, \mathbf{0}) = \mathbf{a}$ and 
$m(\mathbf{a}, i(\mathbf{a})) = \mathbf{0}$. In component forms, 

$$c^k = m^k(\mathbf{a}, \mathbf{b}), \qquad m^k(\mathbf{a}, i(\mathbf{a})) = 0, \qquad k = 1, 2, \dots, r. \tag{27.1}$$

The fact that $G$ is a manifold implies that all functions in Equation (27.1) are 
infinitely differentiable. 

27.1.5. Example. As an example of a local 1-parameter Lie group, consider the multiplication 
rule $m : U \times U \to \mathbb{R}$, where $U = \{x \in \mathbb{R} \mid |x| < 1\}$ and 

$$m(x, y) = \frac{2xy - x - y}{xy - 1}, \qquad x, y \in U.$$

The reader can check that $m(x, m(y, z)) = m(m(x, y), z)$, so that the multiplication is associative. 
Moreover, $m(0, x) = m(x, 0) = x$ for all $x \in U$, and $i(x) = x/(2x - 1)$, defined 
for $U_0 = \{x \in \mathbb{R} \mid |x| < \tfrac{1}{2}\}$. ■ 
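The properties asserted in Example 27.1.5 can be confirmed with exact rational arithmetic. The sketch below evaluates the multiplication and inversion at a few sample points; the sample values are arbitrary choices, and the function names `m` and `inv` simply mirror the symbols $m$ and $i$ of the example.

```python
from fractions import Fraction as Fr


def m(x: Fr, y: Fr) -> Fr:
    """Local multiplication m(x, y) = (2xy - x - y) / (xy - 1)."""
    return (2 * x * y - x - y) / (x * y - 1)


def inv(x: Fr) -> Fr:
    """Local inversion i(x) = x / (2x - 1), singular at x = 1/2."""
    return x / (2 * x - 1)


x, y, z = Fr(1, 5), Fr(-1, 7), Fr(1, 9)

print(m(x, m(y, z)) == m(m(x, y), z))   # associativity: True
print(m(Fr(0), x) == x == m(x, Fr(0)))  # 0 is the identity: True
print(m(x, inv(x)) == 0)                # i(x) inverts x: True
```

Because `Fraction` arithmetic is exact, the equalities hold without any tolerance. A check at sample points is of course not a proof, only a sanity test of the formulas.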

27.1.1 Group Action 

As mentioned in our discussion of finite groups, the action of a group on a set is 
more easily conceived than abstract groups. In the case of Lie groups, the natural 
action is not on an arbitrary set, but on a manifold. 

27.1.6. Definition. Let $M$ be a manifold. A local group of transformations acting 
on $M$ is a (local) Lie group $G$, an (open) subset $U$ with the property $\{e\} \times M \subset 
U \subset G \times M$, and a map $\Psi : U \to M$ satisfying the following conditions: 

1. If $(g, P) \in U$, $(h, \Psi(g, P)) \in U$, and $(hg, P) \in U$, then 
$\Psi(h, \Psi(g, P)) = \Psi(hg, P)$. 

2. $\Psi(e, P) = P$ for all $P \in M$. 

3. If $(g, P) \in U$, then $(g^{-1}, \Psi(g, P)) \in U$ and $\Psi(g^{-1}, \Psi(g, P)) = P$. 

¹ These consist of a linear transformation followed by a translation. 

orbit; transitive 
action; stabilizer 

Figure 27.1 For small regions of $M$, we may be able to include a large portion of $G$. 
However, if we want to include all of $M$, as we should, then only a small neighborhood of 
the identity may be available. 

Normally, we shall denote $\Psi(g, P)$ by $g \cdot P$, or $gP$. Then the conditions of 
the definition above take the simple form 

$$\begin{aligned}
g \cdot (h \cdot P) &= (gh) \cdot P, && g, h \in G,\ P \in M, \\
e \cdot P &= P && \text{for all } P \in M, \\
g^{-1} \cdot (g \cdot P) &= P, && g \in G,\ P \in M,
\end{aligned} \tag{27.2}$$

whenever $g \cdot P$ is defined. Note that the word “local” refers to $G$ and not $M$, i.e., 
we may have to choose a very small neighborhood of the identity before all the 
elements of that neighborhood can act on all points of $M$ (see Figure 27.1). 

All the properties of a group action described in Chapter 23 can be applied 
here as well. So, one talks about the orbit of G as the collection of points in M 
obtained from one another by the action of G; a transitive action of G on M when 
there is only one orbit; the stabilizer of a point of M, etc. The only extra condition 
one has to be aware of is that the group action is not defined for all elements of G, 
and that a sufficiently small neighborhood of the identity needs to be chosen. 

In the old literature, the group action is described in terms of coordinates. 
Although for calculations this is desirable, it can be very clumsy for formal discussions, 
as we shall see later. Let $\mathbf{a} = (a^1, \dots, a^r)$ be a coordinate system 
on $G$ and $\mathbf{x} = (x^1, \dots, x^n)$ a coordinate system on $M$. Then the group action 
$\Psi : G \times M \to M$ becomes a set of $n$ functions described by 

$$\mathbf{x}' = \Psi(\mathbf{a}, \mathbf{x}), \qquad \mathbf{x}'' = \Psi(\mathbf{b}, \mathbf{x}') = \Psi(m(\mathbf{b}, \mathbf{a}), \mathbf{x}), \tag{27.3}$$

where $m$ is the multiplication law of the Lie group written in terms of coordinates 
as given in Equation (27.1). It is assumed that $\Psi$ is infinitely differentiable. 

translations 

scale 
transformations 

one-dimensional 
projective group 


27.1.7. Box. Equation (27.3) can be used to unravel the multiplication law 
for the Lie group when the latter is given in terms of transformations. 


27.1.8. Example. Examples of groups of transformation 

(a) The two-dimensional rotation group acts on the $xy$-plane as 

$$\Phi(\theta, \mathbf{r}) = (x\cos\theta - y\sin\theta,\ x\sin\theta + y\cos\theta).$$

If we write $\mathbf{r}' = \Phi(\theta_1, \mathbf{r})$ and $\mathbf{r}'' = \Phi(\theta_2, \mathbf{r}')$, then a simple calculation shows that 

$$\mathbf{r}'' = (x\cos(\theta_1 + \theta_2) - y\sin(\theta_1 + \theta_2),\ x\sin(\theta_1 + \theta_2) + y\cos(\theta_1 + \theta_2)).$$

With $\mathbf{r}'' = \Phi(m(\theta_1, \theta_2), \mathbf{r})$, we recognize the “multiplication” law as $m(\theta_1, \theta_2) = \theta_1 + \theta_2$. 
The orbits are circles centered at the origin. 

(b) Let $M = \mathbb{R}^n$, $\mathbf{a}$ a fixed vector in $\mathbb{R}^n$, and $G = \mathbb{R}$. Define $\Psi : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ by 

$$\Psi(t, \mathbf{x}) = \mathbf{x} + t\mathbf{a}, \qquad \mathbf{x} \in \mathbb{R}^n,\ t \in \mathbb{R}.$$

This group action is globally defined. The orbits are straight lines parallel to $\mathbf{a}$. The group 
is the set of translations in the direction $\mathbf{a}$ in $\mathbb{R}^n$. The reader may verify that the “multiplication” 
law is addition of $t$'s. 

(c) Let $G = \mathbb{R}^+$ be the multiplicative group of positive real numbers. Fix real 
numbers $\alpha_1, \alpha_2, \dots, \alpha_n$, not all zero. Define the action of $G$ on $\mathbb{R}^n$ by 

$$\Psi(\lambda, \mathbf{x}) \equiv \lambda \cdot \mathbf{x} = (\lambda^{\alpha_1}x_1, \dots, \lambda^{\alpha_n}x_n), \qquad \lambda \in \mathbb{R}^+,\ \mathbf{x} = (x_1, \dots, x_n) \in \mathbb{R}^n.$$

The orbits are obtained by choosing a point in $\mathbb{R}^n$ and applying $G$ to it for different $\lambda$'s. 
The result is a curve in $\mathbb{R}^n$. For example, if $n = 2$, $\alpha_1 = 1$, and $\alpha_2 = 2$, we get, as the orbit 
containing $\mathbf{x}_0$, the curve 

$$\lambda \cdot \mathbf{x}_0 = (\lambda x_0, \lambda^2 y_0) \quad\Rightarrow\quad y = \frac{y_0}{x_0^2}\,x^2,$$

which is a parabola going through the origin and the point $(x_0, y_0)$. Note that the orbit 
containing the origin has only one point. This group is called the group of scale transformations. 
The multiplication law is ordinary multiplication of (positive) real numbers. 

(d) Let $G = \mathbb{R}^4$ act on $M = \mathbb{R}$ by 

$$\Phi(\mathbf{a}, x) = \frac{a_1 x + a_2}{a_3 x + a_4}, \qquad a_1 a_4 - a_2 a_3 \neq 0.$$

The reader may verify that this is indeed the action of a group (catch where the condition 
$a_1 a_4 - a_2 a_3 \neq 0$ is used!), and if $x' = \Phi(\mathbf{b}, x)$ and $x'' = \Phi(\mathbf{a}, x')$, then 

$$x'' = \frac{(a_1 b_1 + a_2 b_3)x + a_1 b_2 + a_2 b_4}{(a_3 b_1 + a_4 b_3)x + a_3 b_2 + a_4 b_4},$$

so that the multiplication rule is 

$$m(\mathbf{a}, \mathbf{b}) = (a_1 b_1 + a_2 b_3,\ a_1 b_2 + a_2 b_4,\ a_3 b_1 + a_4 b_3,\ a_3 b_2 + a_4 b_4).$$

This group is called the one-dimensional projective group. ■ 
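Part (d) illustrates how the multiplication law is read off from the transformations, as in Box 27.1.7. The following sketch checks the stated rule $m(\mathbf{a}, \mathbf{b})$ against direct composition of the fractional linear maps; the sample parameters and the function names `act` and `mult` are assumptions chosen for illustration.

```python
from fractions import Fraction as Fr
from typing import Tuple

Params = Tuple[Fr, Fr, Fr, Fr]  # (a1, a2, a3, a4) with a1*a4 - a2*a3 != 0


def act(a: Params, x: Fr) -> Fr:
    """Phi(a, x) = (a1 x + a2) / (a3 x + a4)."""
    a1, a2, a3, a4 = a
    return (a1 * x + a2) / (a3 * x + a4)


def mult(a: Params, b: Params) -> Params:
    """Multiplication rule m(a, b) of Example 27.1.8(d)."""
    a1, a2, a3, a4 = a
    b1, b2, b3, b4 = b
    return (a1 * b1 + a2 * b3, a1 * b2 + a2 * b4,
            a3 * b1 + a4 * b3, a3 * b2 + a4 * b4)


a: Params = (Fr(2), Fr(1), Fr(1), Fr(3))
b: Params = (Fr(1), Fr(-2), Fr(0), Fr(1))
x = Fr(5)

# x'' = Phi(a, Phi(b, x)) should equal Phi(m(a, b), x).
print(act(a, act(b, x)) == act(mult(a, b), x))  # True
print(act(mult(a, b), x))                       # 7/6
```

The rule `mult` is the product of the $2 \times 2$ matrices with rows $(a_1, a_2)$, $(a_3, a_4)$ and $(b_1, b_2)$, $(b_3, b_4)$, which is why the nonvanishing determinant condition is preserved under multiplication.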


27.1.2 Lie Algebra of a Lie Group 

The group property of a Lie group G provides a natural diffeomorphism on G that 
determines a substantial part of its structure. 


left translation, 
left-invariant vector 
fields, left-invariant 
forms, and their 
“right” counterparts 


27.1.9. Definition. Let $G$ be a Lie group and $g \in G$. The left translation by $g$ is 
a diffeomorphism $L_g : G \to G$ defined by 

$$L_g(h) = gh \qquad \forall\, h \in G.$$

A vector field $\xi$ on $G$ is called left-invariant if for each $g \in G$, $\xi$ is $L_g$-related to 
itself; i.e.,² 

$$L_{g*} \circ \xi = \xi \circ L_g, \quad\text{or}\quad L_{g*}(\xi(h)) = \xi(gh) \qquad \forall\, g, h \in G.$$

The set of left-invariant vector fields on $G$ is denoted by $\mathfrak{g}$. A 1-form whose pairing 
with a left-invariant vector field gives a constant function on $G$ is called a left-invariant 
1-form. 

The right translation by $g$, $R_g : G \to G$, and right-invariant vector fields 
and 1-forms are defined similarly. 

The reader may easily check that right and left translations commute: 

$$R_g \circ L_h = L_h \circ R_g \qquad \forall\, g, h \in G. \tag{27.4}$$

Moreover, if $\omega|_e$ is a 1-form on $T_e(G)$, then $\omega \in \Lambda^1(G)$, given by $\omega|_g = 
L^*_{g^{-1}}\,\omega|_e$, is a left-invariant 1-form: for a left-invariant vector field $X$, 

$$\omega|_g(X|_g) = (L^*_{g^{-1}}\,\omega|_e)(X|_g) = \omega|_e(L_{g^{-1}*}X|_g) = \omega|_e(X|_{g^{-1}g}) = \omega|_e(X|_e),$$

independent of $g$. It is convenient to have a coordinate representation of $L_{g*}$. The 
coordinate representation of $L_g$ is simply the multiplication law $L_g(h) = m(g, h)$, 
where we have used the same symbol for coordinates as for group elements. Equation 
(26.11) can now be used to write the coordinate representation of $L_{g*}$: 

$$L_{g*} \to \begin{pmatrix}
\partial m^1/\partial h^1 & \partial m^1/\partial h^2 & \cdots & \partial m^1/\partial h^r \\
\partial m^2/\partial h^1 & \partial m^2/\partial h^2 & \cdots & \partial m^2/\partial h^r \\
\vdots & \vdots & & \vdots \\
\partial m^r/\partial h^1 & \partial m^r/\partial h^2 & \cdots & \partial m^r/\partial h^r
\end{pmatrix}, \tag{27.5}$$

² When there is no danger of confusion, we shall use $\xi(h)$ for $\xi|_h$. 

where all the derivatives in the matrix are evaluated at $(g, h)$. 
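For a one-parameter local group the matrix in (27.5) is a single number, $\partial m/\partial h$. As a hedged illustration, the sketch below estimates it by central differences for the multiplication law of Example 27.1.5 and uses it to build the left-invariant field with $\xi(0) = 1$. The closed form $(1 - g)^2$ quoted in the comments is a computation made for this illustration, not a formula from the text.

```python
def m(x: float, y: float) -> float:
    """Multiplication law of Example 27.1.5."""
    return (2 * x * y - x - y) / (x * y - 1)


def dm_dh(g: float, h: float, eps: float = 1e-6) -> float:
    """Central-difference estimate of the 1x1 'matrix' (27.5) at (g, h)."""
    return (m(g, h + eps) - m(g, h - eps)) / (2 * eps)


def xi(g: float) -> float:
    """Left-invariant field xi(g) = L_{g*}(v) with v = 1 at the identity h = 0.

    Differentiating m by hand gives xi(g) = (1 - g)**2 (assumption of this sketch).
    """
    return dm_dh(g, 0.0)


g, h = 0.2, -0.3
lhs = dm_dh(g, h) * xi(h)  # L_{g*}(xi(h))
rhs = xi(m(g, h))          # xi(gh)
print(abs(lhs - rhs) < 1e-6)  # left-invariance holds numerically: True
```

The comparison `lhs` versus `rhs` is the coordinate form of $L_{g*}(\xi(h)) = \xi(gh)$ from Definition 27.1.9.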

We have already mentioned in Chapter 26 that $\mathfrak{X}(G)$ is an infinite-dimensional 
Lie algebra under the Lie bracket “multiplication.” In general, no finite-dimensional 
subspace can be found that, in and of itself, is also an algebra.³ 
However, Lie groups are an exception: 

27.1.10. Proposition. Let $G$ be a Lie group and $\mathfrak{g}$ the set of its left-invariant 
vector fields. Then $\mathfrak{g}$ is a real vector space, and the map $\phi : \mathfrak{g} \to T_e(G)$, defined 
by $\phi(\xi) = \xi(e)$, is a linear isomorphism. Therefore, $\dim \mathfrak{g} = \dim T_e(G) = \dim G$. 
Furthermore, $\mathfrak{g}$ is closed under Lie brackets; i.e., $\mathfrak{g}$ is a Lie algebra. 

Proof. It is clear that $\mathfrak{g}$ is a real vector space. If $\phi(\xi) = \phi(\eta)$ for $\xi, \eta \in \mathfrak{g}$, then 

$$\xi(g) = L_{g*}(\xi(e)) = L_{g*}(\eta(e)) = \eta(g) \quad \forall\, g \in G \quad\Rightarrow\quad \xi = \eta.$$

This shows that $\phi$ is injective. To show that $\phi$ is surjective, suppose that $v \in T_e(G)$ 
and define the vector field $\xi$ on $G$ by $\xi(g) = L_{g*}(v)$ for all $g \in G$. Then $\phi(\xi) = v$ 
and $\xi \in \mathfrak{g}$ because 

$$L_{g*} \circ \xi(h) = L_{g*} \circ L_{h*}(v) = L_{gh*}(v) = \xi(gh) \equiv \xi(L_g h) = \xi \circ L_g(h).$$

This proves the first part of the proposition. The second part follows immediately 
from the definition of a left-invariant vector field and Theorem 26.4.4. □ 

The flow of $\xi$ at $g \in G$ can be shown to be 

$$F_t(g) = g\exp(t\xi) \equiv R_{\exp t\xi}\,g. \tag{27.6}$$

Indeed, let $X_\xi$ be the vector field associated with this flow. The action of this vector 
field on a function $f$ is 

$$X_\xi|_g(f) = \frac{d}{dt} f(g\exp t\xi)\Big|_{t=0}.$$

Therefore, 

$$\begin{aligned}
(L_{h*}X_\xi|_g)(f) &= X_\xi|_g(f \circ L_h) = \frac{d}{dt}\bigl(f \circ L_h(g\exp t\xi)\bigr)\Big|_{t=0} \\
&= \frac{d}{dt} f(hg\exp t\xi)\Big|_{t=0} = X_\xi|_{hg}(f) = [X_\xi \circ L_h(g)](f).
\end{aligned}$$

Since this is true for all $f$ and $g$, and $X_\xi|_e = \xi(e)$, we conclude that $X_\xi$ is the 
unique left-invariant vector field corresponding to $\xi(e)$. 

³ Such a subspace is called a subalgebra. 

The Lie algebra of a 
Lie group 

27.1.11. Definition. The Lie algebra of the Lie group $G$ is the Lie algebra $\mathfrak{g}$ of 
left-invariant vector fields on $G$. Sometimes we think of $\xi$ as a vector in $T_e(G)$. 
In that case, we denote by $X_\xi$ the left-invariant vector field whose value at the 
identity is $\xi$. 

The isomorphism of $\mathfrak{g}$ with $T_e(G)$ induces a Lie bracket on $T_e(G)$ and turns 
it into a Lie algebra. In many cases of physical interest, it is this interpretation of 
the Lie algebra of $G$ that is most useful. If two groups stand in some algebraic 
relation to one another, their Lie algebras will inherit such relations. More precisely, 
let $G$ and $H$ be Lie groups with Lie algebras $\mathfrak{g}$ and $\mathfrak{h}$, respectively. Suppose 
$\phi : G \to H$ is a Lie group homomorphism. Then identifying $\mathfrak{g}$ with $T_e(G)$ and 
$\mathfrak{h}$ with $T_e(H)$, and using Theorem 26.4.4, we conclude that $\phi_* : \mathfrak{g} \to \mathfrak{h}$ is a Lie 
algebra homomorphism, i.e., it preserves the Lie brackets: 

$$\phi_*[\xi, \eta] = [\phi_*\xi, \phi_*\eta] \qquad \forall\, \xi, \eta \in \mathfrak{g}. \tag{27.7}$$

In particular, if $\phi$ is a Lie group isomorphism, then $\phi_*$ is a Lie algebra isomorphism. 

27.1.12. Example. Let $V$ be a complex vector space with its general linear group $GL(V)$, 
a $2n^2$-dimensional Lie group. Recall that $GL(V)$ is an open submanifold of $\mathcal{L}(V)$. By 
Equation (26.4), $T_e(GL(V)) = T_e(\mathcal{L}(V))$, where $e$ is the unit operator. If we identify 
$T_e(\mathcal{L}(V))$ with $\mathcal{L}(V)$ [see the box after Equation (26.4)], we may conclude that $\mathcal{L}(V)$, 
which we denote by $\mathfrak{gl}(V)$ in the present context, is the Lie algebra of $GL(V)$ in which the 
Lie bracket is the commutator. We use the notation $A(t)$ for a curve in $GL(V)$ and $A$ for the 
vector tangent to the curve. ■ 


It is instructive to construct the coordinate representation of vector fields on 
$GL(V)$. Let $f : GL(V) \to \mathbb{R}$ be a function and $A$ a vector field. Then, we have 

$$A(f) = \frac{d}{dt} f(A(t)) = \frac{da_{ij}}{dt}\frac{\partial f}{\partial x^{ij}},$$

or, since $f$ is arbitrary, 

$$A = \frac{da_{ij}}{dt}\frac{\partial}{\partial x^{ij}} \equiv \dot a_{ij}\frac{\partial}{\partial x^{ij}} \equiv \frac{dA}{dt},$$

where summation over repeated indices is understood and we introduced $dA/dt$ as 
an abbreviation for $\dot a_{ij}(\partial/\partial x^{ij})$. However, the one-to-one correspondence between 
matrices and operators makes this more than just an abbreviation. Indeed, we can 
interpret $dA/dt$ as the derivative of $A(t)$ and perform such differentiation whenever 
it is possible. The equation above states that 

27.1.13. Box. To obtain the matrix elements (coordinates) of the operator 
$A$, one differentiates the $t$-dependent elements of the (matrix representation 
of the) operator $A(t)$. 




Differential of the 
determinant map is 
the trace: det* = tr 




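Of particular interest are the left-invariant vector fields, or equivalently, the vectors 
belonging to $T_e(GL(V))$. This amounts to substituting $t = 0$ in the formulas 
above. Thus, if $A \in T_e(GL(V))$, 

$$A = \dot a_{ij}(0)\frac{\partial}{\partial x^{ij}}. \tag{27.8}$$

For the product of two operators, we get 

$$\begin{aligned}
AB &= \frac{d}{dt}\bigl(A(t)B(t)\bigr)\Big|_{t=0} = \frac{d}{dt}(a_{ik}b_{kj})\Big|_{t=0}\frac{\partial}{\partial x^{ij}} \\
&= \bigl(\dot a_{ik}(0)\,b_{kj}(0) + a_{ik}(0)\,\dot b_{kj}(0)\bigr)\frac{\partial}{\partial x^{ij}} \\
&= \dot a_{ik}(0)\,b_{kj}(0)\frac{\partial}{\partial x^{ij}} + a_{ik}(0)\,\dot b_{kj}(0)\frac{\partial}{\partial x^{ij}}.
\end{aligned} \tag{27.9}$$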




Many of the Lie groups used in physics are subgroups of $GL(V)$. A characterization 
of the Lie algebras of these subgroups is essential for understanding the subgroups 
themselves and applying them to physical situations. These subgroups are typically 
defined in terms of maps $\phi : GL(V) \to M$ for which $M$ is a manifold and $\phi_*$ 
is surjective. To construct the Lie algebra of subgroups of $GL(V)$, we need to 
concentrate on the map $\phi_{*e}$ as defined on $T_e(GL(V))$. 

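An important map is $\det : GL(V) \to \mathbb{C}$ for a complex vector space $V$. We 
are interested in evaluating the map $\det_* : T_e(GL(V)) \to T_1(\mathbb{C})$ in which we 
consider $\mathbb{C} = \mathbb{R}^2$ to be a manifold. For an operator $A \in T_e(GL(V)) = \mathfrak{gl}(V)$ and 
a complex-valued function $f$, we have 

$$\begin{aligned}
\det{}_*(A)f &= \frac{d}{dt} f(\det A(t))\Big|_{t=0} \\
&= \frac{d}{dt}\operatorname{Re}\det A(t)\Big|_{t=0}\frac{\partial f}{\partial x} + \frac{d}{dt}\operatorname{Im}\det A(t)\Big|_{t=0}\frac{\partial f}{\partial y} \\
&= \operatorname{Re}\operatorname{tr}A\,\frac{\partial f}{\partial x} + \operatorname{Im}\operatorname{tr}A\,\frac{\partial f}{\partial y},
\end{aligned}$$

where we used Equation (3.28). Since $f$ is arbitrary and $\{\partial/\partial x, \partial/\partial y\}$ can be 
identified with $\{1, i\}$, we have 

$$\det{}_*(A) = \operatorname{tr}A. \tag{27.10}$$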
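Equation (27.10) can be tested numerically on a specific curve through the identity. The sketch below assumes the simplest such curve, $A(t) = 1 + tA$ (an illustrative choice; any curve with $A(0) = 1$ and tangent $A$ would do), and compares a central-difference derivative of $\det A(t)$ at $t = 0$ with $\operatorname{tr} A$ for an arbitrary complex $3 \times 3$ matrix.

```python
from typing import List

CMatrix = List[List[complex]]


def det(M: CMatrix) -> complex:
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total: complex = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def curve(A: CMatrix, t: float) -> CMatrix:
    """The curve A(t) = 1 + tA, which passes through the identity at t = 0."""
    n = len(A)
    return [[(1 if i == j else 0) + t * A[i][j] for j in range(n)]
            for i in range(n)]


def ddt_det(A: CMatrix, h: float = 1e-6) -> complex:
    """Central-difference estimate of d/dt det A(t) at t = 0."""
    return (det(curve(A, h)) - det(curve(A, -h))) / (2 * h)


A: CMatrix = [[1 + 2j, 0.5, -1j],
              [2, -3 + 1j, 4],
              [0, 1j, 0.5 - 0.5j]]
trace = sum(A[i][i] for i in range(3))

print(abs(ddt_det(A) - trace) < 1e-6)  # True
```

Since $\det(1 + tA) = 1 + t\operatorname{tr}A + O(t^2)$, the central difference agrees with the trace up to terms of order $h^2$ and rounding.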

27.1.14. Example. Lie algebra of SL(V) 

The special linear group $SL(V)$ is characterized by the fact that all its elements have unit 
determinant. 

27.1.15. Box. The Lie algebra $\mathfrak{sl}(V)$ of the special linear group is the set of all 
traceless operators. 

This is because if we use (27.10) and (26.12) and the fact that $SL(V) = \det^{-1}(1)$, we 
can conclude that $\det_*(A) = \operatorname{tr}A = 0$ for all $A \in \mathfrak{sl}(V)$. ■ 

unitary group 

special unitary group 

orthogonal and 
special orthogonal 
groups 

exponential map of a 
Lie algebra 

27.1.16. Example. Lie algebras of unitary and related groups 
Let us first show that the set of unitary operators on $V$, denoted by $U(V)$, is a Lie subgroup 
of $GL(V)$, called the unitary group of $V$. Consider the map $\psi : GL(V) \to \mathcal{H}$, where $\mathcal{H}$ 
is the set of Hermitian operators considered as a vector space (therefore, a manifold) over 
the reals, defined by $\psi(A) = AA^\dagger$. Using Equation (27.9), the reader may verify that $\psi_*$ is 
surjective and 

$$\psi_*(A) = A + A^\dagger. \tag{27.11}$$

It follows from Theorem 26.3.7 that $U(V) = \psi^{-1}(1)$ is a subgroup of $GL(V)$. Using 
Equation (26.12), we conclude that $\psi_*(A) = A + A^\dagger = 0$ for all $A \in \mathfrak{u}(V)$, i.e., 

27.1.17. Box. The Lie algebra $\mathfrak{u}(V)$ of the unitary group is the set of all anti-hermitian 
operators. 

By counting the number of independent real parameters of a matrix representing a 
hermitian operator, we can conclude that $\dim \mathcal{H} = n^2$. It follows from Theorem 26.3.7 that 
$\dim U(V) = n^2$. 

The intersection of $SL(V)$ and $U(V)$, denoted by $SU(V)$, is called the special unitary 
group. Its Lie algebra $\mathfrak{su}(V)$ consists of anti-hermitian traceless operators. The reader may 
check that $\dim SU(V) = n^2 - 1$. When the vector space is $\mathbb{C}^n$, we write $U(n)$ and $SU(n)$ 
instead of $U(\mathbb{C}^n)$ and $SU(\mathbb{C}^n)$. 

If we restrict ourselves to real vector spaces, then unitary and special unitary groups 
become the orthogonal group $O(V)$ and special orthogonal group $SO(V)$, respectively. 
Their algebras consist of antisymmetric and traceless antisymmetric operators, respectively. 
When $V = \mathbb{R}^n$, we use the notation $O(n)$ and $SO(n)$. ■ 

Let $X$ be a left-invariant vector field on $G$. We know from our discussion of flows that $X$ has 
a flow $F_t = \exp(tX)$ at every point $g$ of $G$ with $-\epsilon < t < \epsilon$. Now, since $F_t(g)$ 
is in $G$, it follows from the group property of $G$ that $(F_t)^n(g) = F_{nt}(g) \in G$ for 
all $n$. This shows that the flow of every left-invariant vector field on a Lie group is defined for 
all $t \in \mathbb{R}$, i.e., all left-invariant vector fields on a Lie group are complete. Now consider $\mathfrak{g}$ as 
a vector space and manifold and define a map $\exp : \mathfrak{g} \to G$ that is simply the 
flow evaluated at $t = 1$. It can be shown that the following result holds ([Warn 83, 
pp. 103-104]): 

27.1.18. Theorem. $\exp : \mathfrak{g} \to G$, called the exponential map, is a diffeomorphism 
of a neighborhood of the origin of $\mathfrak{g}$ with a neighborhood of the identity 
element of $G$. 

This theorem states that in a neighborhood of the identity element, a Lie group, 
as a manifold, “looks like” its tangent space there. In particular, 

27.1.19. Box. Two Lie groups that have identical Lie algebras are locally 
diffeomorphic. 

inner automorphism 
of a Lie group 

adjoint map of a Lie 
algebra 

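27.1.20. Example. Why exp is called the exponential map 

Let $V$ be a finite-dimensional vector space and $A \in \mathfrak{gl}(V)$. Define, as in Chapter 2, 

$$e^{tA} = \sum_{k=0}^{\infty}\frac{t^k A^k}{k!} = 1 + tA + \cdots$$

and note that 

$$\frac{d}{dt}e^{tA}\Big|_{t=0} = A.$$

Furthermore, 

$$\begin{aligned}
e^{tA}e^{sA} &= \sum_{k=0}^{\infty}\frac{t^k A^k}{k!}\sum_{n=0}^{\infty}\frac{s^n A^n}{n!} = \sum_{k=0}^{\infty}\sum_{n=0}^{\infty}\frac{t^k s^n}{k!\,n!}A^{k+n} \\
&= \sum_{m=0}^{\infty}\left(\sum_{n=0}^{m}\frac{t^{m-n}s^n}{(m-n)!\,n!}\right)A^m = \sum_{m=0}^{\infty}\frac{(t+s)^m}{m!}A^m = e^{(t+s)A}.
\end{aligned}$$

It follows that $e^{tA}$ has all the properties expected of the flow of the vector field $A$. ■ 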
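The series definition and the one-parameter property of Example 27.1.20 can be tried out directly. The sketch below sums a truncated power series for $e^{tA}$ (the truncation at 40 terms is an assumption that suffices only for small matrices of modest norm) and uses the generator of plane rotations as a test case.

```python
import math
from typing import List

Matrix = List[List[float]]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_exp(A: Matrix, t: float = 1.0, terms: int = 40) -> Matrix:
    """Truncated series sum_{k < terms} (tA)^k / k!."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # Multiply the previous term by tA/k, giving (tA)^k / k!.
        term = mat_mul(term, [[t * a / k for a in row] for row in A])
        result = [[r + s for r, s in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result


def max_diff(A: Matrix, B: Matrix) -> float:
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Antisymmetric generator of rotations in the plane.
A: Matrix = [[0.0, -1.0], [1.0, 0.0]]
t, s = 0.7, -1.3

product = mat_mul(mat_exp(A, t), mat_exp(A, s))
print(max_diff(product, mat_exp(A, t + s)) < 1e-12)  # True
```

For this particular $A$ the series sums to the rotation matrix with entries $\cos t$ and $\pm\sin t$, so the one-parameter property reduces to the addition law for angles found in Example 27.1.8(a).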

The exponential map has some important properties that we shall have occasion 
to use later. The first of these properties is the content of the following proposition, 
whose proof is left as an exercise for the reader. 

27.1.21. Proposition. Let $\phi : H \to G$ be a Lie group homomorphism. Then, for 
all $\eta \in \mathfrak{h}$, we have $\phi(\exp_H \eta) = \exp_G(\phi_*\eta)$. 

For every $g \in G$, let $I_g \equiv R_g^{-1} \circ L_g$. The reader may readily verify that $I_g$, 
which takes $x \in G$ to $gxg^{-1} \in G$, is an isomorphism of $G$, i.e., $I_g(xy) = 
I_g(x)I_g(y)$ and $I_g$ is bijective. It is called the inner automorphism associated 
with $g$. 

27.1.22. Definition. The Lie algebra isomorphism $I_{g*} = R_{g*}^{-1} \circ L_{g*} : \mathfrak{g} \to \mathfrak{g}$ is 
denoted by $\mathrm{Ad}_g$ and is called the adjoint map associated with $g$. 

Using Proposition 27.1.21, we have the following corollary. 

27.1.23. Corollary. $\exp(\mathrm{Ad}_g\,\xi) = I_g\exp\xi = g\exp\xi\,g^{-1}$ for all $\xi \in \mathfrak{g}$ and $g \in G$. 
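Corollary 27.1.23 can be illustrated with matrices. The sketch below relies on a standard fact that is not derived in the text above: for a group of matrices, $\mathrm{Ad}_g\,\xi$ is the conjugate $g\xi g^{-1}$. The matrices `g` and `xi` are arbitrary sample values, and the exponential is a truncated power series.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_exp(A: Matrix, terms: int = 60) -> Matrix:
    """Truncated series sum_{k < terms} A^k / k!."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_mul(term, [[a / k for a in row] for row in A])
        result = [[r + s for r, s in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result


def max_diff(A: Matrix, B: Matrix) -> float:
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


g: Matrix = [[2.0, 1.0], [1.0, 1.0]]        # det g = 1
g_inv: Matrix = [[1.0, -1.0], [-1.0, 2.0]]  # inverse of g
xi: Matrix = [[0.0, 1.0], [-1.0, 0.5]]      # an element of gl(2, R)

ad_xi = mat_mul(mat_mul(g, xi), g_inv)               # Ad_g xi (matrix-group assumption)
lhs = mat_exp(ad_xi)                                 # exp(Ad_g xi)
rhs = mat_mul(mat_mul(g, mat_exp(xi)), g_inv)        # g exp(xi) g^{-1}
print(max_diff(lhs, rhs) < 1e-9)                     # True
```

The agreement reflects the term-by-term identity $(g\xi g^{-1})^k = g\,\xi^k g^{-1}$ in the exponential series.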





Let $\{\xi_i\}$ be a basis for the (finite-dimensional) Lie algebra of the Lie group $G$. The Lie bracket of two basis vectors, being itself a left-invariant vector field, can be written as a linear combination of $\{\xi_i\}$:

\[
[\xi_i, \xi_j] = \sum_{k=1}^{n} c_{ij}^{k}\,\xi_k .
\]

On a general manifold, $c_{ij}^{k}$ will depend on the point at which the fields are being evaluated. However, on Lie groups, they are independent of the point, as the following manipulation shows:

\[
[\xi_i(g), \xi_j(g)] = [L_{g*}\xi_i(e), L_{g*}\xi_j(e)] = L_{g*}[\xi_i(e), \xi_j(e)]
= L_{g*}\sum_{k=1}^{n} c_{ij}^{k}(e)\,\xi_k(e) = \sum_{k=1}^{n} c_{ij}^{k}(e)\,L_{g*}\xi_k(e) = \sum_{k=1}^{n} c_{ij}^{k}(e)\,\xi_k(g).
\]

Therefore, the value of $c_{ij}^{k}$ at any point $g \in G$ is the same as its value at the identity, i.e., $c_{ij}^{k}$ is a constant. This statement is called Lie's second theorem.

27.1.24. Definition. Let $\{\xi_i\}_{i=1}^{n}$ be a basis for the Lie algebra $\mathfrak{g}$ of the Lie group $G$. Then

\[
[\xi_i(g), \xi_j(g)] = \sum_{k=1}^{n} c_{ij}^{k}\,\xi_k(g), \tag{27.12}
\]

where $c_{ij}^{k}$, which are independent of $g$, are called the structure constants of $G$.

The structure constants satisfy certain relations that are immediate consequences of the commutation relations. The antisymmetry of the Lie bracket and the Jacobi identity lead directly to

\[
c_{ij}^{k} = -c_{ji}^{k}, \qquad
c_{ij}^{m}c_{mk}^{n} + c_{jk}^{m}c_{mi}^{n} + c_{ki}^{m}c_{mj}^{n} = 0, \tag{27.13}
\]

with summation over the repeated index $m$ understood. The fact that $\{c_{ij}^{k}\}$ obey Equation (27.13) is the content of Lie's third theorem.
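The relations (27.13) can be verified on a concrete algebra. As a sketch, we take the three standard generators of rotations in three dimensions as an assumed example basis; the function `structure_constants` and the array layout `c[k, i, j]` for $c^k_{ij}$ are our own conventions.

```python
import itertools
import numpy as np

def structure_constants(basis):
    """Return c[k, i, j] with [b_i, b_j] = sum_k c[k, i, j] b_k (least-squares expansion)."""
    n = len(basis)
    B = np.stack([b.ravel() for b in basis], axis=1)   # columns are flattened basis matrices
    c = np.zeros((n, n, n))
    for i, j in itertools.product(range(n), repeat=2):
        comm = basis[i] @ basis[j] - basis[j] @ basis[i]
        coeffs, *_ = np.linalg.lstsq(B, comm.ravel(), rcond=None)
        c[:, i, j] = coeffs
    return c

# Assumed example: generators of so(3), satisfying [L1, L2] = L3 and cyclic permutations
L1 = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
L2 = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
L3 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])

c = structure_constants([L1, L2, L3])

# T[i, j, k, n] = sum_m c^m_{ij} c^n_{mk}; the Jacobi relation is its cyclic sum over (i, j, k)
T = np.einsum('mij,nmk->ijkn', c, c)
jacobi = T + np.transpose(T, (2, 0, 1, 3)) + np.transpose(T, (1, 2, 0, 3))
```

Here `c` is antisymmetric in its last two indices and `jacobi` vanishes identically, which are the two relations of (27.13).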

27.1.3 Infinitesimal Action 

The action $\Phi : G \times M \to M$ of a Lie group on a manifold $M$ induces a homomorphism of its algebra with $\mathfrak{X}(M)$. If $\xi \in \mathfrak{g}$, then $\exp(t\xi) \in G$ can act on $M$ at a point $P$ to produce a curve $\gamma(t) = \exp(t\xi)\cdot P$ going through $P$. The tangent to this curve at $P$ is defined to be the image of $\xi$ under this homomorphism.

27.1.25. Definition. Let $\Phi : G \times M \to M$ be an action. If $\xi \in \mathfrak{g}$, then $\Phi(\exp t\xi, P)$ is a flow on $M$. The corresponding vector field on $M$ given by

\[
\xi_M(P) \equiv \frac{d}{dt}\,\Phi(\exp t\xi, P)\Big|_{t=0}
\]

is called the infinitesimal generator of the action induced by $\xi$.

In particular, 


27.1.26. Box. If M happens to be a vector space, and the action a representation as given in Box 24.1.3, then the infinitesimal generators constitute a representation of the Lie algebra of the group.


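27.1.27. Example. One can think of left translation on a Lie group $G$ as an action of $G$ on itself. Let $\Phi : G \times G \to G$ be given by $\Phi(g, h) = L_g(h)$. Then Definition 27.1.25 gives

\[
\xi_G(g) = \frac{d}{dt}\,\Phi(\exp t\xi, g)\Big|_{t=0}
= \frac{d}{dt}\,(\exp t\xi)\,g\,\Big|_{t=0}
= \frac{d}{dt}\,R_g(\exp t\xi)\Big|_{t=0} = R_{g*}\xi
\]

by the first equation in (26.24). It follows that $\xi_G$ is right-invariant. Indeed,

\[
\xi_G \circ R_h(g) = \xi_G(gh) = (R_{gh})_*\xi = (R_h \circ R_g)_*\xi = R_{h*} \circ R_{g*}\xi = R_{h*} \circ \xi_G(g).
\]

Since this holds for all $g \in G$, it follows that $\xi_G \circ R_h = R_{h*} \circ \xi_G$, demonstrating that $\xi_G$ is right-invariant. ■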

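The adjoint map of Definition 27.1.22 induces a natural action on the Lie algebra $\mathfrak{g}$ with some important properties that we now explore. Define the adjoint action $\Phi : G \times \mathfrak{g} \to \mathfrak{g}$ of $G$ on $\mathfrak{g} \cong T_e(G)$ by $\Phi(g, \xi) = \mathrm{Ad}_g(\xi)$. We claim that the infinitesimal generator $\xi_{\mathfrak{g}}$ is $\mathrm{ad}_\xi$, where $\mathrm{ad}_\xi(\eta) \equiv [\xi, \eta]$. In fact,

\[
\xi_{\mathfrak{g}}(\eta) = \frac{d}{dt}\,\Phi(\exp t\xi, \eta)\Big|_{t=0}
= \frac{d}{dt}\,\mathrm{Ad}_{\exp t\xi}(\eta)\Big|_{t=0}
= \frac{d}{dt}\,R^{-1}_{\exp t\xi\,*} \circ L_{\exp t\xi\,*}(\eta)\Big|_{t=0}
= \frac{d}{dt}\,R^{-1}_{\exp t\xi\,*}(\eta)\Big|_{t=0}
= L_\xi\,\eta = [\xi, \eta] = \mathrm{ad}_\xi(\eta), \tag{27.14}
\]

where we used Equation (27.6) as well as the definition of Lie derivative, Equation (26.31). If $\Phi : G \times M \to M$ is an action, then $\Phi_g : M \to M$, defined by $\Phi_g(P) = \Phi(g, P)$, is a diffeomorphism of $M$. Consequently, $\Phi_{g*} : T_P(M) \to T_{g\cdot P}(M)$ is an isomorphism for every $P \in M$ whose inverse is $\Phi_{g*}^{-1} = \Phi_{g^{-1}*}$.

27.1.28. Proposition. Let $\Phi : G \times M \to M$ be an action. Then for every $g \in G$ and $\xi, \eta \in \mathfrak{g}$, we have

\[
(\mathrm{Ad}_g\,\xi)_M = \Phi_{g*}\,\xi_M \qquad\text{and}\qquad [\xi_M, \eta_M] = -[\xi, \eta]_M .
\]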





comparison of 
coordinate 
manipulations with 
geometric 
(coordinate-free) 
analysis 


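Proof. Let $P$ be any point in $M$. Then,

\[
\begin{aligned}
(\mathrm{Ad}_g\,\xi)_M(P)
&= \frac{d}{dt}\,\Phi(\exp t\,\mathrm{Ad}_g\xi,\; P)\Big|_{t=0} && \text{(by Definition 27.1.25)}\\
&= \frac{d}{dt}\,\Phi\big(g(\exp t\xi)g^{-1},\; P\big)\Big|_{t=0} && \text{(by Corollary 27.1.23)}\\
&= \frac{d}{dt}\,\Phi\big(g(\exp t\xi),\; \Phi_{g^{-1}}(P)\big)\Big|_{t=0} && \text{(by definition of action)}\\
&= \frac{d}{dt}\,\Phi_g \circ \Phi\big(\exp t\xi,\; \Phi_g^{-1}(P)\big)\Big|_{t=0} && \text{(by definition of } \Phi_g)\\
&= \Phi_{g*}\Big|_{\Phi_g^{-1}(P)}\,\frac{d}{dt}\,\Phi\big(\exp t\xi,\; \Phi_g^{-1}(P)\big)\Big|_{t=0} && \text{[by (26.24)]}\\
&= \Phi_{g*}\,\xi_M\big(\Phi_g^{-1}(P)\big) = (\Phi_{g*}\,\xi_M)(P) && \text{[by (26.28)]}.
\end{aligned}
\]

The second part of the proposition follows by replacing $g$ with $\exp t\eta$, so that

\[
(\mathrm{Ad}_{\exp t\eta}\,\xi)_M = \Phi_{\exp t\eta\,*}\,\xi_M = \Phi^{-1}_{\exp t(-\eta)\,*}\,\xi_M .
\]

Differentiate both sides with respect to $t$ at $t = 0$ and note that the LHS gives $[\eta, \xi]_M$. The derivative of the RHS is the Lie derivative of $\xi_M$ with respect to $-\eta_M$, which is $L_{-\eta_M}\xi_M = -[\eta_M, \xi_M]$. Hence $[\xi_M, \eta_M] = -[\xi, \eta]_M$. □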

As mentioned earlier, a Lie group action is usually described in terms of the 
parameters of the group, which are simply coordinate functions on the group G, as 
well as coordinate functions on the manifold M. The infinitesimal generators, being 
vector fields on M, will then be expressed as a linear combination of coordinate 
frames. 

In the older literature, no mention of the manifold structure is made. A Lie group is defined in terms of multiplication functions and other functions that represent the action of the group on the manifold. Thus, an $r$-parameter Lie group $G$ is a collection of two sets of functions, $m_\rho : \mathbb{R}^{2r} \to \mathbb{R}$, $\rho = 1, 2, \ldots, r$, representing the group multiplication, and $\phi_i : \mathbb{R}^r \times \mathbb{R}^n \to \mathbb{R}$, $i = 1, 2, \ldots, n$, representing the action of $G$ on the $n$-dimensional manifold $M$. We sketch the procedure below, leaving most of the calculations as exercises for the reader. As we develop the theory, the reader is urged to compare this "coordinate-dependent" procedure with the "geometric" procedure (which does not use coordinates) described so far.

The action of the group is described by the coordinate transformations$^4$

\[
\begin{aligned}
x_i' &= \phi_i(a_1, \ldots, a_r;\; x_1, \ldots, x_n), \qquad i = 1, \ldots, n,\\
x_i &= \phi_i(0, \ldots, 0;\; x_1, \ldots, x_n),
\end{aligned} \tag{27.15}
\]

$^4$ We use subscripts for coordinate functions here for typographical convenience.






as well as the group multiplication properties

\[
\begin{aligned}
c_\rho &= m_\rho(a_1, \ldots, a_r;\; b_1, \ldots, b_r), \qquad \rho = 1, \ldots, r,\\
a_\rho &= m_\rho(0, \ldots, 0;\; a_1, \ldots, a_r) = m_\rho(a_1, \ldots, a_r;\; 0, \ldots, 0),\\
m_\rho(a;\, m(b; c)) &= m_\rho(m(a; b);\, c).
\end{aligned} \tag{27.16}
\]

Equation (27.15) is to be interpreted as a rule that takes the second set of arguments and transforms them via the first set into the LHS. Now suppose that we translate from $x_i'$ to a neighboring point $x_i' + dx_i'$ via a set of group parameters $\{\delta a_\rho\}_{\rho=1}^{r}$. We can also get to $x_i' + dx_i'$ from $x_i$ via a new set of parameters,$^5$ which have to be slightly different from $\{a_\rho\}$, say $\{a_\rho + da_\rho\}_{\rho=1}^{r}$. We then have

\[
\begin{aligned}
x_i' + dx_i' &= \phi_i(\delta a_1, \ldots, \delta a_r;\; x_1', \ldots, x_n'),\\
x_i' + dx_i' &= \phi_i(a_1 + da_1, \ldots, a_r + da_r;\; x_1, \ldots, x_n),\\
a_\rho + da_\rho &= m_\rho(a_1, \ldots, a_r;\; \delta a_1, \ldots, \delta a_r),
\end{aligned} \tag{27.17}
\]

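and, with summation over repeated indices understood,

\[
\begin{aligned}
dx_i' &= \frac{\partial \phi_i(a; x')}{\partial a_\kappa}\Big|_{a=0}\,\delta a_\kappa \equiv u_{i\kappa}(x')\,\delta a_\kappa,\\
da_\rho &= \frac{\partial m_\rho(a; b)}{\partial b_\kappa}\Big|_{b=0}\,\delta a_\kappa \equiv \theta_{\rho\kappa}(a)\,\delta a_\kappa.
\end{aligned} \tag{27.18}
\]

Inverting the second equation and substituting the resulting $\delta a$'s in the first equation yields

\[
dx_i' = u_{i\kappa}(x')\,\theta^{-1}_{\kappa\lambda}(a)\,da_\lambda, \qquad\text{or}\qquad dx_i = u_{i\kappa}(x)\,\theta^{-1}_{\kappa\lambda}(a)\,da_\lambda,
\]

where in the last equation, we changed the free coordinate variable on both sides. It then follows that

\[
\frac{\partial x_i}{\partial a_\lambda} = \sum_{\kappa=1}^{r} u_{i\kappa}(x)\,\theta^{-1}_{\kappa\lambda}(a). \tag{27.19}
\]

Equation (27.19), together with the statement that $u_{i\kappa}$ is $C^\infty$, is the content of Lie's first theorem.

The change of an arbitrary function $f(x)$ due to an infinitesimal transformation is

\[
df = \frac{\partial f}{\partial x_i}\,dx_i = \frac{\partial f}{\partial x_i}\,u_{i\kappa}(x)\,\delta a_\kappa = \delta a_\kappa\left(u_{i\kappa}(x)\,\frac{\partial}{\partial x_i}\right)f .
\]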


$^5$ Here we are assuming that the action of the group is transitive, i.e., that every point of the manifold can be connected to any other point via a transformation.




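This suggests calling

\[
X_\kappa \equiv \sum_{i=1}^{n} u_{i\kappa}(x)\,\frac{\partial}{\partial x_i} \tag{27.20}
\]

the infinitesimal generators of the Lie group. The commutator of two of these generators is

\[
[X_\rho, X_\sigma] = \left(u_{i\rho}\,\frac{\partial u_{j\sigma}}{\partial x_i} - u_{i\sigma}\,\frac{\partial u_{j\rho}}{\partial x_i}\right)\frac{\partial}{\partial x_j}. \tag{27.21}
\]

This commutator does not appear to be similar to the one in Definition 27.1.24, which is necessary if the generators are to form a Lie algebra. However, through a long and tortuous manipulation, outlined in Problem 27.9, one can show that

\[
[X_\rho, X_\sigma] = \sum_{\kappa=1}^{r} c_{\rho\sigma}^{\kappa}\,X_\kappa, \tag{27.22}
\]

where $c_{\rho\sigma}^{\kappa}$ are constants.

One can also obtain this same result by the much simpler method of applying Proposition 27.1.28 to both sides of Equation (27.12):

\[
[(\xi_i)_M, (\xi_j)_M] = -[\xi_i, \xi_j]_M = -\sum_{k=1}^{n} c_{ij}^{k}\,(\xi_k)_M .
\]

This equation is equivalent to (27.22) if we identify the $X_\rho$'s with the $(\xi_i)_M$'s and ignore the irrelevant minus sign.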

The reader has hopefully been able to appreciate the power and elegance of the geometric approach to Lie groups and Lie algebras. The above illustration (Problem 27.9) brings out the tedium and the error-prone procedure of obtaining group-theoretic results through coordinate manipulations, a procedure used in the old literature including the work of Sophus Lie himself. Although such calculations are inevitable in practice, where most Lie groups are given in terms of parameters, they are not suitable for obtaining formal results.

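27.1.29. Example. The two-dimensional rotation group $SO(2)$ is a 1-parameter Lie group defined by

\[
\begin{aligned}
x_1' &= \phi_1(x_1, x_2; \theta) = x_1\cos\theta - x_2\sin\theta,\\
x_2' &= \phi_2(x_1, x_2; \theta) = x_1\sin\theta + x_2\cos\theta.
\end{aligned}
\]

Using Equation (27.20), we find the (only) generator of this group:

\[
X = u_i\,\frac{\partial}{\partial x_i}, \qquad\text{where}\qquad u_i = \frac{\partial \phi_i}{\partial\theta}\Big|_{\theta=0}.
\]

Explicitly, we have

\[
\begin{aligned}
u_1 &= \frac{\partial\phi_1}{\partial\theta}\Big|_{\theta=0} = (-x_1\sin\theta - x_2\cos\theta)\big|_{\theta=0} = -x_2,\\
u_2 &= \frac{\partial\phi_2}{\partial\theta}\Big|_{\theta=0} = (x_1\cos\theta - x_2\sin\theta)\big|_{\theta=0} = x_1,
\end{aligned}
\]

and

\[
X = u_1\,\frac{\partial}{\partial x_1} + u_2\,\frac{\partial}{\partial x_2} = -x_2\,\frac{\partial}{\partial x_1} + x_1\,\frac{\partial}{\partial x_2}.
\]

The reader recognizes this, within a factor of $i$, as the $z$-component of the angular momentum operator. In general,

27.1.30. Box. Angular momentum operators are the infinitesimal generators of rotation.

Inclusion of the other two rotations about the $x$-axis and the $y$-axis completes the set of infinitesimal generators of the rotation group in three dimensions. ■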
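The components $u_i$ of the generator can also be estimated numerically by differentiating the group action at $\theta = 0$. The following is a minimal sketch; the function names and the sample point $(2, 3)$ are illustrative assumptions.

```python
import math

def phi(theta, x1, x2):
    """Action of SO(2) on the plane, as in the rotation example."""
    return (x1 * math.cos(theta) - x2 * math.sin(theta),
            x1 * math.sin(theta) + x2 * math.cos(theta))

def generator_components(x1, x2, h=1e-6):
    """Central-difference estimate of u_i = d(phi_i)/d(theta) at theta = 0."""
    plus = phi(h, x1, x2)
    minus = phi(-h, x1, x2)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))

# At the sample point (x1, x2) = (2, 3) one expects (u1, u2) = (-x2, x1) = (-3, 2)
u1, u2 = generator_components(2.0, 3.0)
```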

The action of a Lie group on $M$ can be reconstructed from its infinitesimal action. The flow of $X_\kappa$ is the solution of the DE

\[
\frac{dx_i'}{dt} = u_{i\kappa}(x'), \qquad x_i'(0) = x_i. \tag{27.23}
\]

Once the solution is obtained, one can replace $t$ with $a_\kappa$ for each $\kappa$. In some applications, $u_{i\kappa}(x)$ will be given implicitly in terms of certain parameters of integration of some DEs [unrelated to (27.23)]. The solutions of these DEs are typically generators of coordinate transformations that can be written linearly in terms of the parameters. To be more precise, suppose that after solving some DEs, we obtain

\[
X_i = \sum_{\kappa=1}^{r} q_\kappa\,X_i^{(\kappa)}(x_1, \ldots, x_n), \tag{27.24}
\]

where $\{q_\kappa\}$ are the parameters of integration, and $X_i$ are components of the vector field that generates the coordinate transformation. This means that for small parameters, one can write

\[
x_i' = x_i + \sum_{\kappa=1}^{r} q_\kappa\,X_i^{(\kappa)}(x_1, \ldots, x_n)
\]

and read off $u_{i\kappa}(x) = X_i^{(\kappa)}(x)$. In that case, we have

\[
\frac{dx_i'}{dt} = X_i^{(\kappa)}(x_1', \ldots, x_n'), \qquad x_i'(0) = x_i. \tag{27.25}
\]

We shall have occasion to use this formula later.
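As an illustration of reconstructing the action from its generator, one can integrate (27.23) for the rotation generator $u = (-x_2, x_1)$ and compare with the finite rotation. The integrator below is a standard fourth-order Runge-Kutta scheme, our own choice for this sketch; the step count and sample data are assumptions.

```python
import math

def u(x):
    """Components of the SO(2) generator at the point x = (x1, x2)."""
    x1, x2 = x
    return (-x2, x1)

def rk4_flow(x0, t, steps=1000):
    """Integrate dx/dt = u(x), x(0) = x0, up to parameter value t."""
    h = t / steps
    x = tuple(x0)
    for _ in range(steps):
        k1 = u(x)
        k2 = u(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)))
        k3 = u(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)))
        k4 = u(tuple(xi + h * ki for xi, ki in zip(x, k3)))
        x = tuple(xi + h * (a + 2 * b + 2 * c + d) / 6
                  for xi, a, b, c, d in zip(x, k1, k2, k3, k4))
    return x

theta = 0.7
x0 = (2.0, 3.0)
flowed = rk4_flow(x0, theta)

# Finite rotation by theta, obtained by replacing t with the group parameter
rotated = (x0[0] * math.cos(theta) - x0[1] * math.sin(theta),
           x0[0] * math.sin(theta) + x0[1] * math.cos(theta))
```

The tuples `flowed` and `rotated` coincide to within the integration error.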





Haar measure 


27.1.4 Integration on Lie Groups 

As on any other manifold, one can define integration on Lie groups; i.e., one can construct nonvanishing $n$-forms and use Equation (26.46) to define integrals on a Lie group $G$. Because of the left-invariant property of objects on $G$, it would be helpful if the integration process were also left-invariant. For this to happen, the $n$-form would have to be left-invariant. It turns out that this can be accomplished more or less uniquely:

27.1.31. Proposition. Let $G$ be a Lie group of dimension $n$. Then there exists a left-invariant nonvanishing $n$-form $\mu$ that is unique up to a nonzero multiplicative constant. If $G$ is compact, then $\mu$ is also right-invariant and the multiplicative constant can be chosen to be 1.

Proof. Let $\mu_e$ be any nonzero $n$-form on $T_e(G)$. The desired $n$-form is the left translation of this form, i.e., $\mu_g \equiv L^*_{g^{-1}}\mu_e$. Indeed, let $\{X_i\}_{i=1}^{n}$ be left invariant. Then

\[
\begin{aligned}
\mu_g(X_1|_g, \ldots, X_n|_g) &= L^*_{g^{-1}}\mu_e(X_1|_g, \ldots, X_n|_g)\\
&= \mu_e(L_{g^{-1}*}X_1|_g, \ldots, L_{g^{-1}*}X_n|_g)\\
&= \mu_e(X_1|_{g^{-1}g}, \ldots, X_n|_{g^{-1}g}) = \mu_e(X_1|_e, \ldots, X_n|_e).
\end{aligned}
\]

This shows that $\mu$ is left-invariant. Now note that any other $n$-form $\mu_e'$ on $T_e(G)$ is a constant multiple of $\mu_e$. Therefore, the corresponding $n$-form $\mu_g'$ will be a constant multiple of $\mu_g$.

Let $x \in G$ and consider $\mu' \equiv R_x^*\mu$. We have

\[
L_g^*\mu' = L_g^* \circ R_x^*\mu = R_x^* \circ L_g^*\mu = R_x^*\mu = \mu',
\]

where we used the fact that $L_g$ and $R_x$ commute and that $\mu$ is left invariant. The equation above shows that $\mu'$ is also left-invariant. Therefore, $\mu' = c\mu$. If $G$ is compact, we can integrate both sides and note that $\int_G \mu = \int_G \mu'$ because $\mu'$ is related to $\mu$ by a change of variable. Therefore, $c = 1$ and $R_x^*\mu = \mu$. □

The left-invariant volume element (nonvanishing $n$-form) guaranteed by the proposition above is called Haar measure. Since all calculations are done using some coordinate system, we give an explicit expression of the Haar measure in terms of coordinates (parameters) of a general Lie group. Let $y \equiv (y^1, \ldots, y^r)$ be the coordinates of the translation of $x = (x^1, \ldots, x^r)$ by $g \in G$. Then we can write $y = m(g, x)$, so that $dy^j = (\partial y^j/\partial x^i)\,dx^i = (\partial m^j/\partial x^i)\,dx^i$. Therefore,

\[
dy^1 \wedge \cdots \wedge dy^r = \det\left(\frac{\partial m^j(g, x)}{\partial x^i}\right) dx^1 \wedge \cdots \wedge dx^r .
\]

In particular, if $x = 0$, the coordinates of the identity, then $y$ will be the coordinates of $g$. So, the volume element at $g$, denoted by $d^r y$, will be given by

\[
d^r y = \det\left(\frac{\partial m^j(g, x)}{\partial x^i}\right)_{x=0} dx^1 \wedge \cdots \wedge dx^r .
\]






density functions 
associated with Haar 
measure 


Lie algebra defined 


Note that this is consistent with the geometric definition of the invariant measure given in Proposition 27.1.31 because $L^*_{g^{-1}} = (L^*_g)^{-1}$, so that the matrix of $L^*_{g^{-1}}$ is the inverse of the matrix of $L^*_g$. The volume element at $g$, which is invariant on $G$ (and therefore has the same value as at the identity) and which we denote by $d\mu(g)$, will be given by

\[
d\mu(g) = d\mu(e) \equiv d^r x = \det{}^{-1}\left(\frac{\partial m^j(g, x)}{\partial x^i}\right)_{x=0} d^r g, \tag{27.26}
\]

where we have replaced $y$ with the more suggestive $g$. The volume element $d^r g$ is the ordinary Euclidean volume element of $\mathbb{R}^r$ evaluated at the parameters corresponding to $g$. The quantity multiplying $d^r g$ is called the density function. Note that since we are interested in the derivatives of $m^j$ at small values of $x$, we can take the components of $x$ to be small, and retain them only up to the first order. This will sometimes simplify the calculation of the invariant Haar measure.

27.1.32. Example. From the multiplication rule for the one-dimensional projective group given in Example 27.1.8, we easily find

\[
\det\left(\frac{\partial m}{\partial b}\right) = \det\begin{pmatrix}
a_1 & 0 & a_2 & 0\\
0 & a_1 & 0 & a_2\\
a_3 & 0 & a_4 & 0\\
0 & a_3 & 0 & a_4
\end{pmatrix} = (a_1a_4 - a_2a_3)^2 .
\]

Thus the density function is $(a_1a_4 - a_2a_3)^{-2}$, and the invariant Haar measure is $d\mu(a) = (a_1a_4 - a_2a_3)^{-2}\,d^4a$. ■
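The determinant in this example can be checked numerically. The sketch below assumes that the multiplication rule of Example 27.1.8 (not reproduced here) is ordinary $2 \times 2$ matrix multiplication with parameters $(a_1, a_2, a_3, a_4) \leftrightarrow \begin{pmatrix} a_1 & a_2\\ a_3 & a_4\end{pmatrix}$, which is what the displayed $4 \times 4$ matrix suggests; the function names and sample parameters are ours.

```python
import numpy as np

def multiply(a, b):
    """Assumed multiplication rule: 2x2 matrix product on row-major parameter 4-tuples."""
    return (np.reshape(a, (2, 2)) @ np.reshape(b, (2, 2))).ravel()

def jacobian_wrt_second(a, b, h=1e-6):
    """Central-difference Jacobian of m(a, b) with respect to the second argument b."""
    jac = np.zeros((4, 4))
    for col in range(4):
        db = np.zeros(4)
        db[col] = h
        jac[:, col] = (multiply(a, b + db) - multiply(a, b - db)) / (2 * h)
    return jac

a = np.array([2.0, 1.0, 0.5, 3.0])          # sample group element, a1*a4 - a2*a3 = 5.5
identity = np.array([1.0, 0.0, 0.0, 1.0])   # identity in this (assumed) parametrization

jac = jacobian_wrt_second(a, identity)
det_jac = np.linalg.det(jac)
delta = a[0] * a[3] - a[1] * a[2]
density = 1.0 / det_jac                     # density function of Equation (27.26)
```

Because this multiplication rule is linear in `b`, the Jacobian does not depend on the point at which it is evaluated, and `det_jac` equals `delta**2`.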


27.2 An Outline of Lie Algebra Theory 

The notion of a Lie algebra has appeared on a number of occasions both in our 
study of vector fields on manifolds and, more recently, in the study of Lie groups 
in the vicinity of their identity elements. Lie algebras play an important role in 
the representation theory of Lie groups as well. It is therefore worth our effort to 
spend some time getting acquainted with the formal structure and properties of 
these algebras. We shall restrict our discussion to finite-dimensional Lie algebras. 

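27.2.1. Definition. A finite-dimensional vector space $\mathfrak{v}$ over $\mathbb{R}$ (or $\mathbb{C}$) is called a Lie algebra over $\mathbb{R}$ (or $\mathbb{C}$) if there is a binary operation, called Lie multiplication, $[\cdot\,,\cdot] : \mathfrak{v} \times \mathfrak{v} \to \mathfrak{v}$ on $\mathfrak{v}$, satisfying

1. $[X, Y] = -[Y, X]$ for all $X, Y \in \mathfrak{v}$ (antisymmetry).

2. $[\alpha X + \beta Y, Z] = \alpha[X, Z] + \beta[Y, Z]$ for $\alpha, \beta \in \mathbb{R}$ (or $\mathbb{C}$) (linearity).

3. $[X, [Y, Z]] + [Z, [X, Y]] + [Y, [Z, X]] = 0$ (Jacobi identity).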




subalgebra, ideal, 
and center of a Lie 
algebra 


Knowing the 
structure constants, 
one can reconstruct 
the Lie aigebra! 


The concepts of a homomorphism, its kernel, its range, etc. are the same as before.

To distinguish Lie algebras from vector spaces, we shall denote the former by 
lowercase German letters as we have done for the Lie algebras of Lie groups. 

27.2.2. Example. Recall from Chapter 1 that an algebra is a vector space with a product. If this product is associative, then one can construct a Lie algebra out of the associative algebra by defining $[a, b] \equiv ab - ba$. In particular, the matrix algebra under commutation of matrices becomes a Lie algebra, which we denote by $\mathfrak{gl}(n, \mathbb{R})$ [or $\mathfrak{gl}(n, \mathbb{C})$]. ■

27.2.3. Definition. Let $\mathfrak{v}$ be a Lie algebra. A subspace $\mathfrak{u}$ of $\mathfrak{v}$ is called a subalgebra if $[X, Y] \in \mathfrak{u}$ whenever $X, Y \in \mathfrak{u}$. The subspace $\mathfrak{u}$ is called an ideal if $[X, Y] \in \mathfrak{u}$ whenever either $X \in \mathfrak{u}$ or $Y \in \mathfrak{u}$. The center $\mathfrak{z}$ of $\mathfrak{v}$ is the collection of all $X \in \mathfrak{v}$ whose Lie multiplication with all vectors of $\mathfrak{v}$ vanishes. A Lie algebra is abelian, or commutative, if $\mathfrak{z} = \mathfrak{v}$.

If we choose a basis in the Lie algebra $\mathfrak{v}$, and express the Lie multiplication of basis vectors as a linear combination of basis vectors, we end up with basis-dependent structure constants that satisfy Equation (27.13). The structure constants completely determine the Lie algebra: Given these constants, one can choose a vector space $V$ of correct dimension, a basis in that space, and impose the Lie multiplication law among the basis vectors suggested by the structure constants. Once the Lie multiplication law for basis vectors is established, the law for arbitrary vectors follows from linearity of Lie multiplication. This procedure induces a binary operation on $V$ and turns it into a Lie algebra $\mathfrak{v}$. Any other algebra so constructed will be isomorphic to $\mathfrak{v}$.

27.2.4. Example. We can classify all two-dimensional Lie algebras by analyzing their structure constants. Let $X_1$ and $X_2$ be any two linearly independent vectors of the two-dimensional Lie algebra $\mathfrak{v}$. Write the only nonzero Lie bracket as

\[
[X_1, X_2] = c_1X_1 + c_2X_2 .
\]

There are two cases to consider: Either $c_1 = 0 = c_2$ or one of the constants is nonzero. The first case corresponds to a 2-dimensional abelian Lie algebra:

\[
[X_i, X_j] = 0 \qquad\text{for } i, j = 1, 2.
\]

For the second case, suppose $c_1 \neq 0$ and define the vectors

\[
X = c_1X_1 + c_2X_2, \qquad Y = X_2/c_1 .
\]

Then the nonzero Lie bracket becomes $[X, Y] = X$. ■

The result of Example 27.2.4 is summarized as follows:

27.2.5. Box. There are only two 2-dimensional Lie algebras, given by either one of the following Lie bracket relations:

\[
[X_1, X_2] = 0 \qquad\text{or}\qquad [X_1, X_2] = X_1 .
\]
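The change of basis in Example 27.2.4 can be checked with a few lines of arithmetic. In this sketch, vectors are represented by their coefficient pairs relative to $(X_1, X_2)$; the function `bracket` and the sample constants are illustrative assumptions.

```python
def bracket(a, b, c):
    """Lie bracket of a = a1*X1 + a2*X2 and b = b1*X1 + b2*X2,
    given [X1, X2] = c1*X1 + c2*X2 (bilinearity plus antisymmetry)."""
    coeff = a[0] * b[1] - a[1] * b[0]
    return (coeff * c[0], coeff * c[1])

c = (2.0, -3.0)            # sample structure constants with c1 != 0
X = (c[0], c[1])           # X = c1*X1 + c2*X2
Y = (0.0, 1.0 / c[0])      # Y = X2 / c1

XY = bracket(X, Y, c)      # should reproduce X, i.e. [X, Y] = X
```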







Weyl basis for 
gi(n f R) 


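27.2.6. Example. The Pauli spin matrices

\[
\sigma_1 = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, \qquad
\sigma_2 = \begin{pmatrix} 0 & -i\\ i & 0 \end{pmatrix}, \qquad
\sigma_3 = \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix}
\]

form a Lie algebra under the commutation relation given by

\[
[\sigma_j, \sigma_k] = 2i\epsilon_{jkl}\,\sigma_l .
\]

Thus, $c_{jk}^{l} = 2i\epsilon_{jkl}$, and the Pauli spin matrices are a basis for $\mathfrak{su}(2)$. ■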
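The commutation relations of the Pauli matrices are easily confirmed by direct multiplication. A minimal sketch (indices run over 0, 1, 2 in the code; the helper `levi_civita` is our own):

```python
import itertools
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def levi_civita(j, k, l):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (j - k) * (k - l) * (l - j) / 2

def commutator(a, b):
    return a @ b - b @ a

# Check [sigma_j, sigma_k] = 2i * sum_l eps_{jkl} sigma_l for every pair (j, k)
pauli_ok = all(
    np.allclose(commutator(sigma[j], sigma[k]),
                2j * sum(levi_civita(j, k, l) * sigma[l] for l in range(3)))
    for j, k in itertools.product(range(3), repeat=2)
)
```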

27.2.7. Example. The Lie group $GL(n, \mathbb{R})$ has $\mathfrak{gl}(n, \mathbb{R})$, the set of all real $n \times n$ matrices, as its Lie algebra. The standard basis of this Lie algebra, also called the Weyl basis, consists of matrices $e_{ij}$ that have zeros everywhere except at the $ij$th position, where the entry is 1. We therefore have

\[
(e_{ij})_{kl} = \delta_{ik}\delta_{jl}. \tag{27.27}
\]

We can readily find the Lie multiplication (commutation relations) for these matrices. We simply need to look at the elements of the matrix of the commutator:

\[
\begin{aligned}
([e_{ij}, e_{kl}])_{mn} &= (e_{ij}e_{kl})_{mn} - (e_{kl}e_{ij})_{mn}\\
&= (e_{ij})_{mr}(e_{kl})_{rn} - (e_{kl})_{mr}(e_{ij})_{rn}\\
&= \delta_{im}\delta_{jr}\delta_{kr}\delta_{ln} - \delta_{km}\delta_{lr}\delta_{ir}\delta_{jn}
= \delta_{im}\delta_{jk}\delta_{ln} - \delta_{km}\delta_{li}\delta_{jn}\\
&= (e_{il})_{mn}\,\delta_{jk} - (e_{kj})_{mn}\,\delta_{il},
\end{aligned}
\]

or

\[
[e_{ij}, e_{kl}] = \delta_{jk}\,e_{il} - \delta_{il}\,e_{kj}. \tag{27.28}
\]

The structure constants, which are naturally double-indexed, can be read off from Equation (27.28):

\[
c_{ij,kl}^{mn} = \delta_{jk}\,\delta_i^m\delta_l^n - \delta_{il}\,\delta_k^m\delta_j^n, \tag{27.29}
\]

where we have used a superscript for some of the Kronecker deltas to conform to the position of the corresponding index on the LHS. ■
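Equation (27.28) can be verified exhaustively for a small value of $n$. The sketch below uses $n = 3$ and zero-based indices; the helper names are ours.

```python
import itertools
import numpy as np

n = 3

def e(i, j):
    """Weyl basis matrix: 1 at position (i, j), zeros elsewhere."""
    m = np.zeros((n, n))
    m[i, j] = 1.0
    return m

def delta(a, b):
    return 1.0 if a == b else 0.0

# Check [e_ij, e_kl] = delta_jk e_il - delta_il e_kj for all index combinations
weyl_ok = all(
    np.array_equal(e(i, j) @ e(k, l) - e(k, l) @ e(i, j),
                   delta(j, k) * e(i, l) - delta(i, l) * e(k, j))
    for i, j, k, l in itertools.product(range(n), repeat=4)
)
```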

27.2.8. Example. An important datum is the dimension of the Lie group (or its associated Lie algebra, since they are the same). This datum is not apparent in most cases of interest in which the group is defined in terms of some geometric property. For example, the symplectic group is defined as all linear transformations $A$ that leave a certain antisymmetric bilinear form invariant (Example 23.2.2). In terms of matrices, we have

\[
x'^{t}Jy' = x^{t}Jy \;\Rightarrow\; x^{t}(A^{t}JA)\,y = x^{t}Jy \qquad \forall x, y \in \mathbb{R}^{2n}, \qquad\text{where}\quad J = \begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}.
\]

It follows that the symplectic group consists of all matrices $A$ such that

\[
A^{t}JA = J. \tag{27.30}
\]

If we write $A$ in block form,

\[
A = \begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{pmatrix},
\]

where $A_{ij}$ are $n \times n$ matrices, then Equation (27.30) becomes

\[
\begin{pmatrix} A_{11}^{t} & A_{21}^{t}\\ A_{12}^{t} & A_{22}^{t} \end{pmatrix}
\begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}
\begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{pmatrix}
= \begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix},
\]

or

\[
A_{11}^{t}A_{21} = A_{21}^{t}A_{11}, \qquad A_{12}^{t}A_{22} = A_{22}^{t}A_{12}, \qquad A_{11}^{t}A_{22} - A_{21}^{t}A_{12} = 1. \tag{27.31}
\]

For the symplectic algebra $\mathfrak{sp}(2n, \mathbb{R})$, we are interested in the matrix $A$ when it is close to the identity. This means that

\[
A_{11} = 1 + \epsilon X_{11}, \qquad A_{22} = 1 + \epsilon X_{22}, \qquad A_{12} = \epsilon X_{12}, \qquad A_{21} = \epsilon X_{21}.
\]

Substituting these in Equation (27.31) and keeping terms linear in $\epsilon$, we obtain the following relations among $X_{ij}$:

\[
X_{22}^{t} = -X_{11}, \qquad X_{12}^{t} = X_{12}, \qquad X_{21}^{t} = X_{21}. \tag{27.32}
\]

It follows that we need $n^2$ parameters to describe the $n \times n$ matrices $X_{11}$ and $X_{22}$ simultaneously. For the symmetric matrices $X_{12}$ and $X_{21}$, we need $n(n + 1)/2$ independent parameters each. Therefore, the total number of independent parameters needed for (or the dimension of) the symplectic algebra $\mathfrak{sp}(2n, \mathbb{R})$ is

\[
n^2 + 2\,\frac{n(n + 1)}{2} = n(2n + 1). \qquad ■
\]
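The parameter count can be cross-checked numerically. To first order in $\epsilon$, Equation (27.30) with $A = 1 + \epsilon X$ reads $X^{t}J + JX = 0$, so the dimension of the algebra is the dimension of the null space of this linear condition. The sketch below computes it by a rank calculation; the function name is an illustrative assumption.

```python
import numpy as np

def symplectic_algebra_dimension(n):
    """Dimension of {X : X^t J + J X = 0} for 2n x 2n real matrices X."""
    N = 2 * n
    J = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-np.eye(n), np.zeros((n, n))]])
    columns = []
    for idx in range(N * N):
        X = np.zeros(N * N)
        X[idx] = 1.0                      # elementary matrix, one per parameter of X
        X = X.reshape(N, N)
        columns.append((X.T @ J + J @ X).ravel())
    linear_map = np.stack(columns, axis=1)
    return int(N * N - np.linalg.matrix_rank(linear_map))

dims = [symplectic_algebra_dimension(n) for n in (1, 2, 3)]
expected = [n * (2 * n + 1) for n in (1, 2, 3)]
```

For $n = 1, 2, 3$ the null-space dimensions agree with $n(2n + 1)$.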

Although our attempt is to give a formal discussion of the Lie algebras and their 
structure in this section, we shall do this with an eye to the eventual utility of this 
discussion in a better understanding of the Lie algebras of Lie groups. To make the 
connection between the present formalism and the Lie algebras arising from Lie 
groups, we shall make heavy use of matrix groups, i.e., GL(n, R) [or GL(n, C)]
and its subgroups. Equation (27.8) gives a method of finding the matrices of the 
algebra if those of the group are known: 


27.2.9. Box. Differentiate the matrix with respect to a parameter at the 
identity (where all parameters are set equal to zero) to find the matrix “in 
the direction” of that parameter. 







27.2.1 The Lie Algebras $\mathfrak{o}(p, n-p)$ and $\mathfrak{p}(p, n-p)$

Many of the Lie groups encountered in physical applications are special cases of the (pseudo) orthogonal group $O(p, n-p)$ and its associated Poincaré group $P(p, n-p)$. It is therefore worthwhile to study their Lie algebras in some detail. Introduce the diagonal matrix

\[
\eta = \mathrm{diag}(\underbrace{-1, -1, \ldots, -1}_{p\ \text{times}},\ \underbrace{1, 1, \ldots, 1}_{n-p\ \text{times}})
\]

and note that the (pseudo) orthogonal group $O(p, n-p)$ consists of $n \times n$ matrices that leave the bilinear form $x \cdot x \equiv x^{t}\eta x$ invariant for $x \in \mathbb{R}^n$. This means that the matrices $A$ will have to satisfy

\[
A^{t}\eta A = \eta \;\Rightarrow\; (\det A)^2 = 1. \tag{27.33}
\]

Such matrices are called $\eta$-orthogonal. The fact that $O(p, n-p)$ is a group and that $\eta^{-1} = \eta$ can be used to show that

\[
A\eta A^{t} = \eta. \tag{27.34}
\]
27.2.10. Example. The Lorentz group

The group of the special theory of relativity is the full Lorentz group $O(3, 1)$. This is the group of transformations that leave the invariant length$^6$

\[
\eta_{ij}x_ix_j = -x_1^2 - x_2^2 - x_3^2 + x_0^2 = x_0^2 - x_1^2 - x_2^2 - x_3^2
\]

of a 4-vector $(x_1, x_2, x_3, x_0 = ct)$ invariant. The (0,0)-components of Equations (27.33) and (27.34) yield

\[
\begin{aligned}
a_{00}^2 - a_{10}^2 - a_{20}^2 - a_{30}^2 &= 1,\\
a_{00}^2 - a_{01}^2 - a_{02}^2 - a_{03}^2 &= 1.
\end{aligned} \tag{27.35}
\]

Either one of these equations implies that $a_{00} \geq 1$ or $a_{00} \leq -1$. Lorentz transformations for which $a_{00} \geq 1$ are called orthochronous. Since $\det \mathbf{1} = +1$ and $\mathbf{1}_{00} = +1$, the identity belongs to the subset consisting of transformations with $\det A = +1$ and $a_{00} \geq 1$. Such transformations form a subgroup of $O(3, 1)$ called the proper orthochronous Lorentz transformations, and have the property that they can be reached continuously from the identity.

Depending on whether $x \cdot x > 0$, $x \cdot x < 0$, or $x \cdot x = 0$, the vector $x$ is called timelike, spacelike, or null, respectively. In the special theory of relativity $\mathbb{R}^4$ becomes the set of events. At every event $x$ the set $\mathbb{R}^4$ is divided into 5 regions:

1. All events $y = (y_1, y_2, y_3, y_0)$ to which one can go from $x$ by material objects, with speed less than $c$, lie to the future of $x$, i.e., $y_0 - x_0 > 0$, and are timelike:

\[
(y_0 - x_0)^2 > (y_1 - x_1)^2 + (y_2 - x_2)^2 + (y_3 - x_3)^2 .
\]

They form a 4-dimensional subset of $\mathbb{R}^4$ and are said to lie inside the future light cone.

6 It is common to label the time coordinate with index 0 rather than 4. We shall use this convention. 


orthochronous and 
proper 
orthochronous 
Lorentz 
transformations 

Timelike, spacelike, 
and null vectors; E 4 
as the set of events 


future light cone 





past light cone 


elsewhere 


2. All events $y = (y_1, y_2, y_3, y_0)$ to which one can go from $x$ only by a light signal lie to the future of $x$, i.e., $y_0 - x_0 > 0$, and

\[
(y_0 - x_0)^2 - (y_1 - x_1)^2 - (y_2 - x_2)^2 - (y_3 - x_3)^2 = 0 .
\]

They form a 3-dimensional subset of $\mathbb{R}^4$ and are said to lie on the future light cone.

3. All events $y = (y_1, y_2, y_3, y_0)$ from which one can come to $x$ by material objects, with speed less than $c$, lie in the past of $x$, i.e., $x_0 - y_0 > 0$, and are timelike:

\[
(x_0 - y_0)^2 > (x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2 .
\]

They form a 4-dimensional subset of $\mathbb{R}^4$ and are said to lie inside the past light cone.

4. All events $y = (y_1, y_2, y_3, y_0)$ from which one can come to $x$ only by a light signal lie to the past of $x$, i.e., $x_0 - y_0 > 0$, and

\[
(x_0 - y_0)^2 - (x_1 - y_1)^2 - (x_2 - y_2)^2 - (x_3 - y_3)^2 = 0 .
\]

They form a 3-dimensional subset of $\mathbb{R}^4$ and are said to lie on the past light cone.

5. All events in the remaining part of $\mathbb{R}^4$ form a 4-dimensional subset, are spacelike, and cannot be connected to $x$ by any means. They are said to belong to elsewhere.

From a physical standpoint, future and past are observer-independent. Therefore, if $y$ lies in or on the future light cone of $x$ with respect to one observer, it should also do so with respect to all observers. Since observers are connected by Lorentz transformations, we expect the latter to preserve this relation between $x$ and $y$. Not all elements of $O(3, 1)$ have this property. However, the proper orthochronous transformations do. The details are left as a problem for the reader (see Problem 27.14). ■
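A single boost illustrates these statements numerically. The sketch below uses the ordering $(x_1, x_2, x_3, x_0)$ of the text, a boost along $x_1$ with an arbitrarily chosen rapidity, and a sample future-directed timelike separation; all numerical values are illustrative assumptions.

```python
import numpy as np

eta = np.diag([-1.0, -1.0, -1.0, 1.0])      # ordering (x1, x2, x3, x0)

def boost_x1(rapidity):
    """Boost mixing x1 (index 0) and x0 (index 3)."""
    A = np.eye(4)
    A[0, 0] = A[3, 3] = np.cosh(rapidity)
    A[0, 3] = A[3, 0] = -np.sinh(rapidity)
    return A

A = boost_x1(0.8)

separation = np.array([0.1, 0.2, 0.3, 1.0])  # y - x: timelike, pointing to the future
interval = separation @ eta @ separation

boosted = A @ separation
boosted_interval = boosted @ eta @ boosted

is_eta_orthogonal = np.allclose(A.T @ eta @ A, eta)        # Equation (27.33)
is_proper_orthochronous = bool(np.isclose(np.linalg.det(A), 1.0) and A[3, 3] >= 1.0)
```

The boost preserves the invariant interval and keeps the time component of the separation positive, so the boosted event remains inside the future light cone.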

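As a prototype of $\eta$-orthogonal matrices, consider the matrix obtained from the unit matrix by removing the $ii$th, $ij$th, $ji$th, and $jj$th elements, and replacing them by an overall $2 \times 2$ matrix. The result, denoted by $A^{(ij)}$, will look like

\[
A^{(ij)} = \begin{pmatrix}
1 & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0\\
0 & 1 & \cdots & 0 & \cdots & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots & & \vdots & & \vdots\\
0 & 0 & \cdots & a_{ii} & \cdots & a_{ij} & \cdots & 0\\
\vdots & \vdots & & \vdots & & \vdots & & \vdots\\
0 & 0 & \cdots & a_{ji} & \cdots & a_{jj} & \cdots & 0\\
\vdots & \vdots & & \vdots & & \vdots & & \vdots\\
0 & 0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{pmatrix}.
\]

This matrix will transform $(x_1, \ldots, x_n) \in \mathbb{R}^n$ according to

\[
\begin{aligned}
x_i' &= a_{ii}x_i + a_{ij}x_j \qquad \text{(no summation!)},\\
x_j' &= a_{ji}x_i + a_{jj}x_j,\\
x_k' &= x_k \qquad \text{for } k \neq i, j.
\end{aligned}
\]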





In order for $A^{(ij)}$ to leave the bilinear form $x^{t}\eta x$ invariant, the $2 \times 2$ submatrix $\begin{pmatrix} a_{ii} & a_{ij}\\ a_{ji} & a_{jj} \end{pmatrix}$ must be either a rotation (corresponding to the case where $i, j \leq p$ or $i, j > p$), or a Lorentz boost$^7$ (corresponding to the case where $i \leq p$ and $j > p$). In the first case, we have

\[
\begin{pmatrix} a_{ii} & a_{ij}\\ a_{ji} & a_{jj} \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{pmatrix},
\]

and in the second case

\[
\begin{pmatrix} a_{ii} & a_{ij}\\ a_{ji} & a_{jj} \end{pmatrix} = \begin{pmatrix} \cosh\xi & -\sinh\xi\\ -\sinh\xi & \cosh\xi \end{pmatrix},
\]

where $\xi \equiv \tanh^{-1}(v/c)$ is the "rapidity."

The matrices of the algebra are obtained by differentiation at $\theta = 0$ (or $\xi = 0$). Denoting these matrices by $M_{ij}$, we readily find that for the case of rotations, $M_{ij}$ has $-1$ at the $ij$th position, $+1$ at the $ji$th position, and 0 everywhere else. For the case of boosts, $M_{ij}$ has $-1$ at the $ij$th and the $ji$th position, and 0 everywhere else. Both cases can be described by the single relation

\[
(M_{ij})^{m}{}_{l} = \eta_{il}\,\delta^{m}_{j} - \eta_{jl}\,\delta^{m}_{i}.
\]

It is convenient to have all indices in the lower position. So, we multiply both sides by $\eta_{mk}$ to obtain

\[
(M_{ij})_{kl} = \eta_{il}\eta_{jk} - \eta_{jl}\eta_{ik}, \qquad M_{ij} = -M_{ji}. \tag{27.36}
\]

We can use Equation (27.36) to find the Lie multiplication (in this case, matrix commutation relations) for the algebra $\mathfrak{o}(p, n-p)$:

\[
[M_{ij}, M_{kl}] = \eta_{ik}M_{jl} - \eta_{il}M_{jk} + \eta_{jl}M_{ik} - \eta_{jk}M_{il}. \tag{27.37}
\]
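The commutation relations (27.37) can be confirmed by brute force for a particular signature. The sketch below takes $O(3, 1)$ and builds the generators from the mixed-index relation $(M_{ij})^{m}{}_{l} = \eta_{il}\delta^{m}_{j} - \eta_{jl}\delta^{m}_{i}$, which is an assumption of this sketch (row index $m$, column index $l$, zero-based indices); the function names are ours.

```python
import itertools
import numpy as np

eta = np.diag([-1.0, -1.0, -1.0, 1.0])   # signature of O(3, 1)
n = eta.shape[0]

def M(i, j):
    """Generator with entries (M_ij)[m, l] = eta[i, l]*delta(m, j) - eta[j, l]*delta(m, i)."""
    mat = np.zeros((n, n))
    for l in range(n):
        mat[j, l] += eta[i, l]
        mat[i, l] -= eta[j, l]
    return mat

def commutator(a, b):
    return a @ b - b @ a

# Check [M_ij, M_kl] = eta_ik M_jl - eta_il M_jk + eta_jl M_ik - eta_jk M_il
lorentz_algebra_ok = all(
    np.allclose(commutator(M(i, j), M(k, l)),
                eta[i, k] * M(j, l) - eta[i, l] * M(j, k)
                + eta[j, l] * M(i, k) - eta[j, k] * M(i, l))
    for i, j, k, l in itertools.product(range(n), repeat=4)
)

boost_generator = M(0, 3)   # -1 at the (0, 3) and (3, 0) positions, as for a boost
```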

The Lie group $O(p, n-p)$ includes rotations and Lorentz transformations. Another group with considerable significance in physics is the Poincaré group $P(p, n-p)$, which includes translations$^8$ in space and time as well. An element of $P(p, n-p)$ transforms $x \in \mathbb{R}^n$ to $x' = Ax + u$, where $u$ is a column vector representing the translation part of the group. It is convenient to introduce matrices to represent these group operations. This is possible if we represent an element of $\mathbb{R}^n$ as an $(n+1)$-column whose last element is an insignificant 1. Then, the reader may check that a Poincaré transformation can be written as

\[
\begin{pmatrix} x'\\ 1 \end{pmatrix} = \begin{pmatrix} A & u\\ 0 & 1 \end{pmatrix}\begin{pmatrix} x\\ 1 \end{pmatrix}, \tag{27.38}
\]

$^7$ The elementary Lorentz transformations involving only one space dimension.

$^8$ One can think of the Poincaré group as a subgroup of the group of affine motions in which the matrices belong to $O(p, n-p)$ rather than $GL(n, \mathbb{R})$.




where $A$ is the $n \times n$ matrix of $O(p, n-p)$, and $u$ is an $n$-dimensional column vector.

The Lie algebra of the Poincaré group is obtained by differentiating the $(n+1) \times (n+1)$ matrix of Equation (27.38). The differentiation of the matrix $A$ will give $\mathfrak{o}(p, n-p)$ of Equation (27.37). The translation part will lead to matrices $P_i$ with matrix elements given by

\[
(P_i)^{k}{}_{l} = \delta^{k}_{i}\,\delta^{n+1}_{l} \;\Rightarrow\; (P_i)_{kl} = \eta_{ik}\,\delta^{n+1}_{l}. \tag{27.39}
\]

These matrices satisfy the following Lie multiplication rules:

\[
[P_i, P_j] = 0, \qquad [M_{ij}, P_k] = \eta_{ik}P_j - \eta_{jk}P_i .
\]

It then follows that the full Poincaré algebra $\mathfrak{p}(p, n-p)$ is described by the following Lie brackets:

\[
\begin{aligned}
[M_{ij}, M_{kl}] &= \eta_{ik}M_{jl} - \eta_{il}M_{jk} + \eta_{jl}M_{ik} - \eta_{jk}M_{il},\\
[M_{ij}, P_k] &= \eta_{ik}P_j - \eta_{jk}P_i,\\
[P_i, P_j] &= 0.
\end{aligned} \tag{27.40}
\]
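The brackets involving the translation generators can be checked in the $(n+1) \times (n+1)$ matrix realization. In this sketch (for the signature of $O(3, 1)$, with zero-based indices), $M_{ij}$ is embedded in the upper-left block using the mixed-index form assumed earlier, and $P_i$ has a single 1 in row $i$ of the last column; these conventions and names are assumptions of the sketch.

```python
import itertools
import numpy as np

eta = np.diag([-1.0, -1.0, -1.0, 1.0])
n = eta.shape[0]

def M(i, j):
    """Generator of o(p, n-p), embedded in the upper-left n x n block."""
    mat = np.zeros((n + 1, n + 1))
    for l in range(n):
        mat[j, l] += eta[i, l]
        mat[i, l] -= eta[j, l]
    return mat

def P(i):
    """Translation generator: derivative of the matrix in (27.38) with respect to u_i."""
    mat = np.zeros((n + 1, n + 1))
    mat[i, n] = 1.0
    return mat

def commutator(a, b):
    return a @ b - b @ a

translations_commute = all(
    np.allclose(commutator(P(i), P(j)), 0.0)
    for i, j in itertools.product(range(n), repeat=2)
)

mixed_brackets_ok = all(
    np.allclose(commutator(M(i, j), P(k)), eta[i, k] * P(j) - eta[j, k] * P(i))
    for i, j, k in itertools.product(range(n), repeat=3)
)
```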

27.2.2 Operations on Lie Algebras 

27.2.11. Definition. Let $\mathfrak{v}$ be a Lie algebra. A linear operator $D : \mathfrak{v} \to \mathfrak{v}$ satisfying

\[
D[X, Y] = [DX, Y] + [X, DY]
\]

is called a derivation of $\mathfrak{v}$.


derivation algebra of 
a Lie algebra 


adjoint algebra of a 
Lie algebra 


Illustration of 
homomorphism of 
su(2) and its adjoint 
using Pauli spin 
matrices 


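Although the product of two derivations is not a derivation, their commutator is. Therefore, the set of derivations of a Lie algebra $\mathfrak{v}$ themselves form a Lie algebra $\mathfrak{D}_{\mathfrak{v}}$ under commutation, which is called the derivation algebra.

Recall that the infinitesimal generators of the adjoint action of a Lie group on its algebra were given by $\mathrm{ad}_\xi$ [Equation (27.14)]. We can apply this to a general Lie algebra $\mathfrak{v}$ by fixing a vector $X \in \mathfrak{v}$ and defining the map $\mathrm{ad}_X : \mathfrak{v} \to \mathfrak{v}$ given by $\mathrm{ad}_X(Y) = [X, Y]$. The reader may verify that $\mathrm{ad}_X$ is a derivation of $\mathfrak{v}$ and that $\mathrm{ad}_{[X,Y]} = [\mathrm{ad}_X, \mathrm{ad}_Y]$. Therefore, the set $\mathrm{ad}_{\mathfrak{v}} \equiv \{\mathrm{ad}_X \mid X \in \mathfrak{v}\}$ is a Lie algebra, a subalgebra of the derivation algebra $\mathfrak{D}_{\mathfrak{v}}$ of $\mathfrak{v}$, and is called the adjoint algebra of $\mathfrak{v}$. There is a natural homomorphism $\psi : \mathfrak{v} \to \mathrm{ad}_{\mathfrak{v}}$ given by $\psi(X) = \mathrm{ad}_X$ whose kernel is the center of $\mathfrak{v}$. Furthermore, $\mathrm{ad}_{\mathfrak{v}}$ is an ideal of $\mathfrak{D}_{\mathfrak{v}}$.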
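The homomorphism property $\mathrm{ad}_{[X,Y]} = [\mathrm{ad}_X, \mathrm{ad}_Y]$ can be illustrated by computing matrices of $\mathrm{ad}_X$ relative to a basis. The sketch below uses the Pauli spin matrices as an assumed example basis and expands commutators by least squares; the function `ad_matrix` is our own helper.

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def ad_matrix(X, basis):
    """Matrix of ad_X in the given basis: column k holds the components of [X, basis[k]]."""
    B = np.stack([b.ravel() for b in basis], axis=1)
    columns = []
    for Y in basis:
        coeffs, *_ = np.linalg.lstsq(B, (X @ Y - Y @ X).ravel(), rcond=None)
        columns.append(coeffs)
    return np.stack(columns, axis=1)

ad1 = ad_matrix(sigma[0], sigma)
ad2 = ad_matrix(sigma[1], sigma)

# ad of the bracket [sigma_1, sigma_2] versus the bracket of the ad matrices
ad_of_bracket = ad_matrix(sigma[0] @ sigma[1] - sigma[1] @ sigma[0], sigma)
bracket_of_ads = ad1 @ ad2 - ad2 @ ad1
```

The two arrays `ad_of_bracket` and `bracket_of_ads` agree, as the homomorphism property requires.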

27.2.12. Example. We construct the matrix representation of the operators in the adjoint algebra of $\mathfrak{su}(2)$ with Pauli spin matrices as a basis. From

\[
\mathrm{ad}_{\sigma_1}(\sigma_1) = [\sigma_1, \sigma_1] = 0
\]

we conclude that the first column of the matrix of $\mathrm{ad}_{\sigma_1}$ is zero. From

\[
\mathrm{ad}_{\sigma_1}(\sigma_2) = [\sigma_1, \sigma_2] = 2i\sigma_3
\]

we conclude that the second column of the matrix of $\mathrm{ad}_{\sigma_1}$ has zeros for the first two elements and $2i$ for the last. Similarly, from

\[
\mathrm{ad}_{\sigma_1}(\sigma_3) = [\sigma_1, \sigma_3] = -2i\sigma_2
\]

we conclude that the third column of the matrix of $\mathrm{ad}_{\sigma_1}$ has zeros for the first and third elements and $-2i$ for the second. Thus, the matrix representation of $\mathrm{ad}_{\sigma_1}$ is

\[
\mathrm{ad}_{\sigma_1} = \begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & -2i\\ 0 & 2i & 0 \end{pmatrix}.
\]

Likewise, we can obtain the other two matrix representations; they are

\[
\mathrm{ad}_{\sigma_2} = \begin{pmatrix} 0 & 0 & 2i\\ 0 & 0 & 0\\ -2i & 0 & 0 \end{pmatrix}, \qquad
\mathrm{ad}_{\sigma_3} = \begin{pmatrix} 0 & -2i & 0\\ 2i & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}.
\]

The reader may readily verify that $[\mathrm{ad}_{\sigma_j}, \mathrm{ad}_{\sigma_k}] = 2i\epsilon_{jkl}\,\mathrm{ad}_{\sigma_l}$. ■

If $\psi$ is an automorphism of $\mathfrak{v}$, i.e., an isomorphism of $\mathfrak{v}$ onto itself, then

\[
\mathrm{ad}_{\psi(X)} = \psi \circ \mathrm{ad}_X \circ \psi^{-1} \qquad \forall\, X \in \mathfrak{v}. \tag{27.41}
\]

Since $\mathrm{ad}_X$ is an operator on the vector space $\mathfrak{v}$, one can define the trace of $\mathrm{ad}_X$. However, the notion of trace attains a far greater significance when it is combined with the notion of composition of operators. For $X, Y \in \mathfrak{v}$, define

\[
(X|Y) \equiv \mathrm{tr}(\mathrm{ad}_X \circ \mathrm{ad}_Y). \tag{27.42}
\]

Then one can show that $(\cdot|\cdot)$ is bilinear and symmetric. It becomes an inner product if the "vectors" of the Lie algebra are hermitian operators on some vector space $V$, or if the underlying vector space is over $\mathbb{R}$ (see Proposition 3.6.6 in Chapter 3). Furthermore, $(\cdot|\cdot)$ satisfies

\[
([X, Y]\,|\,Z) + ([X, Z]\,|\,Y) = 0. \tag{27.43}
\]

27.2.13. Definition. The symmetric bilinear form $(\cdot|\cdot) : \mathfrak{v} \times \mathfrak{v} \to \mathbb{R}$ (or $\mathbb{C}$) defined by (27.42) is called the Killing form of $\mathfrak{v}$.

It is an immediate consequence of this definition and Equation (27.41) that the Killing form of $\mathfrak{v}$ is invariant under all automorphisms of $\mathfrak{v}$.

Most mathematicians seem to have little or no interest in history, so that often the name attached to a key result is that of the follow-up person who exploited an idea or theorem rather than its originator (the Jordan form is due to Weierstrass, Wedderburn theory to Cartan and Molien). No one has suffered from this ahistoricism more than Killing. For example, the so-called "Cartan sub-algebra" and "Cartan matrix" were defined and exploited by Killing.



He exhibited the characteristic equation of an arbitrary element of the Weyl group when 
Weyl was 3 years old and listed the orders of the Coxeter transformation 19 years before 
Coxeter was born!

Wilhelm Karl Joseph Killing (1847-1923) began university study in Münster in 1865
but quickly moved to Berlin and came under the influence of Kummer and Weierstrass.
From 1868 to 1882 much of Killing’s energy was devoted to teaching at the gymnasium
level in Berlin and Brilon (south of Münster). At one stage, when Weierstrass was urging
him to write up his research on space structures, he was spending as much as 36 hours per 
week in the classroom or tutoring. (Now many mathematicians consider 6 hours a week an 
intolerable burden!) On the recommendation of Weierstrass, Killing was appointed professor 
of mathematics at the Lyzeum Hosianum in Braunsberg, in East Prussia (now Braniewo in 
the region of Olsztyn, in Poland). This was a college founded in 1565 by Bishop Stanislaus 
Hosius, whose treatise on the Christian faith ran to 39 editions! The main object of the college 
was the training of Roman Catholic clergy, so Killing had to teach a wide range of topics, 
including the reconciliation of faith and science. Although he was isolated mathematically 
during his ten years in Braunsberg, this was the most creative period in his mathematical
life. Killing produced his brilliant work despite worries about the health of his wife and 
seven children, demanding administrative duties as rector of the college and as a member 
and chairman of the City Council, and his active role in the Church of St. Catherine. 


What we now call Lie algebras were invented by the Norwegian mathematician Sophus Lie
about 1870 and independently by Killing about 1880. Lie was seeking to develop an approach
to the solution of differential equations analogous to the Galois theory of algebraic
equations. Killing’s consuming passion was non-Euclidean geometries and their
generalizations, so he was led to the problem of classifying infinitesimal motions of a rigid
body in any type of space (or Raumformen, as he called them).

In 1892 he was called back to his native Westphalia as professor of mathematics at the
University of Münster, where he was quickly submerged in teaching, administration, and
charitable activities. He was Rector Magnificus for some period and president of the
St. Vincent de Paul charitable society for ten years. Killing’s work was neglected partly
because he was a modest man with high standards; he vastly underrated his own achievement.
His interest was geometry, and for this he needed all real Lie algebras. To obtain merely the
simple Lie algebras over the complex numbers did not appear to him to be very significant.
Another reason was due to Lie, who was quite negative about Killing’s work. At the top of
page 770 of a three-volume joint work of Lie and Engel we find the following less than
generous comment about Killing: “With the exception of the preceding unproved theorem
... all the theorems that are correct are due to Lie and all the false ones are due to Killing!”

Killing was conservative in his political views and vigorously opposed the attempt to
reform the examination requirements for graduate students at the University of Münster by
deleting the compulsory study of philosophy. Engel comments “Killing could not see that
for most candidates the test in philosophy was completely worthless.” He had a profound
patriotic love of his country, so that his last years (1918-1923) were deeply pained by the
collapse of social cohesion in Germany after the War of 1914-18.





(Taken from A. J. Coleman, “The Greatest Mathematical Paper of All Times,” Mathematical
Intelligencer 11 (3) (1989) 29-38.) 


As noted above, the Killing form is an inner product if the Lie algebra consists 
of hermitian operators. This will certainly happen if the Lie algebra is that of a 
group whose elements are unitary operators on some vector space V. We shall see 
shortly that such unitary operators are not only possible, but have extremely useful 
properties in the representation of compact Lie groups. A unitary representation 
of a Lie group induces a representation of its Lie algebra whose “vectors” are 
hermitian operators. Then the Killing form becomes an inner product. The natural 
existence of such Killing forms for the representation of a compact Lie group 
motivates the following: 

27.2.14. Definition. A Lie algebra $\mathfrak{v}$ is compact if it has an inner product
$(\cdot|\cdot)$ satisfying

\[ ([X, Y]\,|\,Z) + ([X, Z]\,|\,Y) = 0. \]

Choose a basis $\{X_i\}$ for the Lie algebra $\mathfrak{v}$ and note that
$(\mathrm{ad}_{X_i})^k_{\ j} = c^k_{ij}$. Therefore,

\[ (X_i|X_j) = \mathrm{tr}(\mathrm{ad}_{X_i} \circ \mathrm{ad}_{X_j})
   = (\mathrm{ad}_{X_i})^k_{\ l}\,(\mathrm{ad}_{X_j})^l_{\ k}
   = c^k_{il}\, c^l_{jk} \equiv g_{ij}, \tag{27.44} \]

where $g_{ij}$ are components of the so-called Cartan metric tensor in the basis $\{X_i\}$.
If $A, B \in \mathfrak{v}$ have components $\{a^i\}$ and $\{b^i\}$ in the basis $\{X_i\}$, then it
follows from Equation (27.44) that

\[ (A|B) = a^i b^j g_{ij}, \tag{27.45} \]

as expected of a symmetric bilinear form. We can use the Cartan metric to lower
the upper index of the structure constants: $c_{ijk} \equiv c^l_{ij}\, g_{lk}$. By virtue of Equation
(27.44), the new constants may be written in the form

\[ c_{ijk} = c^l_{ij}\, c^r_{ls}\, c^s_{kr}
   = \left(-c^l_{js}\, c^r_{li} - c^l_{si}\, c^r_{lj}\right) c^s_{kr}
   \qquad \text{[by the Jacobi identity (27.13)]} \]
\[ \phantom{c_{ijk}} = c^l_{js}\, c^r_{il}\, c^s_{kr} + c^l_{si}\, c^r_{lj}\, c^s_{rk}. \]

The reader may now verify that the RHS is completely antisymmetric in $i$, $j$, and
$k$. If the Lie algebra is compact, then one can choose an orthonormal basis in which
$g_{lk} = \delta_{lk}$ (because the inner product is, by definition, positive definite) and obtain
$c^k_{ij} = c_{ijk}$. We therefore have the following result.

27.2.15. Proposition. Let $\mathfrak{v}$ be a compact Lie algebra. Then there exists a basis
of $\mathfrak{v}$ in which the structure constants are represented by a third-order completely
antisymmetric covariant tensor.
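
The chain from structure constants to Cartan metric to lowered constants can be checked numerically for the Pauli basis of Example 27.2.12. The following is only a sketch in plain Python; the helper names (`matmul`, `pauli_components`, `c_low`, and so on) are illustrative choices and not notation from the text. It builds $c^k_{ij}$ from the commutators, forms the matrices of $\mathrm{ad}_{\sigma_i}$, evaluates $g_{ij}$ as in Equation (27.44), and lowers the index to get $c_{ijk}$.

```python
# Pauli matrices sigma_1, sigma_2, sigma_3 as nested lists.
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(len(a))] for i in range(len(a))]


def pauli_components(m):
    """Components of a traceless 2x2 matrix in the Pauli basis,
    using tr(sigma_k sigma_l) = 2 delta_kl."""
    return [trace(matmul(s, m)) / 2 for s in SIGMA]


R = range(3)
# c[i][j][k] holds c^k_{ij}, defined by [sigma_i, sigma_j] = c^k_{ij} sigma_k.
c = [[pauli_components(commutator(SIGMA[i], SIGMA[j])) for j in R] for i in R]
# Matrix of ad_{sigma_i}: the entry in row k, column j is c^k_{ij}.
ad = [[[c[i][j][k] for j in R] for k in R] for i in R]
# Cartan metric g_ij = tr(ad_i ad_j) = c^k_{il} c^l_{jk}, Equation (27.44).
g = [[trace(matmul(ad[i], ad[j])) for j in R] for i in R]
# Lowered structure constants c_ijk = c^l_{ij} g_{lk}.
c_low = [[[sum(c[i][j][l] * g[l][k] for l in R) for k in R] for j in R] for i in R]
```

In this basis the metric comes out proportional to the identity, so lowering the index only rescales the structure constants, and the antisymmetry of `c_low` in all three indices can be inspected entry by entry.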





27.2.16. Example. We can calculate explicitly the Killing form of the Lie algebras
$\mathfrak{gl}(n, \mathbb{R})$ and $\mathfrak{sl}(n, \mathbb{R})$. Choose the Weyl basis introduced in
Example 27.2.7 and expand $A, B \in \mathfrak{gl}(n, \mathbb{R})$ in terms of the Weyl basis vectors:
$A = a^{ij} e_{ij}$, $B = b^{kl} e_{kl}$. The Cartan metric tensor becomes

\[ g_{ij,kl} = c^{rs}_{ij,mn}\, c^{mn}_{kl,rs} = 2n\,\delta_{il}\delta_{jk} - 2\,\delta_{ij}\delta_{kl}, \]

where we have used Equation (27.29). It follows from these relations, Equation (27.45),
and a simple index manipulation that

\[ (A|B) = a^{ij} b^{kl} g_{ij,kl} = 2n\,\mathrm{tr}(AB) - 2\,\mathrm{tr}A\,\mathrm{tr}B \tag{27.46} \]

for $A, B \in \mathfrak{gl}(n, \mathbb{R})$, and

\[ (A|B) = 2n\,\mathrm{tr}(AB) \tag{27.47} \]

for $A, B \in \mathfrak{sl}(n, \mathbb{R})$, because all matrices in $\mathfrak{sl}(n, \mathbb{R})$ are traceless. ■
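
Equation (27.46) lends itself to a direct numerical check. The sketch below (plain Python; the function names `ad_matrix`, `killing_form`, and `trace_formula` are hypothetical, and the Weyl basis $e_{kl}$ is assumed to be the matrix with a single 1 in row $k$, column $l$) builds the matrix of $\mathrm{ad}_A$ on $\mathfrak{gl}(n, \mathbb{R})$ and compares $\mathrm{tr}(\mathrm{ad}_A \circ \mathrm{ad}_B)$ with the trace formula.

```python
from itertools import product


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def ad_matrix(a):
    """Matrix of ad_a = [a, .] on gl(n) in the Weyl basis e_kl."""
    n = len(a)
    labels = list(product(range(n), repeat=2))  # ordering of the basis e_kl
    columns = []
    for k, l in labels:
        e = [[1 if (r, s) == (k, l) else 0 for s in range(n)] for r in range(n)]
        ae, ea = matmul(a, e), matmul(e, a)
        # The Weyl-basis components of [a, e_kl] are simply its matrix entries.
        columns.append([ae[m][p] - ea[m][p] for m, p in labels])
    size = len(labels)
    return [[columns[col][row] for col in range(size)] for row in range(size)]


def killing_form(a, b):
    """(A|B) = tr(ad_A ad_B), Equation (27.42)."""
    return trace(matmul(ad_matrix(a), ad_matrix(b)))


def trace_formula(a, b):
    """Right-hand side of Equation (27.46)."""
    n = len(a)
    return 2 * n * trace(matmul(a, b)) - 2 * trace(a) * trace(b)
```

With integer matrices the comparison `killing_form(A, B) == trace_formula(A, B)` is exact; for traceless inputs the second term of `trace_formula` vanishes, which is the content of Equation (27.47).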

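A Lie algebra $\mathfrak{v}$, as a vector space, may be written as a direct sum of its
subspaces. We express this as

\[ \mathfrak{v} = \mathfrak{u}_1 + \mathfrak{u}_2 + \cdots + \mathfrak{u}_r \equiv \sum_{k=1}^{r} \mathfrak{u}_k. \]

If in addition $\{\mathfrak{u}_k\}$ are Lie subalgebras every one of which commutes with the rest,
we write

\[ \mathfrak{v} = \mathfrak{u}_1 \oplus \mathfrak{u}_2 \oplus \cdots \oplus \mathfrak{u}_r
   \equiv \bigoplus_{k=1}^{r} \mathfrak{u}_k \tag{27.48} \]

and say that $\mathfrak{v}$ has been decomposed into a direct sum of Lie algebras. In this case,
each $\mathfrak{u}_k$ is not only a subalgebra, but also an ideal of $\mathfrak{v}$, as the reader may verify.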

The study of the structure of Lie algebras boils down to the study of the “simplest”
kind of Lie algebras in terms of which other Lie algebras can be decomposed.
Intuitively, one would want to call a Lie algebra “simple” if it has no proper
subalgebras. However, in terms of decomposition, such subalgebras are required to be
ideals. So the natural definition of a simple Lie algebra would be the following:

27.2.17. Definition. A Lie algebra that has no proper ideal is called a simple
Lie algebra. A Lie algebra is semisimple if it has no (nonzero) commutative ideal.

For example, the pseudo-orthogonal algebra $\mathfrak{o}(p, n-p)$ is semisimple, but
the Poincaré algebra $\mathfrak{p}(p, n-p)$ is not, because the translation generators $P_j$ form
a commutative ideal.

A useful criterion for semisimplicity is given by the following theorem due to
Cartan, which we state without proof (for a proof, see [Baru 86, pp. 15-16]):

27.2.18. Theorem. (Cartan) A Lie algebra $\mathfrak{v}$ is semisimple if and only if
$\det(g_{ij}) \neq 0$.




The importance of semisimple Lie algebras is embodied in the following theorem
[Baru 86, pp. 19-20].

27.2.19. Theorem. (Cartan) A semisimple complex or real Lie algebra can be 
decomposed into a direct sum of pairwise orthogonal simple subalgebras. This 
decomposition is unique up to ordering. 

The orthogonality is with respect to the Killing form. Theorem 27.2.19 reduces
the study of semisimple Lie algebras to that of simple Lie algebras. What about
a general Lie algebra $\mathfrak{v}$? If $\mathfrak{v}$ is compact, then it turns out that it can be written
as $\mathfrak{v} = \mathfrak{z} \oplus \mathfrak{s}$, where $\mathfrak{z}$ is the center of $\mathfrak{v}$ and $\mathfrak{s}$ is semisimple. If $\mathfrak{v}$ is not compact,
then the decomposition will not be in terms of a direct sum, but in terms of what is
called a semidirect sum, one of whose factors is semisimple. For details, the reader
is referred to the fairly accessible treatment of Barut and Raczka, Chapter 1. From
now on we shall restrict our discussion to semisimple Lie algebras. These algebras
are completely known, because simple algebras have been completely classified.
We shall not pursue the classification of Lie algebras. However, we simply state a
definition that is used in such a classification, because we shall have an occasion
to use it in the representation theory of Lie algebras.

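27.2.20. Definition. Let $\mathfrak{v}$ be a Lie algebra. A subalgebra $\mathfrak{h}$ of $\mathfrak{v}$ is called a Cartan
subalgebra if $\mathfrak{h}$ is the largest commutative subalgebra of $\mathfrak{v}$, and for all $X \in \mathfrak{h}$, if
$\mathrm{ad}_X$ leaves a subspace of $\mathfrak{v}$ invariant, then it leaves the complement of that subspace
invariant as well. The dimension of $\mathfrak{h}$ is called the rank of $\mathfrak{v}$.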


27.3 Representation of Compact Lie Groups 

Representation of general Lie groups is closely related to representation of their
Lie algebras, and we shall discuss them in the next two sections. In this section,
however, we shall consider the representation of compact Lie groups, because for
such groups, many of the ideas developed for finite groups hold. Before discussing
compact groups, let us state a definition and a proposition that hold for all Lie
groups.

27.3.1. Definition. A representation of a Lie group $G$ on a Hilbert space $\mathcal{H}$ is
a Lie group homomorphism $T : G \to GL(\mathcal{H})$. Similarly, a representation of the
Lie algebra $\mathfrak{g}$ is a Lie algebra homomorphism from $\mathfrak{g}$ into $\mathfrak{gl}(\mathcal{H})$.

The proposition we have in mind is the important Schur’s lemma, which we
state without proof (for a proof see [Baru 86, pp. 143-144]).

27.3.2. Proposition. (Schur’s lemma) A unitary representation $T : G \to GL(\mathcal{H})$
of a Lie group $G$ is irreducible if and only if the only operators commuting with
all the $T_g$ are scalar multiples of the unit operator.






27.3.3. Example. Compactness of $U(n)$, $O(n)$, $SU(n)$, and $SO(n)$

Identify $GL(n, \mathbb{C})$ with a subset of $\mathbb{R}^{2n^2}$ via components. The map

\[ f : GL(n, \mathbb{C}) \to GL(n, \mathbb{C}) \quad \text{given by} \quad f(A) = AA^\dagger \]

is continuous because it is simply the products of elements of matrices. It follows that
$f^{-1}(\mathbf{1})$ is closed, because the matrix $\mathbf{1}$ is a single point in $\mathbb{R}^{2n^2}$, which is therefore
closed. $f^{-1}(\mathbf{1})$ is also bounded, because

\[ AA^\dagger = \mathbf{1} \;\Rightarrow\; \sum_{j=1}^{n} a_{ij} a^*_{kj} = \delta_{ik}
   \;\Rightarrow\; \sum_{i,j=1}^{n} |a_{ij}|^2 = n. \]

Thus, $f^{-1}(\mathbf{1})$ lies on the $(2n^2 - 1)$-dimensional sphere of radius $\sqrt{n}$ in $\mathbb{R}^{2n^2}$, and is
therefore clearly bounded. The BWHB theorem (of Chapter 16) now implies that $f^{-1}(\mathbf{1})$ is
compact. Now note that $f^{-1}(\mathbf{1})$ consists of all matrices that have their hermitian adjoints
for an inverse; but these are precisely the set $U(n)$ of unitary matrices.

Now consider the map $\det : U(n) \to \mathbb{C}$. This map is also continuous, implying that
$\det^{-1}(1)$ is a closed subset of $U(n)$. The boundedness of $U(n)$ implies that $\det^{-1}(1)$ is also
bounded. Invoking the BWHB theorem again, we conclude that $\det^{-1}(1) = SU(n)$, being
closed and bounded, is compact.

If instead of complex numbers, we restrict ourselves to the reals, $O(n)$ and $SO(n)$ will
replace $U(n)$ and $SU(n)$, respectively. ■


The result of the example above can be summarized: 


27.3.4. Box. The unitary $U(n)$, orthogonal $O(n)$, special unitary $SU(n)$,
and special orthogonal $SO(n)$ groups are all compact.
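
The boundedness step of Example 27.3.3, namely that $AA^\dagger = \mathbf{1}$ forces $\sum_{i,j} |a_{ij}|^2 = n$, can be illustrated on a sample matrix. This is a minimal sketch in plain Python; `sample_unitary` is a hypothetical helper that uses one convenient parametrization of a $2 \times 2$ unitary matrix, and the tolerance is an arbitrary choice for floating-point comparison.

```python
import cmath
import math


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dagger(a):
    """Hermitian adjoint (conjugate transpose) of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def is_unitary(a, tol=1e-12):
    """True if a times its hermitian adjoint equals the identity within tol."""
    n = len(a)
    prod = matmul(a, dagger(a))
    return all(abs(prod[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))


def frobenius_norm_squared(a):
    """Sum of |a_ij|^2, the squared distance of the matrix from the origin."""
    return sum(abs(x) ** 2 for row in a for x in row)


def sample_unitary(theta, alpha, beta):
    """A 2x2 unitary matrix [[a, b], [-conj(b), conj(a)]] with |a|^2 + |b|^2 = 1."""
    a = cmath.exp(1j * alpha) * math.cos(theta)
    b = cmath.exp(1j * beta) * math.sin(theta)
    return [[a, b], [-b.conjugate(), a.conjugate()]]
```

For any parameters, `frobenius_norm_squared(sample_unitary(...))` should agree with $n = 2$ up to rounding; a real permutation matrix gives the analogous statement for $O(n)$.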


We now start our study of the representations of compact Lie groups. We first 
show that we can always assume that the representation is unitary. 

27.3.5. Theorem. Let $T : G \to GL(\mathcal{H})$ be any representation of the compact
group $G$. There exists a new inner product in $\mathcal{H}$ relative to which $T$ is unitary.

Proof. Let $\langle\,\cdot\,|\,\cdot\,\rangle$ be the initial inner product. Define a new inner product
$(\,\cdot\,|\,\cdot\,)$ by

\[ (u|v) \equiv \int_G \langle T_g u | T_g v \rangle \, d\mu_g, \]

where $d\mu_g$ is the Haar measure, which is both left- and right-invariant. The reader
may check that this is indeed an inner product. For every $h \in G$, we have

\[ (T_h u | T_h v) = \int_G \langle T_g T_h u | T_g T_h v \rangle \, d\mu_g
   = \int_G \langle T_{gh} u | T_{gh} v \rangle \, d\mu_g
   \qquad \text{(because $T$ is a representation)} \]
\[ \phantom{(T_h u | T_h v)} = \int_G \langle T_{g'} u | T_{g'} v \rangle \, d\mu_{g'}
   \qquad \text{(with $g' \equiv gh$, because $\mu_g$ is right invariant)} \]
\[ \phantom{(T_h u | T_h v)} = (u|v). \]

This shows that $T_h$ is unitary for all $h \in G$. □


From now on, we shall restrict our discussion to unitary representations of compact 
groups. 

The study of representations of compact groups is facilitated by the following 
construction: 

27.3.6. Definition. Let $T : G \to GL(\mathcal{H})$ be a unitary representation of the
compact group $G$ and $|u\rangle \in \mathcal{H}$ a fixed vector. The Weyl operator $K_u$ associated
with $|u\rangle$ is defined as

\[ K_u = \int_G |T_g u\rangle \langle T_g u| \, d\mu_g. \tag{27.49} \]

The essential properties of the Weyl operator are summarized in the following: 

27.3.7. Proposition. Let $T : G \to GL(\mathcal{H})$ be a unitary representation of the
compact group $G$. Then the Weyl operator has the following properties:

1. $K_u$ is hermitian.

2. $K_u T_g = T_g K_u$ for all $g \in G$. Therefore, any eigenspace of $K_u$ is an invariant
subspace of all $T_g$’s.

3. $K_u$ is a Hilbert-Schmidt operator.

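Proof. Statement (1), in the form $\langle w| K_u |v\rangle^* = \langle v| K_u |w\rangle$, follows directly from
the definition.

(2) From $T_g \int_G |T_x u\rangle \langle T_x u| \, d\mu_x = \int_G |T_g T_x u\rangle \langle T_x u| \, d\mu_x$, the fact that $T$ is a
representation (therefore, $T_g T_x = T_{gx}$), and redefining the integration variable to
$y = gx$, we get

\[ T_g K_u = \int_G |T_y u\rangle \langle T_{g^{-1}y} u| \, d\mu_{g^{-1}y}
   = \int_G |T_y u\rangle \langle T_{g^{-1}} T_y u| \, d\mu_y, \]

where we used the left invariance of $\mu$ (so that $d\mu_{g^{-1}y} = d\mu_y$) and the fact that $T$ is a
representation. Unitarity of $T$ now gives

\[ T_g K_u = \int_G |T_y u\rangle \langle T_g^\dagger T_y u| \, d\mu_y
   = \int_G |T_y u\rangle \langle T_y u| \, T_g \, d\mu_y = K_u T_g. \]

(3) Recall that an operator $A \in \mathcal{L}(\mathcal{H})$ is Hilbert-Schmidt if $\sum_{i=1}^{\infty} \| A |e_i\rangle \|^2$ is
finite for any orthonormal basis $\{|e_i\rangle\}$ of $\mathcal{H}$. In the present case, we have

\[ K_u |e_i\rangle = \int_G |T_x u\rangle \langle T_x u | e_i\rangle \, d\mu_x. \]

Therefore,

\[ \sum_{i=1}^{\infty} \| K_u |e_i\rangle \|^2
   = \sum_{i=1}^{\infty} \left( \int_G \langle e_i | T_y u\rangle \langle T_y u| \, d\mu_y \right)
     \left( \int_G |T_x u\rangle \langle T_x u | e_i\rangle \, d\mu_x \right) \]
\[ \phantom{\sum \| K_u |e_i\rangle \|^2}
   = \sum_{i=1}^{\infty} \int_G \int_G \langle e_i | T_y u\rangle \langle T_y u | T_x u\rangle
     \langle T_x u | e_i\rangle \, d\mu_x \, d\mu_y. \]

If we switch the order of summation and integration and use

\[ \sum_{i=1}^{\infty} \langle T_x u | e_i\rangle \langle e_i | T_y u\rangle = \langle T_x u | T_y u\rangle, \]

we obtain

\[ \sum_{i=1}^{\infty} \| K_u |e_i\rangle \|^2
   = \int_G \int_G |\langle T_y u | T_x u\rangle|^2 \, d\mu_x \, d\mu_y, \]

and using the Schwarz inequality in the integral yields

\[ \sum_{i=1}^{\infty} \| K_u |e_i\rangle \|^2
   \le \int_G \int_G \langle T_x u | T_x u\rangle \langle T_y u | T_y u\rangle \, d\mu_x \, d\mu_y \]
\[ \phantom{\sum \| K_u |e_i\rangle \|^2}
   = \int_G \int_G \langle u | u\rangle \langle u | u\rangle \, d\mu_x \, d\mu_y
   \qquad \text{(because the representation is unitary)} \]
\[ \phantom{\sum \| K_u |e_i\rangle \|^2}
   = \|u\|^4 \int_G d\mu_x \int_G d\mu_y = \|u\|^4 V_G^2 < \infty, \]

where $V_G$ is the finite volume of $G$. □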


Hermann Klaus Hugo Weyl (1885-1955) attended the gymnasium at Altona and, on the
recommendation of the headmaster of his gymnasium, who was a cousin of Hilbert, decided
at the age of eighteen to enter the University of Göttingen. Except for one year at Munich he
remained at Göttingen, as a student and later as Privatdozent, until 1913, when he became
professor at the University of Zurich. After Klein’s retirement in 1913, Weyl declined an offer
to be his successor at Göttingen but accepted a second offer in 1930, after Hilbert had retired.
In 1933 he decided he could no longer remain in Nazi Germany and accepted a position
at the Institute for Advanced Study at Princeton, where he worked until his retirement in
1951. In the last years of his life he divided his time between Zurich and Princeton.

Weyl undoubtedly was the most gifted of Hilbert’s students. Hilbert’s thought dominated
the first part of his mathematical career; and although later he sharply diverged from
his master, particularly on questions related to foundations of mathematics, Weyl always
shared his convictions that the value of abstract theories lies in their success in solving
classical problems and that the proper way to approach a question is through a deep
analysis of the concepts it involves rather than by blind computations.

Weyl arrived at Göttingen during the period when Hilbert was creating the spectral theory
of self-adjoint operators, and spectral theory and harmonic analysis were central in Weyl’s
mathematical research throughout his life. Very soon, however, he considerably broadened
the range of his interests, including areas of mathematics into which Hilbert had never
penetrated, such as the theory of Lie groups and the analytic theory of numbers, thereby
becoming one of the most universal mathematicians of his generation. He also had an
important role in the development of mathematical physics, the field to which his most
famous books, Raum, Zeit und Materie (1918), on the theory of relativity, and
Gruppentheorie und Quantenmechanik (1928), are devoted.

Weyl's versatility is illustrated in a particularly striking way by the fact that immediately
after some original advances in number theory (which he obtained in 1914), he spent more
than ten years as a geometer, a geometer in the most modern sense of the word, uniting
in his methods topology, algebra, analysis, and geometry in a display of dazzling virtuosity
and uncommon depth reminiscent of Riemann. Drawn by war mobilization into the German
army, Weyl did not resume his interrupted work when he was allowed to return to civilian life
in 1916. At Zurich he had worked with Einstein for one year, and he became keenly interested
in the general theory of relativity, which had just been published; with his characteristic 
enthusiasm he devoted most of the next five years to exploring the mathematical framework 
of the theory. In these investigations Weyl introduced the concept of what is now called a 
linear connection, linked not to the Lorentz group of orthogonal transformations, but to the 
enlarged group of conformal transformations; he even thought for a time that this would 
give him a unified theory of gravitation and electromagnetism, the forerunner of what is 
now called gauge theories. 

Weyl's use of tensor calculus in his work on relativity led him to reexamine the basic
methods of that calculus and, more generally, of classical invariant theory that had been its
forerunner but had fallen into near oblivion after Hilbert’s work of 1890. On the other hand,
his semiphilosophical, semimathematical ideas on the general concept of “space” in
connection with Einstein’s theory had directed his investigations to generalizations of
Helmholtz’s problem of characterizing Euclidean geometry by properties of “free mobility.”
From these two directions Weyl was brought into contact with the theory of linear
representations of Lie groups; his papers on the subject (1925-1927) certainly represent his
masterpiece and must be counted among the most influential works in twentieth-century
mathematics.

Based on the early 1900s works of Frobenius, I. Schur, and A. Young, Weyl inaugurated 
a new approach for the representation of continuous groups by focusing his attention on 
Lie groups, rather than Lie algebras. 

Very few of Weyl’s 150 published books and papers, even those chiefly of an expository
character, lack an original idea or a fresh viewpoint. The influence of his works and of
his teaching was considerable: He proved by his example that an “abstract” approach to 
mathematics is perfectly compatible with “hard” analysis and, in fact, can be one of the 
most powerful tools when properly applied. 

Weyl was one of that rare breed of modern mathematician whose contribution to physics
was also substantial. In an interview with a reporter in 1929, Dirac is asked the following
question: “... I want to ask you something more: They tell me that you and Einstein are
the only two real sure-enough high-brows and the only ones who can really understand each
other. I won’t ask you if this is straight stuff, for I know you are too modest to admit it. But
I want to know this: Do you ever run across a fellow that even you can’t understand?” To
this Dirac replies one word: “Weyl.”

Weyl had a lifelong interest in philosophy and metaphysics, and his mathematical 
activity was seldom free from philosophical undertones or afterthoughts. At the height 
of the controversy over the foundations of mathematics, between the formalist school of 
Hilbert and the intuitionist school of Brouwer, he actively fought on Brouwer’s side. His 
own comment, stated somewhat jokingly, sums up his personality: “My work always tried 
to unite the truth with the beautiful, but when I had to choose one or the other, I usually 
chose the beautiful.”


We now come to the most fundamental theorem of representation theory of
compact Lie groups. Before stating and proving this theorem, we need the following
lemma:

27.3.8. Lemma. Let $T : G \to GL(\mathcal{H})$ be an irreducible unitary representation
of a compact Lie group $G$. For any nonzero $|u\rangle, |v\rangle \in \mathcal{H}$, we have

\[ \frac{1}{\|u\|^2 \|v\|^2} \int_G |\langle v| T_x |u\rangle|^2 \, d\mu_x = c, \tag{27.50} \]

where $c > 0$ is a constant independent of $|u\rangle$ and $|v\rangle$.

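Proof. By Schur’s lemma and (2) of Proposition 27.3.7, $K_u = \lambda(u)\mathbf{1}$. Therefore,
on the one hand,

\[ \langle v| K_u |v\rangle = \lambda(u) \|v\|^2. \tag{27.51} \]

On the other hand,

\[ \langle v| K_u |v\rangle = \int_G \langle v | T_x u\rangle \langle T_x u | v\rangle \, d\mu_x
   = \int_G |\langle v| T_x |u\rangle|^2 \, d\mu_x. \tag{27.52} \]

Moreover, if we use $d\mu_g = d\mu_{g^{-1}}$ (see Problem 27.11), then

\[ \langle v| K_u |v\rangle = \int_G \langle v| T_x |u\rangle \langle u| T_x^\dagger |v\rangle \, d\mu_x
   = \int_G \langle v| T_x |u\rangle \langle u| T_x^{-1} |v\rangle \, d\mu_x \]
\[ \phantom{\langle v| K_u |v\rangle} = \int_G \langle u| T_{x^{-1}} |v\rangle \langle v| T_x |u\rangle \, d\mu_x
   = \int_G \langle u| T_y |v\rangle \langle v| T_y^{-1} |u\rangle \, d\mu_y \qquad (y \equiv x^{-1}) \]
\[ \phantom{\langle v| K_u |v\rangle} = \int_G \langle u| T_y |v\rangle \langle v| T_y^\dagger |u\rangle \, d\mu_y
   = \langle u| K_v |u\rangle. \]

This equality plus Equation (27.51) gives

\[ \lambda(u) \|v\|^2 = \lambda(v) \|u\|^2 \;\Rightarrow\;
   \frac{\lambda(u)}{\|u\|^2} = \frac{\lambda(v)}{\|v\|^2}. \]

Since $|u\rangle$ and $|v\rangle$ are arbitrary, we conclude that $\lambda(u) = c\|u\|^2$ for all $|u\rangle \in \mathcal{H}$,
where $c$ is a constant. Equations (27.51) and (27.52) now yield Equation (27.50).
If we let $|u\rangle = |v\rangle$ in Equation (27.52) and use (27.51), we obtain

\[ \int_G |\langle u| T_x |u\rangle|^2 \, d\mu_x = \lambda(u) \|u\|^2 = c \|u\|^4. \]

That $c > 0$ follows from the fact that the integrand on the LHS is a nonnegative continuous
function that has at least one strictly positive value in its integration range, namely at
$x = e$, the identity. □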

27.3.9. Theorem. Every irreducible unitary representation of a compact Lie
group is finite-dimensional.

Proof. Let $\{|e_i\rangle\}_{i=1}^{n}$ be any set of orthonormal vectors in $\mathcal{H}$. Then, unitarity of
$T_g$ implies that $\{T_g |e_i\rangle\}_{i=1}^{n}$ is also an orthonormal set. Applying Lemma 27.3.8 to
$|e_j\rangle$ and $|e_1\rangle$, we obtain

\[ \int_G |\langle e_1| T_x |e_j\rangle|^2 \, d\mu_x = c. \]

Now sum over $j$ to get

\[ nc = \sum_{j=1}^{n} \int_G |\langle e_1| T_x |e_j\rangle|^2 \, d\mu_x
   = \int_G \sum_{j=1}^{n} |\langle e_1| T_x |e_j\rangle|^2 \, d\mu_x
   \le \int_G \langle e_1 | e_1\rangle \, d\mu_x = V_G, \]

where we used the Parseval inequality [Equation (5.3)] as applied to the vector
$|e_1\rangle$ and the orthonormal set $\{T_g |e_j\rangle\}_{j=1}^{n}$. Since both $V_G$ and $c$ are finite, $n$ must
be finite as well. Thus, $\mathcal{H}$ cannot have an infinite set of orthonormal vectors. □





So far, we have discussed irreducible representations. What can we say about
arbitrary representations? We recall that in the case of finite groups, every
representation can be written as a direct sum of irreducible representations. Is this also
true for compact Lie groups?

Firstly, we note that the Weyl operator, being Hilbert-Schmidt, is necessarily
compact. It is also hermitian. Therefore, by the spectral theorem, its eigenspaces
span the carrier space $\mathcal{H}$. Specifically, we can write
$\mathcal{H} = \mathcal{M}_0 \oplus \sum_{j=1}^{N} \oplus \mathcal{M}_j$,
where $\mathcal{M}_0$ is the eigenspace corresponding to the zero eigenvalue of $K_u$, and $N$
could be infinity.

Secondly, from the relation $\langle v| K_u |v\rangle = c\|u\|^2 \|v\|^2$ and the fact that $c \neq 0$
and $|u\rangle \neq 0$, we conclude that $K_u$ cannot have any nonzero eigenvector for its zero
eigenvalue. It follows that $\mathcal{M}_0$ contains only the zero vector. Therefore, if $\mathcal{H}$ is
infinite-dimensional, then $N = \infty$.

Thirdly, consider any representation $T$ of $G$. Because $K_u$ commutes with all
$T_g$, each eigenspace of $K_u$ is an invariant subspace under $T$. If a subspace $\mathcal{U}$ is
invariant under $T$, then $\mathcal{U} \cap \mathcal{M}_j$, a subspace of $\mathcal{M}_j$, is also invariant (reader, please
verify!). Thus, all invariant subspaces of $G$ are reducible to invariant subspaces
of eigenspaces of $K_u$. In particular, all irreducible invariant subspaces of $T$ are
subspaces of eigenspaces of $K_u$.

Lastly, since all $\mathcal{M}_j$ are finite-dimensional, we can use the procedure used in
the case of finite groups and decompose $\mathcal{M}_j$ into irreducible invariant subspaces
of $T$. We have just shown the following result:

27.3.10. Theorem. Every unitary representation T of a compact Lie group G is 
a direct sum of irreducible finite-dimensional unitary representations. 

By choosing a basis for the finite-dimensional invariant subspaces of T, we 
can represent each by a matrix. Therefore, 



As in the case of finite groups, one can work with matrix elements and characters
of representations. The only difference is that summations are replaced with
integration and order of the group $|G|$ is replaced with $V_G$, which we take to be
unity.⁹ For example, Equation (24.6) becomes

\[ \int_G T^{(\alpha)}(g)\, X\, T^{(\beta)}(g^{-1}) \, d\mu_g = \lambda_X \delta_{\alpha\beta} \mathbf{1}, \tag{27.53} \]


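and the analogue of Equation (24.8) is

\[ \int_G T^{(\alpha)}_{il}(g)\, T^{(\beta)*}_{jm}(g) \, d\mu_g
   = \frac{1}{n_\alpha}\, \delta_{\alpha\beta}\, \delta_{ij}\, \delta_{lm}. \tag{27.54} \]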


9 This can always be done by rescaling the volume element. 








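Characters satisfy similar relations: Equation (24.11) becomes

\[ \int_G \chi^{(\alpha)}(g)\, \chi^{(\beta)*}(g) \, d\mu_g = \delta_{\alpha\beta}, \tag{27.55} \]

and the useful Equation (24.16) turns into

\[ \int_G |\chi(g)|^2 \, d\mu_g = \sum_\alpha m_\alpha^2, \tag{27.56} \]

where $m_\alpha$ is the number of times the $\alpha$th irreducible representation occurs in the
representation with character $\chi$.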

This formula can be used to test for irreducibility of a representation: If the integral 
is unity, the representation is irreducible; otherwise, it is reducible. 
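
The irreducibility test can be tried out on the simplest compact group. The sketch below assumes the circle group with normalized Haar measure $d\theta/2\pi$ and the one-dimensional representations $\theta \mapsto e^{in\theta}$ (an assumption at this point in the text; these representations are derived in the Fourier-series example later in this section). The names `haar_average`, `character`, and `norm_squared` are illustrative only. A representation is specified by the list of integers $n$ of its one-dimensional summands, and the integral of $|\chi|^2$ is approximated by a uniform Riemann sum.

```python
import cmath
import math


def haar_average(f, samples=2048):
    """Approximate (1/2pi) * integral of f over [0, 2pi) by a uniform Riemann sum."""
    return sum(f(2 * math.pi * k / samples) for k in range(samples)) / samples


def character(weights):
    """Character of the direct sum of the representations theta -> exp(i n theta),
    one summand for each integer n in weights."""
    return lambda theta: sum(cmath.exp(1j * n * theta) for n in weights)


def norm_squared(weights):
    """Integral of |chi|^2 over the circle with total volume normalized to one."""
    chi = character(weights)
    return haar_average(lambda theta: abs(chi(theta)) ** 2)
```

For a single summand such as `[3]` the result should be close to one, while for `[1, -1]` or `[2, 2, 5]` it should be close to the sum of the squared multiplicities, signalling reducibility.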

Finally, we state the celebrated Peter-Weyl theorem (for a proof, see [Baru 86, 
pp. 172-173]) 

27.3.12. Theorem. (Peter-Weyl theorem) The functions

\[ \sqrt{n_\alpha}\, T^{(\alpha)}_{ij}(g) \qquad \forall\, \alpha \ \text{and} \ 1 \le i, j \le n_\alpha, \]

form a complete set of functions in $\mathcal{L}^2(G)$, the Hilbert space of square-integrable
functions on $G$.

If $u \in \mathcal{L}^2(G)$, we can write

\[ u(g) = \sum_{\alpha, i, j} b^{\alpha}_{ij}\, T^{(\alpha)}_{ij}(g), \quad \text{where} \quad
   b^{\alpha}_{ij} = n_\alpha \int_G u(g)\, T^{(\alpha)*}_{ij}(g) \, d\mu_g. \tag{27.57} \]

27.3.13. Example. Equation (27.57) is the generalization of the Fourier series expansion
of functions. The connection with Fourier series becomes more transparent if we consider
a particular compact group. The unit circle $S^1$ is a one-dimensional abelian compact
1-parameter Lie group. In fact, fixing an “origin” on the circle, any other point can be
described by the parameter $\theta$, the angular distance from the point to the origin. $S^1$ is
obviously abelian; it is also compact, because it is a bounded closed region of $\mathbb{R}^2$ (BWHB
theorem). By Theorem 24.2.3, which holds for all Lie groups, all irreducible representations
of $S^1$ are 1-dimensional. So $T^{(\alpha)}_{ij}(g) \to T^{(\alpha)}(\theta)$. Furthermore,
$T^{(\alpha)}(\theta)\, T^{(\alpha)}(\theta') = T^{(\alpha)}(\theta + \theta')$.
Differentiating both sides with respect to $\theta'$ at $\theta' = 0$ yields the differential equation

\[ \frac{dT^{(\alpha)}}{d\theta} = a\, T^{(\alpha)}(\theta), \quad \text{where} \quad
   a \equiv \left. \frac{dT^{(\alpha)}}{d\theta'} \right|_{\theta' = 0}. \]

The solution to this DE is $Ae^{a\theta}$. Since $T^{(\alpha)}$ are unitary, and since a 1-dimensional unitary
matrix must look like $e^{i\varphi}$, we must have $A = 1$. Furthermore, $\theta$ and $\theta + 2\pi$ are identified on
the unit circle; therefore, we must conclude that $a$ is $i$ times an integer $n$, which determines
the irreducible representation. We label the irreducible representation by $n$ and write

\[ T^{(n)}(\theta) = e^{in\theta}, \qquad n = 0, \pm 1, \pm 2, \ldots \]

The Peter-Weyl theorem now becomes the rule of Fourier series expansion of periodic
functions. This last property follows from the fact that any function $u : S^1 \to \mathbb{R}$ is
necessarily periodic. ■
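
For the circle group, the coefficient formula of the Peter-Weyl expansion reduces to the familiar Fourier coefficients. The following sketch assumes $n_\alpha = 1$, $T^{(n)}(\theta) = e^{in\theta}$, and the normalized measure $d\theta/2\pi$; the function names and the sample function used in testing are illustrative assumptions, and the integral is approximated by a uniform Riemann sum.

```python
import cmath
import math


def haar_integral(f, samples=1024):
    """Approximate (1/2pi) * integral of f over [0, 2pi) by a uniform Riemann sum."""
    return sum(f(2 * math.pi * k / samples) for k in range(samples)) / samples


def fourier_coefficient(u, n, samples=1024):
    """b_n = integral of u(theta) * conj(T^(n)(theta)) with T^(n)(theta) = exp(i n theta)."""
    return haar_integral(lambda t: u(t) * cmath.exp(-1j * n * t), samples)


def partial_sum(u, theta, max_n, samples=1024):
    """Truncated expansion: sum of b_n T^(n)(theta) over |n| <= max_n."""
    return sum(fourier_coefficient(u, n, samples) * cmath.exp(1j * n * theta)
               for n in range(-max_n, max_n + 1))
```

For a trigonometric polynomial such as $u(\theta) = 3 + \cos 2\theta$, `partial_sum` with `max_n` at least 2 should reproduce `u(theta)` up to rounding, since only the representations $n = 0, \pm 2$ contribute.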






There are many occasions in physics where the state functions describing
physical quantities transform irreducibly under the action of a Lie group (which
we assume to be compact). Often this Lie group also acts on the underlying
space-time manifold. So we have a situation in which a Lie group $G$ acts on a Euclidean
space $\mathbb{R}^n$ as well as on the space of (square-integrable) functions $\mathcal{L}^2(\mathbb{R}^n)$. Therefore,
the functions $\{\phi_i^{(\alpha)}(\mathbf{x})\}_{i=1}^{n_\alpha}$ belonging to the $\alpha$th irreducible representation transform
among themselves not only because of the index $i$ but also because of the argument
$\mathbf{x} \in \mathbb{R}^n$.

To see the connection between physics and representation theory, consider
the transformation of the simplest case, a scalar function. As a concrete example,
choose temperature. To observer $O$ at the corner of a room 8 meters long, 6 meters
wide, and 3 meters high, the temperature of the center of the room is given by
$\theta(4, 3, 1.5)$, where $\theta(x, y, z)$ is a function that gives $O$ the temperature of various
points of the room. Observer $O'$ is sitting in the middle of the floor, so that the
center of the room has coordinates $(0, 0, 1.5)$. $O'$ also has a function that gives
her the temperature at various points. But this function must necessarily be
different from $\theta$ because of the different coordinates the same points have for
$O$ and $O'$. Calling this function $\theta'$, we have $\theta'(0, 0, 1.5) = \theta(4, 3, 1.5)$, and in
general,

\[ \theta'(x', y', z') = \theta(x, y, z), \]

where $(x', y', z')$ describes the same point for $O'$ that $(x, y, z)$ describes for $O$.

In the context of representation theory, we can think of $(x', y', z')$ as the
transformed coordinates obtained as a result of the action of some group:
$(x', y', z') = g \cdot (x, y, z)$, or $\mathbf{x}' = g \cdot \mathbf{x}$. So, the equation above can be written as

\[ \theta'(\mathbf{x}') = \theta(\mathbf{x}) = \theta(g^{-1} \cdot \mathbf{x}') \quad \text{or} \quad
   \theta'(\mathbf{x}) = \theta(g^{-1} \cdot \mathbf{x}). \]

It is natural to call $\theta'$ the transform of $\theta$ under the action of $g$ and write $\theta' = T_g \theta$.
This is one way of constructing a representation [see the comments after Equation
(24.1)]. Instead of $g^{-1}$ on the left, one could act with $g$ on the right.
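
The room example can be spelled out in a few lines of code. In this sketch the group element $g$ is the translation taking the coordinates of $O$ to those of $O'$, and the temperature field `theta` is a made-up linear function chosen only for illustration; the names `g`, `g_inverse`, and `transform` are likewise hypothetical.

```python
def theta(x, y, z):
    """Hypothetical temperature field in the coordinates of observer O."""
    return 20.0 + 0.5 * x - 0.25 * y + 2.0 * z


def g(point):
    """Coordinates of O  ->  coordinates of O' (origin moved to the middle of the floor)."""
    x, y, z = point
    return (x - 4, y - 3, z)


def g_inverse(point):
    """Coordinates of O'  ->  coordinates of O."""
    x, y, z = point
    return (x + 4, y + 3, z)


def transform(f, inverse_action):
    """Return the transformed function x -> f(g^{-1} . x)."""
    return lambda *point: f(*inverse_action(point))


theta_prime = transform(theta, g_inverse)
```

By construction `theta_prime(0, 0, 1.5)` and `theta(4, 3, 1.5)` are the same number: both observers assign the same temperature to the center of the room, even though they use different functions.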

When the physical quantity is not a scalar, it is natural to group together the
smallest set of functions that transform into one another. This leads to the set of
functions that transform according to a row of an irreducible representation of the
group. In some sense, this situation is a combination of (24.1) and (24.37). The
reader may verify that

n ct 

T^ a) (x) = 2 (27.58) 

J=l 

defines a representation of G. 

We now use Box 27.1.26 to construct an irreducible representation of the Lie
algebra of $G$ from Equation (27.58). By the definition of the infinitesimal action,
we let $g = \exp(\boldsymbol{\xi} t)$ and differentiate both sides with respect to $t$ at $t = 0$. This
yields

f exp ⑻ (^ a) (x) _ = (exp(^)) 0 f } (x - exp(-^r))} 

S - - - ， J-l 

•( 时 )00 

n a j 丨 

I r /r)( ex P ⑽ I O exp(-^O)) 

户 1 ^~^ 

+ Y] Tjf (exp ⑻ ) ） — 0 j a) (x • exp (-^)) ， 

1 l ... ■> dt f=o 

7=1 v 



where we have defined the matrices $D_{ji}(\xi)$ for the LHS. The derivative in the first sum is simply $T_{ji}^{(\alpha)}(\xi)$, the representation of the generator $\xi$ of the 1-parameter group of transformations in the space of functions $\{\phi_i^{(\alpha)}\}$. The derivative in the second sum can be found by writing $x'(t) = x \cdot \exp(-\xi t)$ and differentiating as follows:

$$\frac{d}{dt}\,\phi_j^{(\alpha)}(x'(t))\Big|_{t=0} \equiv \frac{d}{dt}\,\phi_j^{(\alpha)}(x'^1(t), \dots, x'^m(t))\Big|_{t=0} = \partial_k\phi_j^{(\alpha)}\,\frac{dx'^k}{dt}\Big|_{t=0} = \partial_k\phi_j^{(\alpha)}\,X^k(x; \xi) \equiv \partial_\nu\phi_j^{(\alpha)}\,X^\nu(x; \xi),$$

where we used Equation (27.18) and defined $X^k(x; \xi)$ by the last equality. We also changed the coordinate index to Greek to avoid confusing it with the index of the functions. Collecting everything together, we obtain

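$$\sum_{j=1}^{n_\alpha} D_{ji}(\xi)\,\phi_j^{(\alpha)}(x) = \sum_{j=1}^{n_\alpha} T_{ji}^{(\alpha)}(\xi)\,\phi_j^{(\alpha)}(x) + \sum_{j=1}^{n_\alpha} \delta_{ji}\,X^\nu(x; \xi)\,\frac{\partial\phi_j^{(\alpha)}}{\partial x^\nu},$$

or, since $(\partial/\partial\phi_i^{(\alpha)})\,\phi_j^{(\alpha)} = \delta_{ij}$,

$$D_{ij}(\xi) = T_{ij}^{(\alpha)}(\xi)\,\phi_j^{(\alpha)}(x)\,\frac{\partial}{\partial\phi_i^{(\alpha)}} + \delta_{ij}\,X^\nu(x; \xi)\,\frac{\partial}{\partial x^\nu}, \qquad (27.59)$$

where $X^\nu(x; \xi)$ is the $\nu$th component of the infinitesimal generator of the action induced by $\xi \in \mathfrak{g}$. We shall put Equation (27.59) to good use when we discuss symmetries and conservation laws in Chapter 30. The derivative with respect to the functions, although meaningless at this point, will be necessary when we discuss conservation laws.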



856 27. LIE GROUPS AND LIE ALGEBRAS 


27.4 Representation of the General Linear Group 

$GL(V)$ is not a compact group, but we can use the experience we gained in the analysis of the symmetric group to find the irreducible representations of $GL(V)$. The key is to construct tensor product spaces of $V$ (which, as the reader may verify, is a carrier space of $GL(V)$) and look for its irreducible subspaces. In fact, if $r$ is an arbitrary positive integer, $T : G \to GL(V)$ is a representation, and

$$V^{\otimes r} \equiv \underbrace{V \otimes V \otimes \cdots \otimes V}_{r\ \text{times}},$$

then $T^{\otimes r} : G \to GL(V^{\otimes r})$, given by

$$[T^{\otimes r}(g)](\mathbf{v}_1 \otimes \cdots \otimes \mathbf{v}_r) = T_g(\mathbf{v}_1) \otimes \cdots \otimes T_g(\mathbf{v}_r),$$

is also a representation. In particular, considering $V$ as the (natural) carrier space for $GL(V)$, we conclude that $T^{\otimes r} : GL(V) \to GL(V^{\otimes r})$ is a representation.
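The homomorphism property of the tensor product representation can be checked numerically in a small case. The following is only a sketch under stated assumptions: it takes $V = \mathbb{R}^2$, $r = 2$, lets $T$ be the identity map on $2 \times 2$ integer matrices, and uses the Kronecker product as the matrix of $g \otimes g$ in the basis $\mathbf{e}_i \otimes \mathbf{e}_j$. The helper names (`matmul`, `kron`, `tensor_power`) are illustrative and do not come from the text.

```python
# Check that g -> g (x) g respects matrix multiplication for n = 2, r = 2.

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(a, b):
    """Kronecker product: the matrix of a (x) b in the basis e_i (x) e_j."""
    na, nb = len(a), len(b)
    return [[a[i // nb][j // nb] * b[i % nb][j % nb]
             for j in range(na * nb)] for i in range(na * nb)]

def tensor_power(g, r):
    """Matrix of the r-fold tensor power g (x) g (x) ... (x) g."""
    out = g
    for _ in range(r - 1):
        out = kron(out, g)
    return out

g = [[1, 2], [3, 5]]   # det = -1, so g is invertible
h = [[2, 1], [1, 1]]   # det = +1

lhs = tensor_power(matmul(g, h), 2)                      # T(gh)
rhs = matmul(tensor_power(g, 2), tensor_power(h, 2))     # T(g) T(h)
homomorphism_holds = (lhs == rhs)
```

Integer entries keep the comparison exact; the same check works for any $r$, at the cost of matrices of size $n^r$.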

This tensor product representation is reducible, because as is evident from its definition, $T_g^{\otimes r}$ preserves any symmetry of the tensor it acts on. For example, the subspace of the full $n^r$-dimensional tensor product space consisting of the completely symmetric tensors of the type

$$\psi_s = \sum_{\pi \in S_r} \mathbf{v}_{\pi(1)} \otimes \mathbf{v}_{\pi(2)} \otimes \cdots \otimes \mathbf{v}_{\pi(r)}$$

is invariant. Similarly, the subspace consisting of the completely antisymmetric tensor products (the $r$-fold wedge products) is invariant.

To reduce $V^{\otimes r}$, we choose a basis $\{\mathbf{e}_k\}_{k=1}^{n}$ for $V$. Then the collection of $n^r$ tensor products $\{\mathbf{e}_{k_1} \otimes \cdots \otimes \mathbf{e}_{k_r}\}$, where each $k_i$ runs from 1 to $n$, is a basis for $V^{\otimes r}$.

An invariant subspace of $V^{\otimes r}$ is a span of linear combinations of certain of these basis vectors. Since the only thing that distinguishes among $\{\mathbf{e}_{k_1} \otimes \cdots \otimes \mathbf{e}_{k_r}\}$ is a permutation of the $r$ labels, we start to see the connection between the reduction of $V^{\otimes r}$ and $S_r$. This connection becomes more evident if we recall that the left multiplication of the group algebra of $S_r$ by its elements provides the regular representation, which is reducible. The irreducible representations are the minimal ideals of the algebra generated by the Young operators.

The same idea works here as well: Certain linear combinations of the basis vectors of $V^{\otimes r}$ obtained by permutations can serve as the basis vectors for irreducible representations of $GL(V)$. Let us elaborate on this. Recall that a Young operator of $S_r$ is written in the form $Y = QP$ where $Q$ and $P$ are linear combinations of permutations in $S_r$. $Y$ has the property that if one operates on it (via left multiplication) with all permutations of $S_r$, one generates a minimal ideal, i.e., an irreducible representation of $S_r$. Now let $Y$ be a Young operator that acts on the indices $(k_1, \dots, k_r)$, giving linear combinations of the basis vectors of $V^{\otimes r}$. From the minimality of the ideal generated by $Y$ and the fact that operators in $GL(V)$



connection between 
the Young tableaux 
and irreducible 
representations of 
GL{V) 


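permute the factors in $\mathbf{e}_{k_1} \otimes \cdots \otimes \mathbf{e}_{k_r}$ in all possible ways, it should now be clear that if we choose any single basis vector $\mathbf{e}_{k_1} \otimes \cdots \otimes \mathbf{e}_{k_r}$, then $Y(\mathbf{e}_{k_1} \otimes \cdots \otimes \mathbf{e}_{k_r})$ generates an irreducible representation of $GL(V)$. We therefore have the following: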

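27.4.1. Theorem. Let $\{\mathbf{e}_k\}_{k=1}^{n}$ be any basis for $V$. Let $Y = QP$ be the Young operator of $S_r$ that permutes (and takes linear combinations of) the basis vectors $\{\mathbf{e}_{k_1} \otimes \cdots \otimes \mathbf{e}_{k_r}\}$. Then for any given such basis vector, the vectors

$$\{T_g^{\otimes r}\,Y(\mathbf{e}_{k_1} \otimes \cdots \otimes \mathbf{e}_{k_r}) \mid g \in GL(V)\}$$

span an irreducible subspace of $V^{\otimes r}$.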


A basis of such an irreducible representation can be obtained by taking into account all the Young tableaux associated with the irreducible representation. But which of the symmetry types will be realized for given values of $n$ and $r$? Clearly, the Young tableau should not contain more than $n$ rows, because then one of the symbols will be repeated in a column, and the Young operator will vanish due to the antisymmetry in its column indices. We can therefore restrict the partition $(\lambda)$ to

$$(\lambda) = (\lambda_1, \lambda_2, \dots, \lambda_n), \qquad \lambda_1 + \cdots + \lambda_n = r, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0.$$

Let us consider an example for clarification. 

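27.4.2. Example. First, let $n = r = 2$. The tensor product space has $2^2 = 4$ dimensions. To reduce it, we consider the Young operators, which correspond to $e + (k_1, k_2)$ and $e - (k_1, k_2)$, with $e$ the identity permutation. Let us denote these operators by $Y_1$ and $Y_2$, respectively. By applying each one to a generic basis vector $\mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2}$, we can generate all the irreducible representations. The first operator gives

$$Y_1(\mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2}) = \mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} + \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_1},$$

where $k_1$ and $k_2$ can be 1 or 2. For $k_1 = k_2 = 1$, we get $2\mathbf{e}_1 \otimes \mathbf{e}_1$. For $k_1 = 1$, $k_2 = 2$, or $k_1 = 2$, $k_2 = 1$, we get $\mathbf{e}_1 \otimes \mathbf{e}_2 + \mathbf{e}_2 \otimes \mathbf{e}_1$. Finally, for $k_1 = k_2 = 2$, we get $2\mathbf{e}_2 \otimes \mathbf{e}_2$. Altogether, we obtain 3 linearly independent vectors that are completely symmetric.

When the second operator acts on a generic basis vector, it gives

$$Y_2(\mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2}) = \mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} - \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_1}.$$

The only time that this is not zero is when $k_1$ and $k_2$ are different. In either case, we get $\pm(\mathbf{e}_1 \otimes \mathbf{e}_2 - \mathbf{e}_2 \otimes \mathbf{e}_1)$. This subspace is therefore one-dimensional.

The reduction of the tensor product space can therefore be written as

$$V^{\otimes 2} = \underbrace{\mathrm{Span}\{\mathbf{e}_1 \otimes \mathbf{e}_1,\ \mathbf{e}_1 \otimes \mathbf{e}_2 + \mathbf{e}_2 \otimes \mathbf{e}_1,\ \mathbf{e}_2 \otimes \mathbf{e}_2\}}_{\text{3D symmetric subspace}} \oplus \underbrace{\mathrm{Span}\{\mathbf{e}_1 \otimes \mathbf{e}_2 - \mathbf{e}_2 \otimes \mathbf{e}_1\}}_{\text{1D antisymmetric subspace}}.$$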

Next, let us consider the case of $n = 2$, $r = 3$. The tensor product space has $2^3 = 8$ dimensions. To reduce it, we need to consider all Young operators of $S_3$. There are four of these, corresponding to the following tableaux:

    [k1 k2 k3]     [k1 k2]     [k1 k3]     [k1]
                   [k3]        [k2]        [k2]
                                           [k3]

Let us denote these operators by $Y_1$, $Y_2$, $Y_3$, and $Y_4$, respectively. By applying each one to a generic basis vector $\mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3}$, we can generate all the irreducible representations.

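The first operator gives

$$Y_1(\mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3}) = \mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3} + \mathbf{e}_{k_1} \otimes \mathbf{e}_{k_3} \otimes \mathbf{e}_{k_2} + \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_1} \otimes \mathbf{e}_{k_3}$$
$$\qquad + \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3} \otimes \mathbf{e}_{k_1} + \mathbf{e}_{k_3} \otimes \mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} + \mathbf{e}_{k_3} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_1},$$

where $k_1$, $k_2$, and $k_3$ can be 1 or 2. For $k_1 = k_2 = k_3 = 1$ we get $6\mathbf{e}_1 \otimes \mathbf{e}_1 \otimes \mathbf{e}_1$. For the case where two of the $k_i$'s are 1 and the third is 2, we get

$$2(\mathbf{e}_1 \otimes \mathbf{e}_1 \otimes \mathbf{e}_2 + \mathbf{e}_1 \otimes \mathbf{e}_2 \otimes \mathbf{e}_1 + \mathbf{e}_2 \otimes \mathbf{e}_1 \otimes \mathbf{e}_1).$$

For the case where two of the $k_i$'s are 2 and the third is 1, we get

$$2(\mathbf{e}_1 \otimes \mathbf{e}_2 \otimes \mathbf{e}_2 + \mathbf{e}_2 \otimes \mathbf{e}_1 \otimes \mathbf{e}_2 + \mathbf{e}_2 \otimes \mathbf{e}_2 \otimes \mathbf{e}_1).$$

Finally, for $k_1 = k_2 = k_3 = 2$, we get $6\mathbf{e}_2 \otimes \mathbf{e}_2 \otimes \mathbf{e}_2$. Altogether, we obtain 4 linearly independent vectors that are completely symmetric.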

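When the second operator acts on a generic basis vector, it gives¹⁰

$$Y_2(\mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3}) = [e - (k_1, k_3)][e + (k_1, k_2)](\mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3})$$
$$\qquad = \mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3} + \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_1} \otimes \mathbf{e}_{k_3} - \mathbf{e}_{k_3} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_1} - \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3} \otimes \mathbf{e}_{k_1}.$$

If all three indices are the same, we get zero. Suppose $k_1 = 1$. Then $k_2$ can be 1 or 2. For $k_2 = 1$, we must set $k_3 = 2$ to get $2\mathbf{e}_1 \otimes \mathbf{e}_1 \otimes \mathbf{e}_2 - \mathbf{e}_2 \otimes \mathbf{e}_1 \otimes \mathbf{e}_1 - \mathbf{e}_1 \otimes \mathbf{e}_2 \otimes \mathbf{e}_1$. For $k_2 = 2$, the choice $k_3 = 1$ makes the four terms cancel in pairs, so we must set $k_3 = 2$ to obtain $\mathbf{e}_1 \otimes \mathbf{e}_2 \otimes \mathbf{e}_2 + \mathbf{e}_2 \otimes \mathbf{e}_1 \otimes \mathbf{e}_2 - 2\mathbf{e}_2 \otimes \mathbf{e}_2 \otimes \mathbf{e}_1$. If we start with $k_1 = 2$, we will not produce any new vectors, as the reader is urged to verify. Therefore, the dimension of the irreducible subspace spanned by the second Young tableau is 2.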

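The action of the third operator on a generic basis vector yields

$$Y_3(\mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3}) = [e - (k_1, k_2)][e + (k_1, k_3)](\mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3})$$
$$\qquad = \mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3} + \mathbf{e}_{k_3} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_1} - \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_1} \otimes \mathbf{e}_{k_3} - \mathbf{e}_{k_3} \otimes \mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2}.$$

The reader may check that we obtain a two-dimensional irreducible representation spanned by $2\mathbf{e}_1 \otimes \mathbf{e}_2 \otimes \mathbf{e}_1 - \mathbf{e}_2 \otimes \mathbf{e}_1 \otimes \mathbf{e}_1 - \mathbf{e}_1 \otimes \mathbf{e}_1 \otimes \mathbf{e}_2$ and $\mathbf{e}_1 \otimes \mathbf{e}_2 \otimes \mathbf{e}_2 + \mathbf{e}_2 \otimes \mathbf{e}_2 \otimes \mathbf{e}_1 - 2\mathbf{e}_2 \otimes \mathbf{e}_1 \otimes \mathbf{e}_2$.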

10 When a symmetric group is considered as an abstract group (as opposed to a group of transformations) we may multiply permutations (keep track of how each number is repeatedly transformed) from left to right. However, since the permutations here act on vectors on their right, it is more natural to calculate their products from right to left.

The fourth Young operator gives zero because it is completely antisymmetric in three slots and we have only two indices. The reduction of the tensor product space can therefore be written as

$$V^{\otimes 3} = \underbrace{\mathrm{Span}\{Y_1(\mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3})\}}_{\dim = 4} \oplus \underbrace{\mathrm{Span}\{Y_2(\mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3})\}}_{\dim = 2} \oplus \underbrace{\mathrm{Span}\{Y_3(\mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3})\}}_{\dim = 2}.$$

We note that the total dimensions on both sides match. ∎
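The dimension count in this example can be reproduced mechanically. The sketch below is an illustration under stated assumptions, not the book's procedure: each Young operator is encoded as a list of signed "slot relabelings" (a tuple `p` sends $\mathbf{e}_{k_1} \otimes \mathbf{e}_{k_2} \otimes \mathbf{e}_{k_3}$ to $\mathbf{e}_{k_{p(1)}} \otimes \mathbf{e}_{k_{p(2)}} \otimes \mathbf{e}_{k_{p(3)}}$, with 0-based entries), the products $[e - (k_1,k_3)][e + (k_1,k_2)]$ and $[e - (k_1,k_2)][e + (k_1,k_3)]$ are expanded by hand with the right-to-left convention of footnote 10, and the rank of the images of all 8 basis vectors is found by exact Gaussian elimination. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def perm_sign(p):
    """Sign of a permutation given as a tuple, by counting inversions."""
    sign = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                sign = -sign
    return sign

def apply_operator(op, ks):
    """Apply a sum of signed slot relabelings to the basis vector labeled ks."""
    out = {}
    for sign, p in op:
        key = tuple(ks[i] for i in p)
        out[key] = out.get(key, 0) + sign
    return out

def rank(rows):
    """Rank of a list of rational row vectors (exact Gaussian elimination)."""
    rows = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

n, r = 2, 3
basis = list(product(range(1, n + 1), repeat=r))   # the 8 index triples

operators = {
    "Y1": [(1, p) for p in permutations(range(3))],                          # symmetrizer
    "Y2": [(1, (0, 1, 2)), (1, (1, 0, 2)), (-1, (2, 1, 0)), (-1, (1, 2, 0))],
    "Y3": [(1, (0, 1, 2)), (1, (2, 1, 0)), (-1, (1, 0, 2)), (-1, (2, 0, 1))],
    "Y4": [(perm_sign(p), p) for p in permutations(range(3))],               # antisymmetrizer
}

def image_dimension(op):
    """Dimension of the span of op applied to every basis vector."""
    rows = []
    for ks in basis:
        image = apply_operator(op, ks)
        rows.append([image.get(b, 0) for b in basis])
    return rank(rows)

dims = {name: image_dimension(op) for name, op in operators.items()}
```

The four dimensions add up to $2^3 = 8$, in agreement with the decomposition above; changing `n` and the operator lists lets one explore other small cases.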


There is a remarkable formula that gives the dimension of all irreducible representations of $GL(V)$ (see [Hame 89, pp. 384-387] for a derivation):

27.4.3. Theorem. Let $V$ be an $n$-dimensional vector space, and $V^{(\lambda)}$ the irreducible subspace of tensors with symmetry associated with the partition $(\lambda) = (\lambda_1, \dots, \lambda_n)$. Then

$$\dim V^{(\lambda)} = \frac{D(l_1, \dots, l_n)}{D(n-1, n-2, \dots, 0)},$$

where $l_j \equiv \lambda_j + n - j$ and $D(x_1, \dots, x_n)$ is as given in Equation (24.56).
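As a quick numerical illustration of Theorem 27.4.3, the sketch below evaluates the formula for a few partitions. Equation (24.56) is not reproduced in this section, so the code assumes that $D(x_1, \dots, x_n)$ is the product of differences $\prod_{i<j}(x_i - x_j)$; that form, and the function names, are assumptions of this sketch.

```python
def difference_product(xs):
    """D(x_1, ..., x_n) = prod_{i<j} (x_i - x_j); assumed form of Eq. (24.56)."""
    out = 1
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            out *= xs[i] - xs[j]
    return out

def gl_irrep_dimension(partition, n):
    """Dimension of the GL(V) irreducible subspace for a partition, dim V = n."""
    if len(partition) > n:
        return 0                       # more than n rows: the Young operator vanishes
    lam = list(partition) + [0] * (n - len(partition))
    # l_j = lambda_j + n - j for j = 1, ..., n (the list index is j - 1)
    l = [lam[j] + n - (j + 1) for j in range(n)]
    reference = list(range(n - 1, -1, -1))          # (n-1, n-2, ..., 0)
    return difference_product(l) // difference_product(reference)

# The n = 2 cases worked out in Example 27.4.2
dims_r2 = [gl_irrep_dimension(p, 2) for p in [(2,), (1, 1)]]
dims_r3 = [gl_irrep_dimension(p, 2) for p in [(3,), (2, 1), (1, 1, 1)]]
```

With these assumptions the $n = 2$ results agree with the dimensions found by explicit construction in Example 27.4.2 (the mixed-symmetry partition $(2,1)$ occurs twice in $V^{\otimes 3}$, once for each of its two standard tableaux).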


27.5 Representation of Lie Algebras 

The diffeomorphism established by the exponential map (Theorem 27.1.18) re¬ 
duces the local study of a Lie group to that of its Lie algebra. 11 In this book, 
we are exclusively interested in the local properties of Lie groups, and we shall 
therefore confine ourselves to Lie algebras to study the structure of Lie groups. 
Recall that any Lie group homomorphism leads to a corresponding Lie algebra 
homomorphism [Equation (27.7)]. Conversely, a homomorphism of Lie algebras 
can, through the identification of the neighborhoods of their identities with their 
Lie algebras, be “exponentiated” to a (local) homomorphism of their Lie groups. 
This leads to the following theorem (see [Fult 91, pp. 108 and 119] for a proof).

27.5.1. Theorem. Let $G$ be a Lie group with algebra $\mathfrak{g}$. A representation $T : G \to GL(\mathcal{H})$ determines a Lie algebra representation $T_* : \mathfrak{g} \to \mathfrak{gl}(\mathcal{H})$. Conversely, a Lie algebra representation $X : \mathfrak{g} \to \mathfrak{gl}(\mathcal{H})$ determines a Lie group representation.

It follows from this theorem that all (local) Lie group representations result 
from corresponding Lie algebra representations. Therefore, we shall restrict our¬ 
selves to the representations of Lie algebras. 


11 We use the word "local" to mean the collection of all points that can be connected to the identity by a curve in the Lie group $G$. If this collection exhausts $G$, then we say that $G$ is connected. If, furthermore, all closed curves (loops) in $G$ can be shrunk to a point, we say that $G$ is simply connected. The word "local" can be replaced by "simply connected" in what follows.






27.5.1 Representation of Subgroups of GL(V) 

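Let $\mathfrak{g}$ be any Lie algebra with basis vectors $\{X_i\}$. Let a representation $T$ map these vectors to $\{T_i\} \in \mathfrak{gl}(\mathcal{H})$ for some carrier space $\mathcal{H}$. Then, a general element $X = \sum_i \alpha_i X_i$ of $\mathfrak{g}$ will be mapped to $T = \sum_i \alpha_i T_i$. Now suppose that $\mathfrak{h}$ is a subalgebra of $\mathfrak{g}$. Then the restriction of $T$ to $\mathfrak{h}$ provides a representation of $\mathfrak{h}$. This restriction may be reducible. If it is, then there is an invariant subspace $\mathcal{H}_1$ of $\mathcal{H}$. It follows that

$$\langle b|\,T_X\,|a\rangle = 0 \quad \forall\, X \in \mathfrak{h} \quad \text{whenever } |a\rangle \in \mathcal{H}_1 \text{ and } |b\rangle \in \mathcal{H}_1^{\perp},$$

where $T_X \equiv T(X)$. If we write $T_X = \sum_{i=1}^{\dim\mathfrak{g}} \alpha_i^{(X)} T_i$, then in terms of $T_i$, the equation above can be written as

$$\sum_{i=1}^{\dim\mathfrak{g}} \alpha_i^{(X)} \langle b|\,T_i\,|a\rangle \equiv \sum_{i=1}^{\dim\mathfrak{g}} \alpha_i^{(X)} \tau_i^{(ba)} = 0 \quad \forall\, X \in \mathfrak{h}, \qquad (27.60)$$

where $\tau_i^{(ba)} \equiv \langle b|\,T_i\,|a\rangle$ are complex numbers. Equation (27.60) states that


27.5.2. Box. If $T$, as a representation of $\mathfrak{h}$ (a Lie subalgebra of $\mathfrak{g}$), is reducible, then there exist a number of equations that $\alpha_i^{(X)}$ must satisfy whenever $X \in \mathfrak{h}$. If $T$, as a representation of $\mathfrak{g}$, is irreducible, then no relation such as given in (27.60) will exist when $X$ runs over all of $\mathfrak{g}$.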


This last statement will be used to analyze certain subgroups of GL(V). 

Let us first identify $GL(V)$ with $GL(n, \mathbb{C})$. Next, consider $GL(n, \mathbb{R})$, which is a subgroup of $GL(n, \mathbb{C})$, and transfer the discussion to their respective algebras. If $\{X_i\}$ is a basis of $\mathfrak{gl}(n, \mathbb{C})$, then an arbitrary element can be written as $\sum_i \alpha_i X_i$. The difference between $\mathfrak{gl}(n, \mathbb{C})$ and $\mathfrak{gl}(n, \mathbb{R})$ is that the $\alpha_i$'s are real in the latter case; i.e., for all real values of $\{\alpha_i\}$, the sum belongs to $\mathfrak{gl}(n, \mathbb{R})$. Now suppose that $T$ is an irreducible representation of $\mathfrak{gl}(n, \mathbb{C})$ that is reducible when restricted to $\mathfrak{gl}(n, \mathbb{R})$. Equation (27.60) states that the function

$$f(z_1, \dots, z_{n^2}) \equiv \sum_{i=1}^{n^2} z_i\,\tau_i^{(ba)}$$

vanishes for all real values of the $z_i$'s. Since this function is obviously entire, it must vanish for all complex values of the $z_i$'s by analytic continuation (see Theorem 11.3.1). But this is impossible because $T$ is irreducible for $\mathfrak{gl}(n, \mathbb{C})$. We have to conclude that $T$ is irreducible as a representation of $\mathfrak{gl}(n, \mathbb{R})$.

The next subalgebra of $\mathfrak{gl}(n, \mathbb{C})$ we consider is the Lie algebra $\mathfrak{sl}(n, \mathbb{C})$ of the special linear group. The only restriction on the elements of $\mathfrak{sl}(n, \mathbb{C})$ is for them to have a vanishing trace. Denoting $\operatorname{tr} X_i$ by $t_i$, we conclude that $X = \sum_i \alpha_i X_i$ belongs to $\mathfrak{sl}(n, \mathbb{C})$ if and only if $\sum_i \alpha_i t_i = 0$. Let $(t_1, \dots, t_{n^2}) \equiv |t\rangle \in \mathbb{C}^{n^2}$. Then $\mathfrak{sl}(n, \mathbb{C})$ can be characterized as the subspace consisting of vectors $|\alpha\rangle \in \mathbb{C}^{n^2}$ such that $\langle\alpha|t\rangle = 0$. Such a subspace has $n^2 - 1$ dimensions. If any irreducible representation of $\mathfrak{gl}(n, \mathbb{C})$ is reducible for $\mathfrak{sl}(n, \mathbb{C})$, then the set of complex numbers $\{\alpha_i\}$ must, in addition, satisfy Equation (27.60). This amounts to the condition that $|\alpha\rangle$ be orthogonal to $|\tau^{(ba)}\rangle$ as well. But this is impossible, because then the set $\{|\alpha\rangle, |t\rangle, |\tau^{(ba)}\rangle\}$ would constitute a subspace of $\mathbb{C}^{n^2}$ whose dimension is at least $n^2 + 1$: There are $n^2 - 1$ of the $|\alpha\rangle$'s, one $|t\rangle$, and at least one $|\tau^{(ba)}\rangle$. Therefore, all irreducible representations of $\mathfrak{gl}(n, \mathbb{C})$ are also irreducible representations of $\mathfrak{sl}(n, \mathbb{C})$.

The last subalgebra of $\mathfrak{gl}(n, \mathbb{C})$ we consider is the Lie algebra $\mathfrak{u}(n)$ of the unitary group. To study this algebra, we start with the Weyl basis of Equation (27.27) for $\mathfrak{gl}(n, \mathbb{C})$, and construct a new hermitian basis $\{X_{kj}\}$ defined as

$$X_{jj} = e_{jj} \quad \text{for all } j = 1, 2, \dots, n,$$
$$X_{kj} = i(e_{kj} - e_{jk}).$$

A typical element of $\mathfrak{gl}(n, \mathbb{C})$ is of the form $\sum_{kj} \alpha_{kj} X_{kj}$, where $\alpha_{kj}$ are complex numbers. If we restrict ourselves to real values of $\alpha_{kj}$, then we obtain the subalgebra of hermitian matrices whose Lie group is the unitary group $U(n)$. The fact that the irreducible representations of $\mathfrak{gl}(n, \mathbb{C})$ will not reduce under $\mathfrak{u}(n)$ follows immediately from our discussion concerning $\mathfrak{gl}(n, \mathbb{R})$. We summarize our findings in the following:

27.5.3. Theorem. The irreducible representations of $GL(n, \mathbb{C})$ are also irreducible representations of $GL(n, \mathbb{R})$, $SL(n, \mathbb{C})$, $U(n)$, and $SU(n)$.

The case of $SU(n)$ follows from the same argument given earlier that connected $GL(n, \mathbb{C})$ to $SL(n, \mathbb{C})$.

27.5.2 Casimir Operators 

In the general representation theory of Lie algebras, it is desirable to label each 
irreducible representation with a quantity made out of the basis vectors of the Lie 
algebra. An example is the labeling of the energy states of a quantum mechani¬ 
cal system with angular momentum. Each value of the total angular momentum 
labels an irreducible subspace whose vectors are further labeled by the third com¬ 
ponent of angular momentum (see Chapter 12). This subsection is devoted to the 
generalization of this concept to an arbitrary Lie algebra. 

27.5.4. Definition. Let $T : \mathfrak{g} \to \mathfrak{gl}(\mathcal{H})$ be a representation of the Lie algebra $\mathfrak{g}$. A Casimir operator $\mathbf{C}$ for this representation is an operator that commutes with all $T_X$ of the representation.

If the representation is irreducible, then by Schur's lemma, $\mathbf{C}$ is a multiple of the unit operator. Therefore, all vectors of an irreducible invariant subspace of the carrier space $\mathcal{H}$ are eigenvectors of $\mathbf{C}$ corresponding to the same eigenvalue. That Casimir operators actually determine the irreducible representations of a semisimple Lie algebra is the content of the following theorem (for a proof, see Chevalley's theorem [Vara 84, pp. 333-337]).

27.5.5. Theorem. (Chevalley) For every semisimple Lie algebra $\mathfrak{g}$ of rank¹² $r$ with a basis $\{X_i\}$, there exists a set of $r$ Casimir operators in the form of polynomials in $T_{X_i}$ whose eigenvalues characterize the irreducible representations of $\mathfrak{g}$.

From now on, we shall use the notation $\mathbf{X}_i$ for $T_{X_i}$. It follows from Theorem 27.5.5 that all irreducible invariant vector subspaces of the carrier space can be labeled by the eigenvalues of the $r$ Casimir operators. This means that each invariant irreducible subspace has a basis all of whose vectors carry a set of $r$ labels corresponding to the eigenvalues of the $r$ Casimir operators.

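One Casimir operator, in the form of a polynomial of degree two, which works only for semisimple Lie algebras, is obtained easily:

$$\mathbf{C} = \sum_{i,j} g^{ij}\,\mathbf{X}_i\mathbf{X}_j, \qquad (27.61)$$

where $g^{ij}$ is the inverse of the Cartan metric tensor. In fact, with the summation convention in place, we have

$$[\mathbf{C}, \mathbf{X}_k] = g^{ij}[\mathbf{X}_i\mathbf{X}_j, \mathbf{X}_k] = g^{ij}\{\mathbf{X}_i[\mathbf{X}_j, \mathbf{X}_k] + [\mathbf{X}_i, \mathbf{X}_k]\mathbf{X}_j\}$$
$$= g^{ij}\{c_{jk}^{r}\,\mathbf{X}_i\mathbf{X}_r + c_{ik}^{r}\,\mathbf{X}_r\mathbf{X}_j\}$$
$$= g^{ij} c_{ik}^{r}\,(\mathbf{X}_j\mathbf{X}_r + \mathbf{X}_r\mathbf{X}_j) \qquad (\text{because } g^{ij} \text{ is symmetric})$$
$$= g^{ij} g^{sr} c_{iks}\,(\mathbf{X}_j\mathbf{X}_r + \mathbf{X}_r\mathbf{X}_j)$$
$$= 0. \qquad (\text{because } g^{ij} g^{sr} c_{iks} \text{ is antisymmetric in } j, r)$$

The last equality follows from the fact that $g^{ij}$ and $g^{sr}$ are symmetric, $c_{iks}$ is completely antisymmetric [see the discussion following Equation (27.45)], and there is a sum over the dummy index $s$.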

27.5.6. Example. The rotation group $SO(3)$ in $\mathbb{R}^3$ is a compact 3-parameter Lie group. The infinitesimal generators are the three components of the angular momentum operator (see Example 27.1.29). From the commutation relations of the angular momentum operators developed in Chapter 12, we conclude that $c_{ij}^{k} = \epsilon_{ijk}$. It follows that the Cartan metric tensor is

$$g_{ij} = c_{is}^{r} c_{jr}^{s} = (\epsilon_{isr})(\epsilon_{jrs}) = -\epsilon_{isr}\epsilon_{jsr} = -2\delta_{ij}.$$

Ignoring the factor of $-2$ and denoting the angular momentum operators by $\mathbf{L}_i$, we conclude that

$$\mathbf{L}^2 \equiv \mathbf{L}_1^2 + \mathbf{L}_2^2 + \mathbf{L}_3^2$$


12 Recall that the rank of $\mathfrak{g}$ is the dimension of the Cartan subalgebra of $\mathfrak{g}$.




irreducible 
representations of 
the rotation group 
and spherical 
harmonics 


is a Casimir operator. But this is precisely the operator discussed in detail in Chapter 12. We found there that the eigenvalues of $\mathbf{L}^2$ were labeled by $j$, where $j$ was either an integer or a half odd integer. In the context of our present discussion, we note that the Lie algebra $\mathfrak{so}(3)$ has rank one, because there is no higher dimensional subalgebra of $\mathfrak{so}(3)$ all of whose vectors commute with one another. It follows from Theorem 27.5.5 that $\mathbf{L}^2$ is the only Casimir operator, and that all irreducible representations $T^{(j)}$ of $\mathfrak{so}(3)$ are distinguished by their label $j$. Furthermore, the construction of Chapter 12 showed explicitly that the dimension of $T^{(j)}$ is $2j + 1$.

The connection between the representation of Lie algebras and Lie groups permits us to conclude that the irreducible representations of the rotation group are labeled by the (half) integers $j$, and the $j$th irreducible representation has dimension $2j + 1$. When $j$ is an integer $l$ and the carrier space is $\mathcal{L}^2(S^2)$, the square-integrable functions on the unit sphere, then $\mathbf{L}^2$ becomes a differential operator, and the spherical harmonics $Y_{lm}(\theta, \varphi)$, with a fixed value of $l$, provide a basis for the $l$th irreducible invariant subspace. ∎
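The Cartan metric of this example and the commutation property of the quadratic Casimir operator (27.61) can be checked with a few lines of exact arithmetic. The sketch below assumes the structure constants $c_{ij}^{k} = \epsilon_{ijk}$ quoted in the example and works in the adjoint representation, where the matrix of $\mathbf{X}_i$ has entries $(\mathbf{X}_i)_{kj} = c_{ij}^{k}$; the choice of representation and the helper names are assumptions made only for illustration.

```python
def epsilon(i, j, k):
    """Levi-Civita symbol for indices taken from 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

R = range(3)
c = [[[epsilon(i, j, k) for k in R] for j in R] for i in R]   # c[i][j][k] = c_{ij}^k

# Cartan metric g_{ij} = c_{is}^r c_{jr}^s
g = [[sum(c[i][s][r] * c[j][r][s] for s in R for r in R) for j in R] for i in R]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in R) for j in R] for i in R]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in R] for i in R]

# Adjoint representation: (X_i)[k][j] = c_{ij}^k
X = [[[c[i][j][k] for j in R] for k in R] for i in R]

# g is diagonal here, so its inverse is obtained entry by entry
g_inv = [[1 / g[i][j] if i == j else 0 for j in R] for i in R]

# Quadratic Casimir operator C = g^{ij} X_i X_j
products = [[matmul(X[i], X[j]) for j in R] for i in R]
C = [[sum(g_inv[i][j] * products[i][j][a][b] for i in R for j in R) for b in R]
     for a in R]

zero = [[0] * 3 for _ in R]
casimir_commutes = all(commutator(C, X[k]) == zero for k in R)
```

In this representation `C` comes out as the unit matrix, which is what Schur's lemma predicts for an irreducible representation.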


Connection between 
Casimir operators 
and the PDEs of 
mathematical 
physics 


The last sentence of Example 27.5.6 is at the heart of the connection between symmetry, Lie groups, and the equations of mathematical physics. A symmetry operation of mathematical physics is expressed in terms of the action of a Lie group on an underlying manifold $M$, i.e., as a group of transformations of $M$. The Lie algebra of such a Lie group consists of the infinitesimal generators of the corresponding transformation. These generators can be expressed as first-order differential operators as in Equation (27.20). It is therefore natural to choose as the carrier space of a representation the Hilbert space $\mathcal{L}^2(M)$ of the square-integrable functions on $M$, which, through the local identification of $M$ with $\mathbb{R}^m$ ($m = \dim M$), can be identified with functions on $\mathbb{R}^m$. Then the infinitesimal generators act directly on the functions of $\mathcal{L}^2(M)$ as first-order differential operators.

The Casimir operators $\{\mathbf{C}_\alpha\}_{\alpha=1}^{r}$, where $r$ is the rank of the Lie algebra, are polynomials in the infinitesimal generators, i.e., differential operators of higher order. On the irreducible invariant subspaces of $\mathcal{L}^2(M)$, each $\mathbf{C}_\alpha$ acts as a multiple of the identity, so if $f(\mathbf{r})$ belongs to such an invariant subspace, we have

$$\mathbf{C}_\alpha f(\mathbf{r}) = \lambda(\alpha) f(\mathbf{r}), \qquad \alpha = 1, 2, \dots, r. \qquad (27.62)$$

This is a set of differential equations that are invariant under the symmetry of the physical system, i.e., its solutions transform among themselves under the action of the group of symmetries.

It is a stunning reality and a fact of profound significance that many of the 
differential equations of mathematical physics are, as in Equation (27.62), expres¬ 
sions of the invariance of the Casimir operators of some Lie algebra in a particular 
representation. Moreover, all the standard functions of mathematical physics, such 
as Bessel, hypergeometric, and confluent hypergeometric functions, are related to 
matrix elements in the representations of a few of the simplest Lie groups (see 
[Mill 68] for a thorough discussion of this topic). 


Claude Chevalley (1909-1984) was the only son of Abel and Marguerite Chevalley, who were the authors of the Oxford Concise French Dictionary. He studied under Emile Picard at the École Normale Supérieure in Paris, graduating in 1929 and becoming the youngest of the mathematicians of the Bourbaki school.

After graduation, Chevalley went to Germany to continue 
his studies under Artin at Hamburg during the session 1931-32. 

He then went to the University of Marburg to work with Hasse,
who had been appointed to fill Hensel's chair there in 1930.

He was awarded his doctorate in 1937. A year later Chevalley
went to the Institute for Advanced Study at Princeton, where he 
also served on the faculty of Princeton University. From July 
1949 until June 1957 he served as professor of mathematics at 
Columbia University, afterwards returning to the University of 
Paris. 

Chevalley had a major influence on the development of several areas of mathematics. 
His papers of 1936 and 1941 led to major advances in class field theory and also in algebraic 
geometry. He did pioneering work in the theory of local rings in 1943, developing the ideas 
of Krull into a theorem bearing his name. Chevalley’s theorem was important in applications 
made in 1954 to quasi-algebraically closed fields and the following year to algebraic groups. 
Chevalley groups play a central role in the classification of finite simple groups. His name is 
also attached to Chevalley decompositions and to a Chevalley type of semisimple algebraic 
group. 

Many of his texts have become classics. He wrote Theory of Lie Groups in three volumes,
which appeared in 1946, 1951, and 1955. He also published Theory of Distributions (1951),
Introduction to the Theory of Algebraic Functions of one Variable (1951), The Algebraic 
Theory of Spinors (1954), Class Field Theory (1954), Fundamental Concepts of Algebra 
(1956), and Foundations of Algebraic Geometry (1958). 

Chevalley was awarded many honors for his work. Among these was the Cole Prize of 
the American Mathematical Society. He was elected a member of the London Mathematical 
Society in 1967. 



27.5.3 Representation of so(3) and so(3,1)

Because of their importance in physical applications, we study the representations of $\mathfrak{so}(3)$, the rotation, and $\mathfrak{so}(3,1)$, the Lorentz, algebras. For rotations, we define $\mathbf{J}_1 \equiv -i\mathbf{M}_{23}$, $\mathbf{J}_2 \equiv i\mathbf{M}_{13}$, and $\mathbf{J}_3 \equiv -i\mathbf{M}_{12}$,¹³ and note that the $\mathbf{J}_i$'s satisfy exactly the same commutation relations as the angular momentum operators of Chapter 12. Therefore, the irreducible representations of $\mathfrak{so}(3)$ are labeled by $j$, which can be an integer or a half-odd integer (see also Example 27.5.6). These representations are finite-dimensional because $SO(3)$ is a compact group (Example 27.3.3 and Theorem 27.3.9). The dimension of the irreducible representation of $\mathfrak{so}(3)$ labeled by $j$ is $2j + 1$.

Because of local isomorphism of Lie groups and their Lie algebras, the same 
irreducible spaces found for Lie algebras can be used to represent the Lie groups. 


13 Sometimes we use $\mathbf{J}_x$, $\mathbf{J}_y$, and $\mathbf{J}_z$ instead of $\mathbf{J}_1$, $\mathbf{J}_2$, and $\mathbf{J}_3$.








rotation matrix 


Wigner formula for 
rotation matrices 


In particular, the states $\{|jm\rangle\}_{m=-j}^{j}$, where $m$ is the eigenvalue of $\mathbf{J}_z$, can also be used as a basis of the $j$-th irreducible representation.

The flow of each infinitesimal generator of $\mathfrak{so}(3)$ is a one-parameter subgroup of $SO(3)$. For example, $\exp(M_{12}\varphi)$ is a rotation of angle $\varphi$ about the $z$-axis. Using Euler angles, we can write a general rotation as

$$R(\psi, \theta, \varphi) = \exp(M_{12}\psi)\exp(M_{31}\theta)\exp(M_{12}\varphi).$$

The corresponding rotation operator acting on the vectors of the carrier space is

$$\mathbf{R}(\psi, \theta, \varphi) = \exp(\mathbf{M}_{12}\psi)\exp(\mathbf{M}_{31}\theta)\exp(\mathbf{M}_{12}\varphi) = e^{i\mathbf{J}_z\psi}\,e^{i\mathbf{J}_y\theta}\,e^{i\mathbf{J}_z\varphi}.$$


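The rotation matrix corresponding to the above operator is obtained by sandwiching $\mathbf{R}(\psi, \theta, \varphi)$ between basis vectors of a given irreducible representation:

$$D^{(j)}_{m'm}(\psi, \theta, \varphi) \equiv \langle jm'|\,\mathbf{R}(\psi, \theta, \varphi)\,|jm\rangle = \langle jm'|\,e^{i\mathbf{J}_z\psi}\,e^{i\mathbf{J}_y\theta}\,e^{i\mathbf{J}_z\varphi}\,|jm\rangle$$
$$= e^{im'\psi}\,e^{im\varphi}\,\langle jm'|\,e^{i\mathbf{J}_y\theta}\,|jm\rangle \equiv e^{i(m'\psi + m\varphi)}\,d^{(j)}_{m'm}(\theta). \qquad (27.63)$$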


Thus, the calculation of rotation matrices is reduced to finding $d^{(j)}_{m'm}(\theta)$. These are given by the Wigner formula (see [Hame 89, pp. 348-357]):

$$d^{(j)}_{m'm}(\theta) = \sum_{\mu} \Upsilon(j, m, m'; \mu)\left(\cos\frac{\theta}{2}\right)^{2j + m - m' - 2\mu}\left(\sin\frac{\theta}{2}\right)^{m' - m + 2\mu}, \qquad (27.64)$$

where

$$\Upsilon(j, m, m'; \mu) \equiv (-1)^{\mu}\,\frac{[(j+m)!\,(j-m)!\,(j+m')!\,(j-m')!]^{1/2}}{(j+m-\mu)!\;\mu!\;(j-m'-\mu)!\;(m'-m+\mu)!}$$

and the summation extends over all integral values of $\mu$ for which the factorials have a meaning. The number of terms in the summation is equal to $1 + r$, where $r$ is the smallest of the four integers $j \pm m$, $j \pm m'$.
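The Wigner sum is easy to evaluate numerically, and two sanity checks come for free: $d^{(j)}(0)$ must be the unit matrix, and $d^{(j)}(\theta)$ must be a real orthogonal matrix. The sketch below is one possible implementation under stated assumptions: it uses the sign convention $(-1)^{\mu}$ and the exponents $2j + m - m' - 2\mu$ and $m' - m + 2\mu$ (the combination that goes with $e^{i\mathbf{J}_y\theta}$), and it orders rows and columns from $m = j$ down to $m = -j$. The function names are hypothetical.

```python
from math import cos, factorial, sin, sqrt

def wigner_d(j, mp, m, theta):
    """d^{(j)}_{m'm}(theta) from the Wigner sum; j may be half-integral."""
    # j +/- m and j +/- m' are non-negative integers even for half-integral j.
    a, b = round(j + m), round(j - m)
    ap, bp = round(j + mp), round(j - mp)
    shift = round(mp - m)                      # m' - m, an integer
    norm = sqrt(factorial(a) * factorial(b) * factorial(ap) * factorial(bp))
    total = 0.0
    # mu runs over the integers that keep every factorial argument >= 0.
    for mu in range(max(0, -shift), min(a, bp) + 1):
        denom = (factorial(a - mu) * factorial(mu)
                 * factorial(bp - mu) * factorial(shift + mu))
        cos_power = (a - mu) + (bp - mu)       # equals 2j + m - m' - 2 mu
        sin_power = shift + 2 * mu             # equals m' - m + 2 mu
        total += ((-1) ** mu * norm / denom
                  * cos(theta / 2) ** cos_power * sin(theta / 2) ** sin_power)
    return total

def d_matrix(two_j, theta):
    """Full matrix for j = two_j / 2, rows m' and columns m from j down to -j."""
    j = two_j / 2
    ms = [j - k for k in range(two_j + 1)]
    return [[wigner_d(j, mp, m, theta) for m in ms] for mp in ms]

d_half = d_matrix(1, 0.7)    # j = 1/2
d_one = d_matrix(2, 0.7)     # j = 1
```

For $j = 1/2$ this reproduces the familiar $2 \times 2$ matrix with entries $\cos(\theta/2)$ and $\pm\sin(\theta/2)$, and for $j = 1$ the central entry is $\cos\theta$.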

From the rotation matrices, we can obtain the characters of the rotation group. However, an easier way is to use Euler's theorem (Theorem 4.7.7), Example 23.2.19, and Box 24.2.7 to conclude that the character of a rotation depends only on the angle of rotation, and not on the direction of the rotation axis. Choosing the $z$-axis as our only axis of rotation, we obtain

$$\chi^{(j)}(\varphi) = \sum_{m=-j}^{j} \langle jm|\,e^{i\mathbf{J}_z\varphi}\,|jm\rangle = \sum_{m=-j}^{j} e^{im\varphi} = e^{-ij\varphi}\sum_{k=0}^{2j} e^{ik\varphi}$$
$$= e^{-ij\varphi}\,\frac{e^{i(2j+1)\varphi} - 1}{e^{i\varphi} - 1} = \frac{e^{i(j+1)\varphi} - e^{-ij\varphi}}{e^{i\varphi} - 1} = \frac{\sin\left(j + \frac{1}{2}\right)\varphi}{\sin(\varphi/2)}. \qquad (27.65)$$
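The geometric-series step in (27.65) can be confirmed numerically by comparing the explicit sum over $m$ with the closed form. This is only a small check; the argument `two_j` (twice the label $j$, so that half-integral $j$ can be passed as an integer) and the function names are choices made for this sketch.

```python
import cmath
from math import sin

def character_sum(two_j, phi):
    """chi^{(j)}(phi) as the explicit sum of exp(i m phi), m = -j, ..., j."""
    ms = [(-two_j + 2 * k) / 2 for k in range(two_j + 1)]
    return sum(cmath.exp(1j * m * phi) for m in ms)

def character_closed(two_j, phi):
    """Closed form sin((j + 1/2) phi) / sin(phi / 2) of Eq. (27.65)."""
    j = two_j / 2
    return sin((j + 0.5) * phi) / sin(phi / 2)

phi = 0.9
max_error = max(abs(character_sum(tj, phi) - character_closed(tj, phi))
                for tj in range(6))     # j = 0, 1/2, 1, ..., 5/2
```

The closed form is undefined at $\varphi = 0$, where the character equals the dimension $2j + 1$; the sum handles that case directly.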





Equation (27.65) can be used to obtain the celebrated addition theorem for angular momenta. Suppose that initially we have two physical systems corresponding to angular momenta $j_1$ and $j_2$. When these systems are made to interact with one another, the total system will be described by the tensor product states. These states are vectors in the tensor product of the irreducible representations $T^{(j_1)}$ and $T^{(j_2)}$ of the rotation group, as discussed in Section 24.7. This product is reducible. To find the factors into which it reduces, we consider its character corresponding to angle $\varphi$. Using Equation (24.44), we have

$$\chi^{(j_1 \times j_2)}(\varphi) = \chi^{(j_1)}(\varphi) \cdot \chi^{(j_2)}(\varphi) = \sum_{m_1=-j_1}^{j_1} e^{im_1\varphi} \sum_{m_2=-j_2}^{j_2} e^{im_2\varphi}$$
$$= \sum_{m_1=-j_1}^{j_1}\ \sum_{m_2=-j_2}^{j_2} e^{i(m_1 + m_2)\varphi}$$
$$= \sum_{J=|j_1-j_2|}^{j_1+j_2}\ \sum_{M=-J}^{J} e^{iM\varphi} = \sum_{J=|j_1-j_2|}^{j_1+j_2} \chi^{(J)}(\varphi),$$

where the double sum on the third line is an equivalent way of writing the double summation of the second line, as the reader may verify. From this equation we read off the Clebsch-Gordan decomposition of the tensor product:

$$T^{(j_1)} \otimes T^{(j_2)} = \bigoplus_{J=|j_1-j_2|}^{j_1+j_2} T^{(J)}, \qquad (27.66)$$

which is also known as the addition theorem for angular momenta. Equation (27.66) shows that (see page 702)


27.5.7. Box. The rotation group is simply reducible.
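The character identity behind (27.66) lends itself to a direct numerical check: the product of two characters should equal the sum of the characters for $J = |j_1 - j_2|, \dots, j_1 + j_2$, and the dimensions should add up. The sketch below passes $2j$ as an integer so that half-integral labels are handled exactly; the function names and the sample values $j_1 = 3/2$, $j_2 = 1$ are assumptions made for illustration.

```python
from math import sin

def chi(two_j, phi):
    """Character sin((j + 1/2) phi) / sin(phi / 2), with j = two_j / 2."""
    return sin((two_j + 1) * phi / 2) / sin(phi / 2)

def clebsch_gordan_labels(two_j1, two_j2):
    """Values of 2J in the reduction of the product, J going in unit steps."""
    return list(range(abs(two_j1 - two_j2), two_j1 + two_j2 + 1, 2))

two_j1, two_j2 = 3, 2            # j1 = 3/2, j2 = 1
labels = clebsch_gordan_labels(two_j1, two_j2)

phi = 1.3
product_of_characters = chi(two_j1, phi) * chi(two_j2, phi)
sum_of_characters = sum(chi(two_J, phi) for two_J in labels)

dimension_lhs = (two_j1 + 1) * (two_j2 + 1)
dimension_rhs = sum(two_J + 1 for two_J in labels)
```

Each label appears exactly once in `labels`, which is the numerical counterpart of the statement that the rotation group is simply reducible.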


The RHS of Equation (27.66) tells us which irreducible representations result from multiplying $T^{(j_1)}$ and $T^{(j_2)}$. In particular, if $j_1 = j_2 \equiv l$, the RHS includes the $J = 0$ representation, i.e., a scalar. In terms of the states, this says that we can combine two states with angular momentum $l$ to obtain a scalar state. Let us find this combination. We use Equation (24.48) in the form

$$|JM\rangle = \sum_{m_1, m_2} C(j_1 j_2; J | m_1 m_2; M)\,|j_1, m_1; j_2, m_2\rangle, \qquad m_1 + m_2 = M. \qquad (27.67)$$

In the case under investigation, $J = 0 = M$, so (27.67) becomes

$$|00\rangle = \sum_{m=-l}^{l} C(ll; 0 | m, -m; 0)\,|lm; l, -m\rangle.$$





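Problem 27.33 shows that $C(ll; 0 | m, -m; 0) = (-1)^{l-m}/\sqrt{2l+1}$, so that

$$|00\rangle = \sum_{m=-l}^{l} \frac{(-1)^{l-m}}{\sqrt{2l+1}}\,|lm; l, -m\rangle.$$

Take the "inner product" of this with $\langle\theta, \varphi; \theta', \varphi'|$ to obtain

$$\langle\theta, \varphi; \theta', \varphi'|00\rangle = \sum_{m=-l}^{l} \frac{(-1)^{l-m}}{\sqrt{2l+1}}\,\langle\theta, \varphi; \theta', \varphi'|lm; l, -m\rangle$$
$$= \sum_{m=-l}^{l} \frac{(-1)^{l-m}}{\sqrt{2l+1}}\,\underbrace{\langle\theta, \varphi|lm\rangle}_{Y_{lm}(\theta, \varphi)}\,\underbrace{\langle\theta', \varphi'|l, -m\rangle}_{Y_{l,-m}(\theta', \varphi')}, \qquad (27.68)$$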

where we have used $\langle\theta, \varphi; \theta', \varphi'| = \langle\theta, \varphi|\,\langle\theta', \varphi'|$ and contracted each bra with a ket. We can evaluate the LHS of (27.68) by noting that since it is a scalar, the choice of orientation of coordinates is immaterial. So, let $\theta = 0$ to get $\theta' = \gamma$, the angle between the two directions. Then using the facts

$$Y_{lm}(0, \varphi) = \delta_{m0}\sqrt{\frac{2l+1}{4\pi}} \quad\text{and}\quad Y_{l0}(\theta, \varphi) = \sqrt{\frac{2l+1}{4\pi}}\,P_l(\cos\theta)$$

on the RHS of (27.68), we obtain

$$\langle\theta, \varphi; \theta', \varphi'|00\rangle = \frac{(-1)^l}{4\pi}\sqrt{2l+1}\,P_l(\cos\gamma).$$

Substituting this in the LHS of Equation (27.68), we get

$$P_l(\cos\gamma) = \frac{4\pi}{2l+1}\sum_{m=-l}^{l} (-1)^m\,Y_{lm}(\theta, \varphi)\,Y_{l,-m}(\theta', \varphi'),$$

which is the addition theorem for spherical harmonics discussed in Chapter 12.
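For $l = 1$ the addition theorem can be verified directly, since $P_1(x) = x$ and $\cos\gamma = \cos\theta\cos\theta' + \sin\theta\sin\theta'\cos(\varphi - \varphi')$. The sketch below hard-codes the three $l = 1$ spherical harmonics with the Condon-Shortley phase, $Y_{1,\pm 1} = \mp\sqrt{3/8\pi}\,\sin\theta\,e^{\pm i\varphi}$; that phase convention and the sample angles are assumptions of this example, chosen so that $Y_{l,-m} = (-1)^m Y_{lm}^*$ holds.

```python
import cmath
from math import cos, pi, sin, sqrt

def Y1(m, theta, phi):
    """Spherical harmonic Y_{1m} with the Condon-Shortley phase (assumed)."""
    if m == 0:
        return sqrt(3 / (4 * pi)) * cos(theta)
    # m = +1 carries an overall minus sign, m = -1 an overall plus sign
    return -m * sqrt(3 / (8 * pi)) * sin(theta) * cmath.exp(1j * m * phi)

def addition_theorem_rhs(theta, phi, theta_p, phi_p):
    """(4 pi / (2l + 1)) sum_m (-1)^m Y_{lm} Y_{l,-m}' for l = 1."""
    l = 1
    total = sum((-1) ** m * Y1(m, theta, phi) * Y1(-m, theta_p, phi_p)
                for m in (-1, 0, 1))
    return 4 * pi / (2 * l + 1) * total

def cos_gamma(theta, phi, theta_p, phi_p):
    """Cosine of the angle between the two directions; equals P_1(cos gamma)."""
    return (cos(theta) * cos(theta_p)
            + sin(theta) * sin(theta_p) * cos(phi - phi_p))

angles = (0.4, 1.1, 2.0, -0.7)        # theta, phi, theta', phi'
difference = abs(addition_theorem_rhs(*angles) - cos_gamma(*angles))
```

The imaginary parts of the $m = \pm 1$ terms cancel, so the right-hand side is real up to rounding.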
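Let us now turn to $\mathfrak{so}(3,1)$. We collect the generators in two categories

$$\mathbf{M} \equiv (\mathbf{M}_1, \mathbf{M}_2, \mathbf{M}_3) \equiv (\mathbf{M}_{23}, \mathbf{M}_{31}, \mathbf{M}_{12}),$$
$$\mathbf{N} \equiv (\mathbf{N}_1, \mathbf{N}_2, \mathbf{N}_3) \equiv (\mathbf{M}_{01}, \mathbf{M}_{02}, \mathbf{M}_{03}),$$

and verify that

$$[\mathbf{M}_i, \mathbf{M}_j] = -\epsilon_{ijk}\mathbf{M}_k, \qquad [\mathbf{N}_i, \mathbf{N}_j] = \epsilon_{ijk}\mathbf{M}_k, \qquad [\mathbf{M}_i, \mathbf{N}_j] = -\epsilon_{ijk}\mathbf{N}_k,$$

and that there are two Casimir operators: $\mathbf{M}^2 - \mathbf{N}^2$ and $\mathbf{M} \cdot \mathbf{N}$. It follows that the irreducible representations of $\mathfrak{so}(3,1)$ are labeled by two numbers. To find these numbers, define the generators

$$\mathbf{J} \equiv \tfrac{1}{2}(\mathbf{M} + i\mathbf{N}), \qquad \mathbf{K} \equiv \tfrac{1}{2}(\mathbf{M} - i\mathbf{N}),$$

and show that

$$[\mathbf{J}_l, \mathbf{J}_m] = -\epsilon_{lmk}\mathbf{J}_k, \qquad [\mathbf{K}_i, \mathbf{K}_j] = -\epsilon_{ijm}\mathbf{K}_m, \qquad [\mathbf{J}_i, \mathbf{K}_j] = 0.$$

It follows that the $\mathbf{J}$'s and the $\mathbf{K}$'s generate two completely independent Lie algebras isomorphic to the angular momentum algebras and that $\mathfrak{so}(3,1)$ is a direct sum of these algebras. Since each one requires a (half-odd) integer to designate its irreducible representations, we can choose these two numbers as the eigenvalues of the Casimir operators needed to label the irreducible representations of $\mathfrak{so}(3,1)$. Thus, the irreducible representations of $\mathfrak{so}(3,1)$ are of the form $T^{(jj')}$, where $j$ and $j'$ can each be an integer or a half-odd integer.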
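The decoupling of the $\mathbf{J}$'s from the $\mathbf{K}$'s can be checked on explicit matrices. The sketch below uses one concrete $4 \times 4$ realization chosen for convenience, which is an assumption and not necessarily the matrices of the book's Equation (27.40): index 0 is the time direction, the rotation generators have entries $(M_i)_{ab} = \epsilon_{iab}$ on the spatial block, and the boosts are $N_i = E_{0i} + E_{i0}$. With these choices the three commutation relations quoted above hold, and the combinations $\frac{1}{2}(M \pm iN)$ commute with each other.

```python
def eps(i, j, k):
    """Levi-Civita symbol for indices taken from 1, 2, 3."""
    return (i - j) * (j - k) * (k - i) // 2

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def comm(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(4)] for i in range(4)]

def combo(terms):
    """Linear combination sum_k coeff_k * matrix_k of 4 x 4 matrices."""
    return [[sum(coeff * m[i][j] for coeff, m in terms) for j in range(4)]
            for i in range(4)]

S = (1, 2, 3)   # spatial indices; index 0 is the time direction
M = {i: [[eps(i, a, b) if a in S and b in S else 0 for b in range(4)]
         for a in range(4)] for i in S}
N = {i: [[1 if {a, b} == {0, i} else 0 for b in range(4)]
         for a in range(4)] for i in S}
J = {i: combo([(0.5, M[i]), (0.5j, N[i])]) for i in S}
K = {i: combo([(0.5, M[i]), (-0.5j, N[i])]) for i in S}

def eps_sum(sign, i, j, gens):
    """sign * epsilon_{ijk} * gens[k], summed over k."""
    return combo([(sign * eps(i, j, k), gens[k]) for k in S])

pairs = [(i, j) for i in S for j in S]
zero = [[0] * 4 for _ in range(4)]
mm_ok = all(comm(M[i], M[j]) == eps_sum(-1, i, j, M) for i, j in pairs)
nn_ok = all(comm(N[i], N[j]) == eps_sum(+1, i, j, M) for i, j in pairs)
mn_ok = all(comm(M[i], N[j]) == eps_sum(-1, i, j, N) for i, j in pairs)
jj_ok = all(comm(J[i], J[j]) == eps_sum(-1, i, j, J) for i, j in pairs)
kk_ok = all(comm(K[i], K[j]) == eps_sum(-1, i, j, K) for i, j in pairs)
jk_ok = all(comm(J[i], K[j]) == zero for i, j in pairs)
```

All entries are multiples of 1/4 with small numerators, so the floating-point comparisons are exact. Note that the relative sign between $[N_i, N_j]$ and $[M_i, M_j]$ is what makes the cross commutators $[J_i, K_j]$ vanish.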

27.5.4 Representation of the Poincare Algebra 

The Poincaré algebra $\mathfrak{p}(p, n-p)$, introduced in Section 27.2.1, is the generalization of the Lie algebra of the invariance group of the special theory of relativity. It contains the Lorentz, the rotation, and the translation groups as its proper subgroups. Its irreducible representations are of direct physical significance, and we shall study them here.

As the first step in the construction of representations of $\mathfrak{p}(p, n-p)$, we shall try to find its Casimir operators. Equation (27.61) suggests one, but it works only for semisimple Lie algebras, and the Poincaré algebra is not semisimple. Nevertheless, let us try to find an operator based on that construction. From the commutation relations for $\mathfrak{p}(p, n-p)$, as given in Equation (27.40), and the double-indexed structure constants defined by,¹⁴

[M/j, M^/] = [Mij , Pk] = c Tj,k^ 

we obtain 

cf}% = - + - sps^ Jkl 

cf jtk = ST”ik - (27.69) 

From these structure constants, we can construct a double-indexed "metric" $g_{ij,kl}$, which the reader may verify to be equal to

$$g_{ij,kl} = 2(n-1)(\eta_{jk}\eta_{il} - \eta_{ik}\eta_{jl}).$$

14 Please make sure to differentiate between the pair (M/j, P^) (which acts on p) and the pair (M^,P *)， which acts on the state 
vectors in the Hilbert space of representation. 
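The closed form of the double-indexed metric can be confirmed by brute force. The sketch below assumes the structure constants in the form $c^{mn}_{ij,kl} = \delta^m_j\delta^n_l\eta_{ik} - \delta^m_j\delta^n_k\eta_{il} + \delta^m_i\delta^n_k\eta_{jl} - \delta^m_i\delta^n_l\eta_{jk}$ and $c^m_{ij,k} = \delta^m_j\eta_{ik} - \delta^m_i\eta_{jk}$, sums all repeated indices freely over $0,\dots,n-1$, and picks $n = 4$ with $\eta = \mathrm{diag}(-1,-1,-1,+1)$ purely for illustration. The function names are hypothetical.

```python
from itertools import product

n = 4
# ASSUMPTION: diagonal metric with signature (-,-,-,+); chosen only for illustration.
eta = [[0] * n for _ in range(n)]
for a in range(n):
    eta[a][a] = 1 if a == n - 1 else -1

def delta(a, b):
    return 1 if a == b else 0

def c_mm(m, nn, i, j, k, l):
    """Structure constant c^{mn}_{ij,kl} of [M_ij, M_kl]."""
    return (delta(m, j) * delta(nn, l) * eta[i][k] - delta(m, j) * delta(nn, k) * eta[i][l]
            + delta(m, i) * delta(nn, k) * eta[j][l] - delta(m, i) * delta(nn, l) * eta[j][k])

def c_mp(m, i, j, k):
    """Structure constant c^{m}_{ij,k} of [M_ij, P_k]."""
    return delta(m, j) * eta[i][k] - delta(m, i) * eta[j][k]

def metric(i, j, k, l):
    """g_{ij,kl} = c^{rs}_{ij,mn} c^{mn}_{kl,rs} + c^{r}_{ij,m} c^{m}_{kl,r}."""
    first = sum(c_mm(r, s, i, j, m, nn) * c_mm(m, nn, k, l, r, s)
                for r, s, m, nn in product(range(n), repeat=4))
    second = sum(c_mp(r, i, j, m) * c_mp(m, k, l, r)
                 for r, m in product(range(n), repeat=2))
    return first + second

def closed_form(i, j, k, l):
    return 2 * (n - 1) * (eta[j][k] * eta[i][l] - eta[i][k] * eta[j][l])

# Compare the two expressions for every choice of the four free indices.
matches = all(metric(i, j, k, l) == closed_form(i, j, k, l)
              for i, j, k, l in product(range(n), repeat=4))
```

Because all quantities are integers, the comparison is exact. The first sum contributes $2(n-2)$ and the second contributes $2$ to the overall coefficient.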


There is no natural way of constructing a single-indexed metric. Therefore, we can
only contract the $\mathbf{M}$'s. In doing so, it is understood that the indices are raised and
lowered by $\eta_{ij}$. So, the first candidate for a Casimir operator is

$$\mathbf{M}^2 \equiv g_{ij,kl}\mathbf{M}^{ij}\mathbf{M}^{kl} = 2(n-1)(\eta_{jk}\eta_{il} - \eta_{ik}\eta_{jl})\mathbf{M}^{ij}\mathbf{M}^{kl} = -4(n-1)\mathbf{M}^{ij}\mathbf{M}_{ij}.$$

The reader may verify that $\mathbf{M}^2$ commutes with all the $\mathbf{M}_{ij}$'s but not with the $\mathbf{P}_i$'s.
This is to be expected because $\mathbf{M}^2$, the total “angular momentum” operator,$^{15}$ is
a scalar and should commute with all its components. But commutation with the
$\mathbf{P}_i$'s is not guaranteed.

The construction above, although a failure, gives us a clue for a successful
construction. We can make another scalar out of the $\mathbf{P}$'s. The reader may check
that $\mathbf{P}^2 \equiv \eta^{ij}\mathbf{P}_i\mathbf{P}_j$ indeed commutes with all elements of the Poincaré algebra.
We have thus found one Casimir operator. Can we find more? We have exhausted
the polynomials of degree two. The only third-degree polynomials that we can
construct are $\mathbf{M}^{ij}\mathbf{P}_i\mathbf{P}_j$ and the full contraction of three $\mathbf{M}$'s. The first one is
identically zero (why?), and the second one will not commute with the $\mathbf{P}$'s.
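As a worked check of the claim about $P^2$, the following sketch assumes the commutator $[M_{ij}, P_k] = \eta_{ik}P_j - \eta_{jk}P_i$ (the form read off from the structure constants of Equation (27.69)) together with $[P_i, P_j] = 0$:

```latex
\begin{align*}
[M_{ij}, P^2] &= \eta^{kl}\bigl([M_{ij},P_k]\,P_l + P_k\,[M_{ij},P_l]\bigr) \\
  &= \eta^{kl}\bigl((\eta_{ik}P_j - \eta_{jk}P_i)P_l + P_k(\eta_{il}P_j - \eta_{jl}P_i)\bigr) \\
  &= P_jP_i - P_iP_j + P_iP_j - P_jP_i = 0, \\
[P_k, P^2] &= 0 \quad\text{because the } P\text{'s commute among themselves.}
\end{align*}
```

The same two facts explain the parenthetical "why?": $M^{ij}$ is antisymmetric in $i$ and $j$ while $P_iP_j$ is symmetric, so their contraction vanishes.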

To find higher-order polynomials in the infinitesimal generators, we build new
tensors out of them and contract these tensors with one another. For example,
consider the vector

$$\mathbf{C}_i \equiv \mathbf{M}_{ij}\mathbf{P}^j = \eta^{kj}\mathbf{M}_{ij}\mathbf{P}_k. \qquad (27.70)$$

Then $\mathbf{C}^2 \equiv \mathbf{C}_i\mathbf{C}^i$, a fourth-degree polynomial in the generators, is a scalar, and therefore,
it commutes with the $\mathbf{M}_{ij}$'s, but unfortunately, not with the $\mathbf{P}_i$'s.

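Another common way to construct tensors is to contract various numbers of
the generators with the Levi-Civita tensor. For example,

$$\mathbf{W}^{i_1 \dots i_{n-3}} \equiv \epsilon^{i_1 \dots i_{n-3} jkl}\,\mathbf{M}_{jk}\mathbf{P}_l \qquad (27.71)$$

is a contravariant tensor of rank $n-3$. Let us contract $\mathbf{W}$ with itself to find a scalar
(which we expect to commute with all the $\mathbf{M}_{ij}$'s):

$$\mathbf{W}^2 \equiv \mathbf{W}^{i_1 \dots i_{n-3}}\mathbf{W}_{i_1 \dots i_{n-3}}
 = \epsilon^{i_1 \dots i_{n-3} jkl}\,\mathbf{M}_{jk}\mathbf{P}_l\;\epsilon_{i_1 \dots i_{n-3} rst}\,\mathbf{M}^{rs}\mathbf{P}^t$$

$$= (-1)^{n_-}(n-3)! \sum_{\pi} \epsilon_\pi\, \delta^j_{\pi(r)}\delta^k_{\pi(s)}\delta^l_{\pi(t)}\,\mathbf{M}_{jk}\mathbf{P}_l\mathbf{M}^{rs}\mathbf{P}^t,$$

where we used Equation (25.21). The sum above can be carried out, with the final
result

$$\mathbf{W}^2 = 2(-1)^{n_-}(n-3)!\,(\mathbf{M}_{ij}\mathbf{M}^{ij}\mathbf{P}^2 - 2\mathbf{C}_i\mathbf{C}^i)
 = 2(-1)^{n_-}(n-3)!\,(\mathbf{M}^2\mathbf{P}^2 - 2\mathbf{C}^2), \qquad (27.72)$$

where $\mathbf{C}_i$ was defined in Equation (27.70). We have already seen that $\mathbf{M}^2$, $\mathbf{P}^2$, and
$\mathbf{C}^2$ all commute with the $\mathbf{M}_{ij}$'s. The reader may check that $\mathbf{W}^2$ commutes with the
$\mathbf{P}_j$'s as well. In fact, $\mathbf{W}^{i_1 \dots i_{n-3}}$ itself commutes with all the $\mathbf{P}_j$'s. Other tensors and
Casimir operators can be constructed in a similar fashion.

15 This “angular momentum” includes ordinary rotations as well as the Lorentz boosts.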

We now want to construct the irreducible vector spaces that are labeled by
the eigenvalues of the Casimir operators. We take advantage of the fact that the
Poincaré algebra has a commutative subalgebra, the translation generators. Since
the $\mathbf{P}_k$'s commute among themselves and with $\mathbf{P}^2$ and $\mathbf{W}^2$, we can choose simultaneous
eigenvectors of $\{\mathbf{P}_k\}_{k=1}^{n}$, $\mathbf{P}^2$, and $\mathbf{W}^2$. In particular, we can label the vectors of
an irreducible invariant subspace by the eigenvalues of these operators. The $\mathbf{P}^2$ and
$\mathbf{W}^2$ labels will be the same for all vectors in each irreducible invariant subspace,
while the $\mathbf{P}_k$'s will label different vectors of the same invariant subspace.

Let us concentrate on the momentum labels and let $|\psi^\mu_p\rangle$ be a vector in an
irreducible representation of $\mathfrak{p}(p, n-p)$, where $p$ labels momenta and $\mu$ distinguishes
among all different vectors that have the same momentum label. We thus
have

$$\mathbf{P}_k\,|\psi^\mu_p\rangle = p_k\,|\psi^\mu_p\rangle \quad \text{for } k = 1, 2, \dots, n, \qquad (27.73)$$

where $p_k$ is the eigenvalue of $\mathbf{P}_k$. We also need to know how the “rotation” operators
act on $|\psi^\mu_p\rangle$. Instead of the full operator $e^{\mathbf{M}_{ij}\theta^{ij}}$, we apply its small-angle
approximation $1 + \mathbf{M}_{ij}\theta^{ij}$. Since all states are labeled by momentum, we expect
the rotated state to have a new momentum label, i.e., to be an eigenstate of $\mathbf{P}_k$. We
want to show that $(1 + \mathbf{M}_{ij}\theta^{ij})\,|\psi^\mu_p\rangle$ is an eigenvector of $\mathbf{P}_k$. Let the eigenvalue
be $p'$, which should be slightly different from $p$. Then, the problem reduces to
determining $\delta p \equiv p' - p$. Ignoring the index $\mu$ for a moment, we have

$$\mathbf{P}_k\,|\psi_{p'}\rangle = p'_k\,|\psi_{p'}\rangle = (p_k + \delta p_k)(1 + \mathbf{M}_{ij}\theta^{ij})\,|\psi_p\rangle.$$

Using the commutation relations between the $\mathbf{M}_{ij}$'s and the $\mathbf{P}_k$'s, we can write the LHS as

$$\text{LHS} = \mathbf{P}_k(1 + \mathbf{M}_{ij}\theta^{ij})\,|\psi_p\rangle = \bigl[p_k + \theta^{ij}(\mathbf{M}_{ij}p_k + \eta_{jk}\mathbf{P}_i - \eta_{ik}\mathbf{P}_j)\bigr]\,|\psi_p\rangle.$$

The RHS, to first order in infinitesimal quantities, can be expressed as

$$\text{RHS} = (p_k + \delta p_k + p_k\theta^{ij}\mathbf{M}_{ij})\,|\psi_p\rangle.$$

Comparison of the last two equations shows that

$$\delta p_k = \theta^{ij}(\eta_{jk}p_i - \eta_{ik}p_j) = \theta^{ij}(\eta_{jk}\eta_{il} - \eta_{ik}\eta_{jl})\,p^l = \theta^{ij}(M_{ij})_{kl}\,p^l,$$

where we used Equation (27.37). It follows that

$$p' = p + \delta p = (1 + \theta^{ij}M_{ij})\,p,$$

stating that the rotation operator of the carrier Hilbert space rotates the momentum
label of the state. Note that since “rotations” do not change the length (induced by
$\eta$), $p'$ and $p$ have the same length.
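The statement that the length is unchanged can be checked to first order: the change of $p_kp^k$ is $2\,\delta p_k\,p^k$, which should vanish for any parameters $\theta^{ij}$. A minimal sketch follows, assuming $\delta p_k = \theta^{ij}(\eta_{jk}p_i - \eta_{ik}p_j)$ and, for concreteness only, $n = 4$ with $\eta = \mathrm{diag}(-1,-1,-1,+1)$. The random integers are arbitrary test data.

```python
import random

n = 4
# ASSUMPTION: eta = diag(-1, -1, -1, +1); the cancellation below holds for any symmetric eta.
eta = [[0] * n for _ in range(n)]
for a in range(n):
    eta[a][a] = 1 if a == n - 1 else -1

def delta_p(theta, p_up):
    """delta p_k = theta^{ij} (eta_{jk} p_i - eta_{ik} p_j), with p_i = eta_{il} p^l."""
    p_low = [sum(eta[i][l] * p_up[l] for l in range(n)) for i in range(n)]
    return [sum(theta[i][j] * (eta[j][k] * p_low[i] - eta[i][k] * p_low[j])
                for i in range(n) for j in range(n))
            for k in range(n)]

random.seed(0)
theta = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]  # arbitrary parameters
p_up = [random.randint(-5, 5) for _ in range(n)]                       # arbitrary momentum p^k

dp_low = delta_p(theta, p_up)
# First-order change of the squared length p_k p^k.
first_order_change = 2 * sum(dp_low[k] * p_up[k] for k in range(n))
```

Integer inputs keep the arithmetic exact, so the first-order change comes out as exactly zero rather than a small floating-point number.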




To obtain all the vectors of an irreducible representation of $\mathfrak{p}(p, n-p)$, we
must apply the rotation operators to vectors such as $|\psi^\mu_p\rangle$. But not all rotations
will change the label $p$; for example, in three dimensions, the vector $p$ will not be
affected$^{16}$ by a rotation about $p$. This motivates the following definition.

27.5.8. Definition. Let $p_0$ be any given eigenvalue of the translation generators.
The set $\mathcal{R}_{p_0}$ of all rotations $\Lambda^{p_0}$ that do not change $p_0$ is a subgroup of the rotation
group $O(p, n-p)$, called the little group corresponding to $p_0$. The little algebra
consists of the generators $M^{0}_{ij}$ satisfying

$$M^{0}_{ij}\,p_0 = 0.$$
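The size of a little algebra can be found by linear algebra alone: it is the number of independent combinations $\theta^{ij}M_{ij}$ that annihilate $p_0$. The sketch below is an illustration under stated assumptions, not part of the original development: it takes $n = 4$, index order $(x_1, x_2, x_3, x_0)$, $\eta = \mathrm{diag}(-1,-1,-1,+1)$, and $(M_{ij}p)^a = \delta^a_j p_i - \delta^a_i p_j$. Only the dimension is computed, which does not depend on these sign conventions.

```python
from fractions import Fraction
from itertools import combinations

# ASSUMPTION: positions 0,1,2 are x1,x2,x3 and position 3 is x0; eta = diag(-1,-1,-1,+1).
ETA = [-1, -1, -1, 1]

def action(i, j, p_up):
    """Components (M_ij p)^a = delta^a_j p_i - delta^a_i p_j, with p_i = eta_ii p^i."""
    p_low = [ETA[a] * p_up[a] for a in range(4)]
    out = [0, 0, 0, 0]
    out[j] += p_low[i]
    out[i] -= p_low[j]
    return out

def rank(rows):
    """Rank by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k][col] != 0:
                factor = rows[k][col] / rows[r][col]
                rows[k] = [a - factor * b for a, b in zip(rows[k], rows[r])]
        r += 1
    return r

def little_algebra_dimension(p_up):
    """Number of independent combinations theta^{ij} M_ij that annihilate p."""
    pairs = list(combinations(range(4), 2))      # the 6 independent generators M_ij, i < j
    rows = [action(i, j, p_up) for i, j in pairs]
    return len(pairs) - rank(rows)

massive = little_algebra_dimension([0, 0, 0, 1])    # p0 = (0, 0, 0, m) with m = 1
massless = little_algebra_dimension([1, 0, 0, 1])   # p0 = (p, 0, 0, p) with p = 1
```

Both a timelike and a lightlike momentum give a three-dimensional little algebra, while the zero vector is fixed by all six generators.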

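The significance of the little group resides in the fact that a representation of
$\mathcal{R}_{p_0}$ induces a representation of the whole Poincaré group. We shall only sketch
the proof in the following and refer the reader to Mackey [Mack 68] for a full and
rigorous discussion of induced representations.

Suppose we have found an irreducible representation of $\mathcal{R}_{p_0}$ with operators
$\Lambda^{p_0}$. Let $\Lambda^{pp_0}$ be the rotation that carries$^{17}$ $p_0$ to $p$, i.e., $p = \Lambda^{pp_0}p_0$. Consider any
rotation $\Lambda$ and let $p'$ be the momentum obtained when $\Lambda$ acts on $p$, i.e., $\Lambda p = p'$.
Then

$$\Lambda\underbrace{\Lambda^{pp_0}p_0}_{=p} = \underbrace{\Lambda^{p'p_0}p_0}_{=p'} \quad\Rightarrow\quad (\Lambda^{p'p_0})^{-1}\Lambda\Lambda^{pp_0}p_0 = p_0.$$

This shows that $(\Lambda^{p'p_0})^{-1}\Lambda\Lambda^{pp_0}$ belongs to the little group. So,

$$(\Lambda^{p'p_0})^{-1}\Lambda\Lambda^{pp_0} = \Lambda^{p_0}$$

for some $\Lambda^{p_0} \in \mathcal{R}_{p_0}$. Thus, $\Lambda = \Lambda^{p'p_0}\Lambda^{p_0}(\Lambda^{pp_0})^{-1}$, and

$$T(\Lambda)\,|\psi^\mu_p\rangle \equiv \Lambda\,|\psi^\mu_p\rangle = \Lambda^{p'p_0}\Lambda^{p_0}(\Lambda^{pp_0})^{-1}\,|\psi^\mu_p\rangle = \Lambda^{p'p_0}\Lambda^{p_0}\,|\psi^\mu_{p_0}\rangle$$

$$= \Lambda^{p'p_0}\sum_\nu T_{\nu\mu}(\Lambda^{p_0})\,|\psi^\nu_{p_0}\rangle = \sum_\nu T_{\nu\mu}(\Lambda^{p_0})\,\Lambda^{p'p_0}\,|\psi^\nu_{p_0}\rangle$$

$$= \sum_\nu T_{\nu\mu}(\Lambda^{p_0})\,|\psi^\nu_{p'}\rangle = \sum_\nu T_{\nu\mu}(\Lambda^{p_0})\,|\psi^\nu_{\Lambda p}\rangle.$$

Note how the matrix elements of the representation of the little group alone have
entered in the last line. We therefore consider

$$T(\Lambda)\,|\psi^\mu_p\rangle = \sum_\nu R_{\nu\mu}(\Lambda^{p_0})\,|\psi^\nu_{\Lambda p}\rangle, \quad \text{where } \Lambda^{p_0} \equiv (\Lambda^{p'p_0})^{-1}\Lambda\Lambda^{pp_0} \text{ and } p' = \Lambda p. \qquad (27.74)$$

16 The reader should be warned that although such a rotation does not change $p$, the rotation operator may change the state
$|\psi^\mu_p\rangle$. However, the resulting state will be an eigenstate of the $\mathbf{P}_k$'s with eigenvalue $p$.

17 We are using the fact that $O(p, n-p)$ is transitive (see Problem 27.39).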


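To avoid confusion, we have used $R$ for the representation of the little group. We
claim that Equation (27.74) defines a (matrix) representation of the whole group.
In fact,

$$T(\Lambda_1)T(\Lambda_2)\,|\psi^\mu_p\rangle = T(\Lambda_1)\sum_\nu R_{\nu\mu}(\Lambda_2^{p_0})\,|\psi^\nu_{\Lambda_2 p}\rangle$$

$$= \sum_\nu R_{\nu\mu}(\Lambda_2^{p_0})\sum_\rho R_{\rho\nu}(\Lambda_1^{p_0})\,|\psi^\rho_{\Lambda_1\Lambda_2 p}\rangle$$

$$= \sum_\rho \underbrace{\sum_\nu R_{\rho\nu}(\Lambda_1^{p_0})R_{\nu\mu}(\Lambda_2^{p_0})}_{=R_{\rho\mu}(\Lambda_1^{p_0}\Lambda_2^{p_0}) \text{ since } R \text{ is a rep.}}\,|\psi^\rho_{\Lambda_1\Lambda_2 p}\rangle.$$

The reader may check that $\Lambda_1^{p_0}\Lambda_2^{p_0} = (\Lambda_1\Lambda_2)^{p_0}$. Therefore,

$$T(\Lambda_1)T(\Lambda_2)\,|\psi^\mu_p\rangle = \sum_\rho R_{\rho\mu}\bigl((\Lambda_1\Lambda_2)^{p_0}\bigr)\,|\psi^\rho_{\Lambda_1\Lambda_2 p}\rangle \equiv T(\Lambda_1\Lambda_2)\,|\psi^\mu_p\rangle,$$

and $T$ is indeed a representation. It turns out that if $R$ is irreducible, then so is $T$.
The discussion above shows that the irreducible representations of the Poincaré
group are entirely determined by those of the little group and Equation (27.73).
The recipe for the construction of the irreducible representations of $\mathfrak{p}(p, n-p)$ is
now clear: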

27.5.9. Theorem. Choose any simultaneous eigenvector $p_0$ of the $\mathbf{P}_k$'s. Find the
little algebra $\mathcal{R}_{p_0}$ at $p_0$ by finding all $M_{ij}$'s satisfying $M_{ij}p_0 = 0$. Find all irreducible
representations of $\mathcal{R}_{p_0}$. The same eigenvalues that label the irreducible
representations of $\mathcal{R}_{p_0}$ can be used, in addition to those of $\mathbf{P}^2$ and $\mathbf{W}^2$, to label the
irreducible representations of $\mathfrak{p}(p, n-p)$.
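The little-group element $(\Lambda^{p'p_0})^{-1}\Lambda\Lambda^{pp_0}$ that drives the induced representation can be computed explicitly for a massive momentum in four dimensions. The sketch below is only an illustration: it assumes the index order $(x_1, x_2, x_3, x_0)$, units with $c = 1$, and it chooses the standard pure boost as $\Lambda^{pp_0}$ (the text leaves this choice open). All function names are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[a][c] * B[c][b] for c in range(4)) for b in range(4)] for a in range(4)]

def apply(A, v):
    return [sum(A[a][b] * v[b] for b in range(4)) for a in range(4)]

def standard_boost(p3, m):
    """Pure boost carrying p0 = (0, 0, 0, m) to p = (p3, E); plays the role of Lambda^{p p0}."""
    E = math.sqrt(m * m + sum(x * x for x in p3))
    L = [[0.0] * 4 for _ in range(4)]
    for a in range(3):
        for b in range(3):
            L[a][b] = (1.0 if a == b else 0.0) + p3[a] * p3[b] / (m * (E + m))
        L[a][3] = L[3][a] = p3[a] / m
    L[3][3] = E / m
    return L

def little_group_element(Lam, p3, m):
    """(Lambda^{p' p0})^{-1} Lambda Lambda^{p p0}, where p' = Lambda p."""
    to_p = standard_boost(p3, m)
    p_new = apply(Lam, apply(to_p, [0.0, 0.0, 0.0, m]))
    back = standard_boost([-x for x in p_new[:3]], m)   # inverse of the boost to p'
    return matmul(back, matmul(Lam, to_p))

def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def boost_x(rapidity):
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return [[ch, 0, 0, sh], [0, 1, 0, 0], [0, 0, 1, 0], [sh, 0, 0, ch]]

m = 1.0
Lam = matmul(boost_x(0.7), rotation_z(0.4))          # an arbitrary Lorentz transformation
W = little_group_element(Lam, [0.3, -1.2, 0.5], m)   # arbitrary spatial momentum
fixed = apply(W, [0.0, 0.0, 0.0, m])                 # should reproduce p0
```

The resulting matrix leaves $p_0$ untouched and mixes only the three spatial directions, which is why the massive little group acts as ordinary rotations.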

We are particularly interested in $\mathfrak{p}(3,1)$, the symmetry group of the special
theory of relativity. In applying the formalism developed above, we need to make
contact with the physical world. This always involves interpretations. Borrowing
from the angular momentum theory, in which a physical system was given the
attribute of angular momentum, the label of the irreducible representation of the
rotation group, we attribute the labels of an irreducible representation of the Poincaré
group, i.e., the eigenvalues of the four translation generators and the two Casimir
operators, to a physical system. Since the four translation generators are identified
as the three components of momentum and energy, and their specification implies
their constancy over time, we have to come to the conclusion that


27.5.10. Box. An irreducible representation of the Poincaré group specifies
a free relativistic particle.





There may be some internal interactions between constituents of a (composite)
particle, e.g., between quarks inside a proton, but as a whole, the composite will be
interpreted as a single particle. To construct the little group, we have to specify a
4-momentum $p_0$. We shall consider two cases: In the first case, $p_0 \cdot p_0 \neq 0$, whereby
the particle is deduced to be massive and we can choose$^{18}$ $p_0 = (0, 0, 0, m)$. In
the second case, $p_0 \cdot p_0 = 0$, in which case the particle is massless, and we can
choose $p_0 = (p, 0, 0, p)$. We consider these two cases separately.

The little group (really, the little Lie algebra) for $p_0 = (0, 0, 0, m)$ is obtained
by searching for those rotations that leave $p_0$ fixed. This is equivalent to searching
for $M_{ij}$'s that annihilate $(0, 0, 0, m)$, namely, the solutions to

$$(M_{ij}p_0)_l = (M_{ij})_{lr}(p_0)^r = (M_{ij})_{l0}\,m = 0 \quad\Rightarrow\quad (M_{ij})_{l0} = 0.$$

Since $(M_{ij})_{l0} = \eta_{i0}\eta_{jl} - \eta_{j0}\eta_{il}$, we conclude that $(M_{ij})_{l0} = 0$ if and only if $i \neq 0$
and $j \neq 0$. Thus the little group is generated by $(M_{23}, M_{31}, M_{12})$, which are the
components of angular momentum. The reader may also verify directly that when
the 4-momentum has only a time component, the Casimir operator $\mathbf{W}^2$ reduces
essentially to the total angular momentum operator. Since we are dealing with a
single particle, the total angular momentum can only be spin. Therefore, we have
the following theorem.

27.5.11. Theorem. In the absence of any interactions, a massive relativistic particle
is specified by its mass $m$ and its spin $s$, the former being any positive number,
the latter taking on integer or half-odd-integer values.

The case of the massless particle can be handled in the same way. We seek
those $M_{ij}$'s that annihilate $(p, 0, 0, p)$, namely, the solutions to

$$(M_{ij}p_0)_k = (M_{ij})_{kr}(p_0)^r = (M_{ij})_{k0}\,p + (M_{ij})_{k1}\,p = 0.$$

The reader may check that

$$(M_{01}p_0)_k = \eta_{1k}p - \eta_{0k}p, \qquad (M_{02}p_0)_k = \eta_{2k}p, \qquad (M_{03}p_0)_k = \eta_{3k}p,$$

$$(M_{23}p_0)_k = 0, \qquad (M_{12}p_0)_k = \eta_{2k}p, \qquad (M_{13}p_0)_k = \eta_{3k}p.$$

Clearly, $M_{23}$ is one of the generators of the little group. Subtracting the middle
terms and the last terms of each line, we see that $M_{02} - M_{12}$ and $M_{03} - M_{13}$ are the
other two generators. These happen to be the components of $\mathbf{W}$. In fact, it is easily
verified that

$$\mathbf{W}_0 = \mathbf{W}_1 = 2\mathbf{M}_{23}\,p, \qquad \mathbf{W}_2 = 2p(\mathbf{M}_{13} - \mathbf{M}_{03}), \qquad \mathbf{W}_3 = 2p(\mathbf{M}_{02} - \mathbf{M}_{12}). \qquad (27.75)$$

Therefore, the little group is generated by all the components of $\mathbf{W}$. Furthermore,
$\mathbf{W}^2$ has zero eigenvalue for $|\psi_{p_0}\rangle$ when $p_0 = (p, 0, 0, p)$. Since both Casimir
operators annihilate the state $|\psi_{p_0}\rangle$, we need to come up with another way of
labeling the states.

18 We use units in which $c = 1$.

Eugene Paul Wigner (1902–1995) was the second of three
children born to Hungarian Jewish parents in Budapest. His
father operated a large leather tannery and hoped that his son 
would follow him in that vocation, but the younger Wigner 
soon discovered both a taste and an aptitude for mathematics 
and physics. Although Wigner tried hard to accommodate his 
father’s wishes, he clearly heard his calling, and the world of 
physics is fortunate that he did. 

Wigner began his education in what he said “may have 
been the finest high school in the world.” He later studied 
chemical engineering and returned to Budapest to apply that 
training in his father’s tannery. He kept track of the seminal papers during the early years of 
quantum theory and, when the lure of physics became too strong, returned to Berlin to work 
in a crystallography lab. He lectured briefly at the University of Göttingen before moving
to America to escape the Nazis. 

Wigner accepted a visiting professorship to Princeton in 1930. When the appointment 
was not made permanent, the disappointed young professor moved to the University of 
Wisconsin, where he served happily until his new wife died suddenly of cancer only a few 
months after their marriage. As Wigner prepared, quite understandably, to leave Wisconsin, 
Princeton corrected its earlier mistake and offered him a permanent position. Except for 
occasional visiting appointments in America and abroad, he remained at Princeton until his 
death. 

Wigner's contributions to mathematical physics began during his studies in Berlin, 
where his supervisor suggested a problem dealing with the symmetry of atoms in a crystal.
John von Neumann, a fellow Hungarian physicist, pointed out the relevance of papers by 
Frobenius and Schur on representation theory. Wigner soon became enamored with the group 
theory inherent in the problem and began to apply that approach to quantum mechanical 
problems. Largely at the urging of Leo Szilard (another Hungarian physicist and Wigner’s 
best friend), Wigner collected many of his results into the classic textbook Group Theory 
and Its Application to the Quantum Mechanics of Atomic Spectra. 

The decades that followed were filled with important contributions to mathematical 
physics, with applications of group theory comprising a large share: angular momentum; 
nuclear physics and SU(4) or “supermultiplet” theory; parity; and studies of the Lorentz
group and Wigner's classic definition of an elementary particle. Other work included early 
efforts in many-body theory and a paper on level spacings derived from the properties of 
Hermitian matrices that later proved useful to workers in quantum chaos.

As with most famous figures, Wigner’s personality became as well known as his professional
accomplishments. His insistence on “reasonable” behavior, for instance, made him
refuse to pay a relative’s hospital bill until after the patient was released; it was obviously
unreasonable to hold a sick person hostage. His gentleness is exemplified in an anecdote
in which, on getting into an argument about a tip with a New York City cab driver, Wigner
loses his patience, stamps his foot, and says, “Oh, go to hell, ... please!”
He held others’ feelings in such high regard that it was said to be impossible to follow
Wigner through a door. He was light-hearted and fun-loving, but also devoted to his family
and concerned about the future of the planet. This combination of exceptional skill and
laudable humanity ensures Wigner’s place among the most highly regarded of his field.
(Taken from E. Vogt, Phys. Today 48 (12) (1995) 40-44.)


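Define the new quantities

$$\mathbf{H}_\pm \equiv \tfrac{1}{2}(\mathbf{W}_2 \pm i\mathbf{W}_3), \qquad \mathbf{H}_0 \equiv \frac{1}{2p}\mathbf{W}_0$$

and the corresponding operators acting on the carrier space. From Equation (27.75),
it follows that $[\mathbf{W}_2, \mathbf{W}_3] = 0$, $\mathbf{W}^2 = -4\mathbf{H}_+\mathbf{H}_-$, and that

$$[\mathbf{H}_+, \mathbf{H}_0] = -\mathbf{H}_+, \qquad [\mathbf{H}_-, \mathbf{H}_0] = \mathbf{H}_-, \qquad [\mathbf{H}_+, \mathbf{H}_-] = 0.$$

Denote the eigenstates of $\mathbf{W}^2$ and $\mathbf{H}_0$ by $|\alpha, \beta\rangle$:

$$\mathbf{W}^2\,|\alpha, \beta\rangle = \alpha\,|\alpha, \beta\rangle, \qquad \mathbf{H}_0\,|\alpha, \beta\rangle = \beta\,|\alpha, \beta\rangle.$$

Then the reader may check that $\mathbf{H}_\pm\,|\alpha, \beta\rangle$ has eigenvalues $\alpha$ and $\beta \pm 1$. By applying
$\mathbf{H}_\pm$ repeatedly, we can generate all eigenvalues of $\mathbf{H}_0$ and note that they are of the
form

$$\beta = r + n \quad \text{where } n = 0, \pm 1, \pm 2, \dots \text{ and } 1 > r \geq 0.$$

Since $\mathbf{H}_0 = \mathbf{M}_{23}$, $\mathbf{H}_0$ is recognized as an angular momentum operator whose eigenvalues
are integer (for bosons) and half-odd integer (for fermions). Therefore,
$r = 0$ for bosons and $r = \tfrac{1}{2}$ for fermions.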
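The ladder property left to the reader can be written out in two lines. This sketch assumes the commutators in the form $[H_\pm, H_0] = \mp H_\pm$ and $[H_+, H_-] = 0$, together with $W^2 = -4H_+H_-$:

```latex
\begin{align*}
H_0 H_\pm |\alpha,\beta\rangle
  &= \bigl(H_\pm H_0 - [H_\pm, H_0]\bigr)|\alpha,\beta\rangle
   = (\beta \pm 1)\, H_\pm |\alpha,\beta\rangle, \\
W^2 H_\pm |\alpha,\beta\rangle
  &= -4 H_+ H_- H_\pm |\alpha,\beta\rangle
   = H_\pm W^2 |\alpha,\beta\rangle
   = \alpha\, H_\pm |\alpha,\beta\rangle .
\end{align*}
```

The second line uses only the fact that $H_+$ and $H_-$ commute, so $W^2$ commutes with both of them.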

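Now, within an irreducible representation, only those $|\alpha, \beta\rangle$'s can occur that
have the same $\alpha$. Therefore, if we relabel the $\beta$ values by integers, then

$$\langle\alpha, n|\,\mathbf{H}_0\,|\alpha, m\rangle = (r + n)\,\delta_{nm}.$$

Similarly,

$$\langle\alpha, n|\,\mathbf{H}_+\,|\alpha, m\rangle = a_n\,\delta_{n,m+1},$$

$$\langle\alpha, n|\,\mathbf{H}_-\,|\alpha, m\rangle = b_n\,\delta_{n,m-1},$$

where $a_n$ and $b_n$ are some constants. It follows that

$$\alpha \equiv \langle\alpha, n|\,\mathbf{W}^2\,|\alpha, n\rangle = \langle\alpha, n|\,\mathbf{H}_+\mathbf{H}_-\,|\alpha, n\rangle$$

$$= \langle\alpha, n|\,\mathbf{H}_+\,|\alpha, n-1\rangle\,\langle\alpha, n-1|\,\mathbf{H}_-\,|\alpha, n\rangle.$$

If we assume that the representation is unitary, then all $\mathbf{W}_i$'s will be hermitian,
$(\mathbf{H}_+)^\dagger = \mathbf{H}_-$, so $b_{n-1} = a_n^*$ and $\alpha = |a_n|^2 \geq 0$.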




If $\alpha = 0$, then $a_n = 0$ and $b_n = 0$ for all $n$. Consequently, $\mathbf{H}_+ = 0 = \mathbf{H}_-$,
i.e., there are no raising or lowering operators. It follows that there are only two
spin states, corresponding to the maximum and the minimum eigenvalues of $\mathbf{H}_0$.
A natural axis for the projection of spin is the direction of motion of the particle.
Then the projection of spin is called helicity. We summarize our discussion in the
following theorem.

27.5.12. Theorem. In the absence of any interactions, a massless relativistic particle
is specified by its spin and its helicity, the former taking on integer or
half-odd-integer values $s$, the latter having values $+s$ and $-s$.

Theorems 27.5.11 and 27.5.12 are beautiful examples of the fruitfulness of the
interplay between mathematics and physics. Physics has provided mathematics
with a group, the Poincaré group, and mathematics, through its theory of group
representation, has provided physics with the deep result that all particles must
have a spin that takes on a specific value, and none other; that massive particles
are allowed to have $2s + 1$ different values for the projection of their spin; and
that massless particles are allowed to have only two values for their spin projection.
Such far-reaching results that are both universal and specific make physics
unique among all other sciences. It also provides impetus for the development of
mathematics as the only dialect through which nature seems to communicate to us
her deepest secrets.

If $\alpha > 0$, then the resulting representations will have continuous spin variables.
Such representations do not correspond to particles found in nature; therefore, we
shall not pursue them any further.

27.6 Problems 

27.1. Show that the set $G = GL(n, \mathbb{R}) \times \mathbb{R}^n$ equipped with the “product”

$$(A, u)(B, v) \equiv (AB, Av + u)$$

forms a group. This is called the affine group.

27.2. Show that $m : U \times U \to \mathbb{R}$ defined in Example 27.1.5 is a local Lie group.

27.3. Find the multiplication law for the groups in (b) and (c) of Example 27.1.8.

27.4. Show that the one-dimensional projective group of Example 27.1.8 satisfies
all the group properties. In particular, find the identity and the inverse of an element
in the group.

27.5. Let $G$ be a Lie group. Let $S$ be a subgroup of $G$ that is also a submanifold
of $G$. Show that $S$ is a Lie group.

27.6. Show that the differential map of $\psi : GL(V) \to H$, defined by $\psi(A) = AA^\dagger$,
where $H$ is the set of hermitian operators on $V$, is surjective. Derive Equation
(27.11).



27.7. Verify that $I_g \equiv R_g^{-1} \circ L_g$ is an isomorphism.

27.8. Prove Proposition 27.1.21. 

27.9. Start with Equation (27.19) and use the fact that the second derivative is
independent of the order of differentiation to obtain

    [...] = 0.

Now use the chain rule $\partial u_{i\kappa}/\partial a_\lambda = (\partial u_{i\kappa}/\partial x_j)(\partial x_j/\partial a_\lambda)$ and Equation (27.19)
to get

    [...] = 0,

or

    [...]                                                        (27.76)

where

    [...]

Substituting Equation (27.76) in Equation (27.21) leads to (27.22). Now differentiate
both sides of Equation (27.76) with respect to $a_\rho$ to get

    [...] = 0.

With the assumption that the $u_{i\kappa}$ are linearly independent, conclude that the structure
“constants” are indeed constants.

27.10. Find the invariant Haar measure of the general linear group in two dimen¬ 
sions. 


27.11. Show that the invariant Haar measure for a compact group satisfies $d\mu_g = d\mu_{g^{-1}}$.
Hint: Define a measure $\nu$ by $d\nu_g \equiv d\mu_{g^{-1}}$ and show that $\nu$ is left-invariant.
Now use the uniqueness of the left-invariant Haar measure for compact groups.

27.12. Show that $O(p, n-p)$ is a group. Use this and the fact that $A^t\eta A = \eta$ to
show that $A\eta A^t = \eta$.

27.13. Show that the orthogonal group $O(p, n-p)$ has dimension $n(n-1)/2$.
Hint: Look at its algebra $\mathfrak{o}(p, n-p)$.

27.14. Let $x = (x_1, x_2, x_3, x_0)$ be a timelike (or null) 4-vector with $x_0 > 0$. Let $\Lambda$
be a proper orthochronous transformation. Show that $x' = \Lambda x$ is also timelike (null).
Hint: Consider the zeroth component of $x'$ as an inner product of $(x_1, x_2, x_3, x_0)$
and another vector and use the Schwarz inequality.

27.15. Starting with the definition of each matrix, derive Equation (27.40).

27.16. Let $D_1$ and $D_2$ be derivations of a Lie algebra $\mathfrak{v}$. Show that $D_1D_2 \equiv D_1 \circ D_2$
is not a derivation, but $[D_1, D_2]$ is.

27.17. Let $\mathfrak{v}$ be a Lie algebra. Verify that $\mathrm{ad}_X$ is a derivation of $\mathfrak{v}$ for any $X \in \mathfrak{v}$,
and that $\mathrm{ad}_{[X,Y]} = [\mathrm{ad}_X, \mathrm{ad}_Y]$.

27.18. Show that $\psi : \mathfrak{v} \to \mathrm{ad}_{\mathfrak{v}}$ given by $\psi(X) = \mathrm{ad}_X$ is (a) a homomorphism, (b)
$\ker\psi$ is the center of $\mathfrak{v}$, and (c) $\mathrm{ad}_{\mathfrak{v}}$ is an ideal of $\mathfrak{D}_{\mathfrak{v}}$.

27.19. Suppose that a Lie algebra $\mathfrak{v}$ can be decomposed into a direct sum of Lie
subalgebras. Show that each subalgebra is necessarily an ideal of $\mathfrak{v}$.

27.20. Show that if $\psi$ is an automorphism of $\mathfrak{v}$, then

$$\mathrm{ad}_{\psi(X)} = \psi \circ \mathrm{ad}_X \circ \psi^{-1} \quad \forall\, X \in \mathfrak{v}.$$

Hint: Apply both sides to an arbitrary element of $\mathfrak{v}$.

27.21. Show that for any Lie algebra,

$$c_{ijk} \equiv c^{l}_{js}c^{r}_{il}c^{s}_{kr} + c^{l}_{si}c^{r}_{lj}c^{s}_{rk}$$

is completely antisymmetric in all its indices.

27.22. Show that the Killing form of $\mathfrak{v}$ is invariant under all automorphisms of $\mathfrak{v}$.

27.23. Show that the translation generators $P_j$ of the Poincaré algebra $\mathfrak{p}(p, n-p)$
form a commutative ideal.

27.24. Find the Cartan metrics for $\mathfrak{o}(3,1)$ and $\mathfrak{p}(3,1)$, and show directly that the
first is semisimple but the second is not.

27.25. Show that the operation on a compact group defined by

$$(u|v) \equiv \int_G \langle T_g u \,|\, T_g v\rangle\, d\mu_g$$

is an inner product.

27.26. Show that the Weyl operator $K_u$ is hermitian.

27.27. Derive Equations (27.53) and (27.54). Hint: Follow the finite-group anal¬ 
ogy. 











27.28. Suppose that a Lie group $G$ acts on a Euclidean space $\mathbb{R}^n$ as well as on the
space of (square-integrable) functions $\mathcal{L}(\mathbb{R}^n)$. Let $\phi_i^{(\alpha)}$ transform as the $i$th row of
the $\alpha$th irreducible representation. Verify that the relation

$$T_g\,\phi_i^{(\alpha)}(x) = \sum_{j=1}^{n_\alpha} T^{(\alpha)}_{ji}(g)\,\phi_j^{(\alpha)}(x)$$

defines a representation of $G$.

27.29. Show that $GL(V)$ is not a compact group. Hint: Find a continuous function
$GL(V) \to \mathbb{C}$ whose image is not compact.

27.30. Suppose that $T : G \to GL(V)$ is a representation, and let

$$V^{\otimes r} \equiv \underbrace{V \otimes \cdots \otimes V}_{r \text{ times}}$$

be the $r$-fold tensor product of $V$. Show that $T^{\otimes r} : G \to GL(V^{\otimes r})$, given by

$$T_g^{\otimes r}(v_1, \dots, v_r) = T_g(v_1) \otimes \cdots \otimes T_g(v_r),$$

is also a representation.

27.31. Suppose that in Example 27.4.2, we set $k_1 = 2$ for our treatment of $n = 2$,
$r = 3$. Show that $Y_2(e_{k_1} \otimes e_{k_2} \otimes e_{k_3})$ does not produce any new vector beyond
what we obtained for $k_1 = 1$.

27.32. Show that $g_{sr}c^{s}_{ij}$ is antisymmetric in $j$ and $r$.

27.33. Operate $L_+$ on $|00\rangle = \sum_{m=-l}^{l} C(ll; 0|m, -m; 0)\,|lm; l, -m\rangle$ and use
$L_+\,|00\rangle = 0$ to find a recursive relation among $C(ll; 0|m, -m; 0)$. Use normalization
and the convention that $C(ll; 0|l, -l; 0) > 0$ to show that
$C(ll; 0|m, -m; 0) = (-1)^{l-m}/\sqrt{2l+1}$ (see Section 12.3).

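27.34. Show that the generators of $\mathfrak{so}(3,1)$,

$$\mathbf{M} \equiv (M_1, M_2, M_3) \equiv (M_{23}, M_{31}, M_{12}),$$

$$\mathbf{N} \equiv (N_1, N_2, N_3) \equiv (M_{01}, M_{02}, M_{03}),$$

satisfy the commutation relations

$$[M_i, M_j] = -\epsilon_{ijk}M_k, \qquad [N_i, N_j] = \epsilon_{ijk}M_k, \qquad [M_i, N_j] = -\epsilon_{ijk}N_k,$$

and that $\mathbf{M}^2 - \mathbf{N}^2$ and $\mathbf{M} \cdot \mathbf{N}$ commute with all the $M$'s and the $N$'s.

27.35. Let the double-indexed “metric” of the Poincaré algebra be defined as

$$g_{ij,kl} \equiv c^{rs}_{ij,mn}c^{mn}_{kl,rs} + c^{r}_{ij,m}c^{m}_{kl,r},$$

where the structure constants are given in Equation (27.69). Show that

$$g_{ij,kl} = 2(n-1)(\eta_{jk}\eta_{il} - \eta_{ik}\eta_{jl}).$$

27.36. Show that $[\mathbf{M}^2, \mathbf{M}_{ij}] = 0$, and

$$[\mathbf{M}^2, \mathbf{P}_k] = 4\mathbf{M}_{kj}\mathbf{P}^j + 2(n-1)\mathbf{P}_k.$$

27.37. Show that the vector operator

$$\mathbf{C}_i \equiv \mathbf{M}_{ij}\mathbf{P}^j = \eta^{kj}\mathbf{M}_{ij}\mathbf{P}_k$$

satisfies the following commutation relations:

$$[\mathbf{C}_i, \mathbf{P}_j] = \eta_{ij}\mathbf{P}^2 - \mathbf{P}_i\mathbf{P}_j, \qquad [\mathbf{C}_i, \mathbf{M}_{jk}] = \eta_{ik}\mathbf{C}_j - \eta_{ij}\mathbf{C}_k, \qquad [\mathbf{C}_i, \mathbf{C}_j] = \mathbf{M}_{ij}\mathbf{P}^2.$$

Show also that $[\mathbf{C}^2, \mathbf{M}_{jk}] = 0$, $\mathbf{C}^i\mathbf{P}_i = 0$, and

$$\mathbf{P}^i\mathbf{C}_i = -(n-1)\mathbf{P}^2, \qquad [\mathbf{C}^2, \mathbf{P}_i] = \{2\mathbf{C}_i + (n-1)\mathbf{P}_i\}\mathbf{P}^2.$$

27.38. Derive Equation (27.72) and show that $\mathbf{W}^{i_1 \dots i_{n-3}}$ commutes with all the
$\mathbf{P}_j$'s.

27.39. Let $e_x = (x_1, \dots, x_n)$ be any unit vector in $\mathbb{R}^n$.

(a) Show that a matrix is $\eta$-orthogonal, i.e., it satisfies Equation (27.33), if and
only if its columns are $\eta$-orthogonal.

(b) Show that there exists an $A \in O(p, n-p)$ such that $e_x = Ae_1$ where $e_1 =
(1, 0, \dots, 0)$. Hint: Find the first column of $A$ and use (a).

(c) Conclude that $O(p, n-p)$ is transitive in its action on the collection of all
vectors of the same length.

27.40. Verify directly that when the 4-momentum has only a time component, the
Casimir operator $\mathbf{W}^2 = \mathbf{W} \cdot \mathbf{W}$ reduces essentially to the total angular momentum
operator.

27.41. Verify that for the case of a massless particle, when $p_0 = (p, 0, 0, p)$,

$$\mathbf{W}_0 = \mathbf{W}_1 = 2\mathbf{M}_{23}\,p, \qquad \mathbf{W}_2 = 2p(\mathbf{M}_{13} - \mathbf{M}_{03}), \qquad \mathbf{W}_3 = 2p(\mathbf{M}_{02} - \mathbf{M}_{12}),$$

and that $\mathbf{W}^2 = \mathbf{W} \cdot \mathbf{W}$ annihilates $|\psi_{p_0}\rangle$.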

Additional Reading 

1. Barut, A. and Raczka, R. Theory of Group Representations and Applications,
World Scientific, 1986. A comprehensive introduction to Lie groups and Lie
algebras using the modern language of manifolds. Intended for physicists
and mathematicians alike, the first chapter is a long introduction to Lie
algebras.

2. Chevalley, C. Theory of Lie Groups, Princeton University Press, 1946. Still
a relevant classic for Lie groups and manifold theory written by one of the
major contributors to the subject.

3. Gilmore, R. Lie Groups, Lie Algebras, and Some of Their Applications, Wiley,
1974. Although the concept of manifold is introduced, no heavy use of
it is made in the treatment of Lie groups and Lie algebras, and the “parametric”
method of Sophus Lie is employed throughout the book. Nevertheless,
the book does a good job of classifying the Lie algebras for outsiders.

4. Hamermesh, M. Group Theory and Its Application to Physical Problems,
Dover, 1989. Does not use the modern language of manifolds, but does a
good job of introducing Lie groups and Lie algebras via “parameters.”



28

Differential Geometry 


The elegance of the geometrical expression of physical ideas has attracted much 
attention ever since Einstein proposed his geometrical theory of gravity in 1916. 
Such an interpretation was, however, confined to the general theory of relativity 
until the 1970s when the language of geometry was found to be most suitable, not 
only for gravity, but also for the other three fundamental forces of nature. Geometry, 
in the form of gauge field theories of electroweak and strong interactions, has been 
successful not only in creating a model — the so-called standard model — that 
explains all experimental results to remarkable accuracy, but also in providing 
a common language for describing all fundamental forces of nature, and with 
that a hope for unifying these forces into a single all-embracing force. This hope 
is encouraged by the successful unification of electromagnetism with the weak
nuclear force through the medium of geometry and gauge field theory. 

The word “geometry” is normally used in the mathematics literature for a man¬ 
ifold on which a “machine” is defined with the property that it gives a number when 
two vectors are fed into it. Symplectic geometry’s machine was a nondegenerate 
2-form. Riemannian (or pseudo-Riemannian) geometry has a symmetric bilinear 
form (metric, inner product). Both of these geometries are important: Symplectic 
geometry is the natural setting for Hamiltonian dynamics, and Riemannian geom¬ 
etry is the basis of the general theory of relativity. In this chapter, we shall study 
the latter. 1 However, before introducing the metric, let us investigate some related 
structures that are independent of a metric. 


1 This chapter really belongs to the previous part of the book; however, because of our use of certain Lie-group theoretic ideas
in Sections 28.4 and 28.6, we have included it here.


28.1 Vector Fields and Curvature

A manifold is, in general, not flat (flatness will be defined later). One way to “feel” 
the curvature of a space intrinsically is to translate a vector parallel to itself along 
different paths and compare the final vectors. In a flat space the two vectors at the 
end will be the same, but not in a general space. An illustration is provided by the 
surface of a sphere. Assume that we have a vector perpendicular to the equator. 
To exaggerate the effect of curvature, we move the vector parallel to itself on the 
equator a quarter of the way around the sphere, then all the way to the north pole. 
Alternatively, we start with the vector again perpendicular to the equator, but this 
time we move it parallel to itself directly to the north pole. Clearly, the two final 
vectors will not be the same; in fact, they will be perpendicular to one another. 
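The sphere example can be reproduced numerically. The sketch below assumes a unit sphere and uses the fact that parallel transport along a great-circle arc coincides with the rigid rotation about the normal to that great circle (implemented with Rodrigues' formula). The points `A`, `B`, `N` and the helper `transport` are illustrative choices, not notation from the text.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def transport(v, start, end):
    """Parallel-transport the tangent vector v along the great-circle arc from start to end.

    ASSUMPTION: start and end are unit vectors that are neither equal nor antipodal.
    """
    axis = cross(start, end)
    norm = math.sqrt(dot(axis, axis))
    k = [x / norm for x in axis]                              # unit normal of the great circle
    angle = math.acos(max(-1.0, min(1.0, dot(start, end))))   # arc length on the unit sphere
    c, s = math.cos(angle), math.sin(angle)
    kxv, kv = cross(k, v), dot(k, v)
    # Rodrigues' rotation formula.
    return [v[i] * c + kxv[i] * s + k[i] * kv * (1 - c) for i in range(3)]

A = [1.0, 0.0, 0.0]      # starting point on the equator
B = [0.0, 1.0, 0.0]      # a quarter of the way around the equator
N = [0.0, 0.0, 1.0]      # north pole
v = [0.0, 0.0, 1.0]      # tangent vector at A, perpendicular to the equator

via_equator = transport(transport(v, A, B), B, N)   # along the equator, then up to the pole
direct = transport(v, A, N)                          # straight up to the pole
```

Up to rounding, `via_equator` points along $-y$ and `direct` along $-x$ at the pole, so the two transported vectors are perpendicular to one another.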

The above intuitive discussion should help to make it clear that to find the 
curvature of space, we look at how vectors change. In analogy with the exterior 
derivative and forms, we want to introduce a derivative that operates on vectors. 
In fact, since we already have the exterior derivative available, let us see if we can 
extend it to vectors. 

Consider an arbitrary 1-form $\boldsymbol{\omega}$ and an arbitrary vector field $\mathbf{v}$. The pairing
$\langle\boldsymbol{\omega}, \mathbf{v}\rangle$ is a real-valued function $f$ on which we know how the exterior derivative
acts. A natural extension of $d$ (which we denote by the same symbol) is given as
follows:

$$df \equiv d\langle\boldsymbol{\omega}, \mathbf{v}\rangle = \langle d\boldsymbol{\omega}, \mathbf{v}\rangle' + \langle\boldsymbol{\omega}, d\mathbf{v}\rangle', \qquad (28.1)$$

where we have used a prime to designate the new pairing. In general, this new
pairing may be different from the ordinary pairing, because for the participants in
the latter, no exterior differentiation is defined. As we shall see below, we indeed
have to change the old pairing slightly for the new pairing to make sense. The LHS
of Equation (28.1) is a 1-form. The first term on the RHS is a 2-form contracted
with a vector, i.e., a 1-form. For the second term to be a 1-form, $d\mathbf{v}$ must be a tensor
that contracts with a 1-form to give a 1-form. We say that $d\mathbf{v}$ is a vector-valued
1-form.

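Let us take a basis $\{\mathbf{e}_i\}$ and its dual $\{\boldsymbol{\epsilon}^i\}$ and express Equation (28.1) in terms of
components in those bases. Then, on the one hand, with $\boldsymbol{\omega} = \omega_i\boldsymbol{\epsilon}^i$ and $\mathbf{v} = v^j\mathbf{e}_j$,
we have $\langle\boldsymbol{\omega}, \mathbf{v}\rangle = \omega_iv^i$ and

$$df \equiv d\langle\boldsymbol{\omega}, \mathbf{v}\rangle = (d\omega_i)v^i + \omega_i\,dv^i. \qquad (28.2)$$

On the other hand,

$$df = \langle d\omega_i \wedge \boldsymbol{\epsilon}^i + \omega_i\,d\boldsymbol{\epsilon}^i,\; v^j\mathbf{e}_j\rangle' + \langle\omega_i\boldsymbol{\epsilon}^i,\; dv^j\,\mathbf{e}_j + v^j\,d\mathbf{e}_j\rangle'$$

$$= v^j\langle d\omega_i \wedge \boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle' + v^j\omega_i\langle d\boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle' + \omega_i\,dv^j\langle\boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle' + \omega_iv^j\langle\boldsymbol{\epsilon}^i, d\mathbf{e}_j\rangle'$$

$$= v^j\langle d\omega_i \wedge \boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle' + \omega_i\,dv^j\langle\boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle + v^j\omega_i\{\langle d\boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle' + \langle\boldsymbol{\epsilon}^i, d\mathbf{e}_j\rangle'\}, \qquad (28.3)$$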


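where we have assumed that the primed pairing is equivalent to the old pairing
when no derivatives are inside. If Equations (28.2) and (28.3) are to hold for
arbitrary $\boldsymbol{\omega}$ and $\mathbf{v}$, we must have$^{2}$

$$0 = \langle d\boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle' + \langle\boldsymbol{\epsilon}^i, d\mathbf{e}_j\rangle',$$

$$(d\omega_i)v^i = v^j\langle d\omega_i \wedge \boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle'. \qquad (28.4)$$

The first relation is simply the fact that the exterior derivative of $\delta^i_j = \langle\boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle$ is
zero. The second relation defines the order of contraction of vectors and higher-order
forms in the new pairing. Since $d\omega_i$ is a 1-form, we can write it as $d\omega_i =
\alpha_{ik}\boldsymbol{\epsilon}^k$, so that

$$d\omega_i \wedge \boldsymbol{\epsilon}^i = \alpha_{ik}\,\boldsymbol{\epsilon}^k \wedge \boldsymbol{\epsilon}^i, \quad \text{where } \alpha_{ik} = -\alpha_{ki}.$$

Then, the second equation of (28.4) gives

$$\alpha_{ik}\langle\boldsymbol{\epsilon}^k \wedge \boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle' = d\omega_j = \alpha_{jk}\boldsymbol{\epsilon}^k = \tfrac{1}{2}\alpha_{ik}(\delta^i_j\boldsymbol{\epsilon}^k - \delta^k_j\boldsymbol{\epsilon}^i).$$

This equation demands that the new pairing of a vector with a wedge product of
1-forms be defined as

$$\langle\boldsymbol{\epsilon}^k \wedge \boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle' \equiv -i_{\mathbf{e}_j}(\boldsymbol{\epsilon}^k \wedge \boldsymbol{\epsilon}^i) = -\tfrac{1}{2}(\delta^k_j\boldsymbol{\epsilon}^i - \delta^i_j\boldsymbol{\epsilon}^k),$$

where $i_{\mathbf{e}_j}$ is the interior product of Definition 26.5.8. We summarize the properties
of the new pairing in the following equation:

$$\langle d\boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle' + \langle\boldsymbol{\epsilon}^i, d\mathbf{e}_j\rangle' = 0,$$

$$\langle\boldsymbol{\epsilon}^k \wedge \boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle' = -i_{\mathbf{e}_j}(\boldsymbol{\epsilon}^k \wedge \boldsymbol{\epsilon}^i) = -\tfrac{1}{2}(\delta^k_j\boldsymbol{\epsilon}^i - \delta^i_j\boldsymbol{\epsilon}^k). \qquad (28.5)$$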

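In differential structures everything takes place locally, and translations and movements
are all infinitesimal. Let us look at the exterior derivative from this point of
view. For a real-valued function $f$ on $M$, we have

$$d^2 f = d\left(\frac{\partial f}{\partial x^i}\,dx^i\right) = \frac{\partial^2 f}{\partial x^j\,\partial x^i}\,dx^j\wedge dx^i = \frac12\left(\frac{\partial^2 f}{\partial x^j\,\partial x^i} - \frac{\partial^2 f}{\partial x^i\,\partial x^j}\right)dx^j\wedge dx^i.$$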


Thus, $d^2 f = 0$ means that the mixed partial derivatives are independent of the
order of differentiation, a familiar result. Geometrically, this means that for small
displacements $dx^i$ and $dx^j$, the value of a function is the same if one moves in two
(perpendicular) directions, once in a given order and then in reverse order. This is
true even if the space is curved. For flat spaces, we know that the same conclusion
holds for displacement (parallel to themselves) of vectors. When we interpret $d^2\mathbf{v}$
as the change in a vector as it is displaced in two different directions, then the
example of the sphere above suggests that $d^2\mathbf{v}$ must be related to curvature. Let
us find this relation.

2 These consistency relations apply to the first application of exterior derivative to the pairing. Higher-order applications, or,
equivalently, application of $d$ to a pairing involving wedge products of forms, may require new consistency relations. Fortunately,
for our purposes, the first-order consistency relations will be sufficient.

extracting the
curvature 2-form
from second exterior
derivative of basis
vectors

Starting with a basis $\{\mathbf{e}_i\}$ and an arbitrary vector $\mathbf{v} = v^i\mathbf{e}_i$, operate on it with $d$
twice, keeping in mind that its action on functions and differential forms is exactly
the same as the exterior derivative defined before: $d\mathbf{v} = dv^i\,\mathbf{e}_i + v^i\,d\mathbf{e}_i$ and

$$d^2\mathbf{v} = \underbrace{d^2 v^i}_{=0}\,\mathbf{e}_i + \underbrace{(-1)^1\,dv^i\wedge d\mathbf{e}_i + dv^i\wedge d\mathbf{e}_i}_{=0} + v^i\,d^2\mathbf{e}_i = v^i\,d^2\mathbf{e}_i. \tag{28.6}$$

This equation has a remarkable property: It leaves the components of $\mathbf{v}$ undifferentiated!
In other words, regardless of how any given two vectors $\mathbf{v}$ and $\mathbf{w}$ vary
away from the point $P$ of the manifold, $d^2\mathbf{v} = d^2\mathbf{w}$ as long as the two vectors are
equal at $P$. More importantly, if we take the linearity of $d$ (and therefore, $d^2$) into
account, we obtain

$$d^2(f\mathbf{v} + g\mathbf{u}) = f\,d^2\mathbf{v} + g\,d^2\mathbf{u}. \tag{28.7}$$

It appears that $d^2\mathbf{v}$ depends not on external objects (vectors), but on the intrinsic
property of the manifold, i.e., how it "curves" away from $P$. To find this curvature,
expand the vector-valued 1-form $d\mathbf{e}_i$ as

$$d\mathbf{e}_i = \mathbf{e}_j\otimes\boldsymbol{\omega}^j{}_i \quad\text{where}\quad \boldsymbol{\omega}^j{}_i = \Gamma^j{}_{ik}\,\boldsymbol{\epsilon}^k. \tag{28.8}$$

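As we shall see shortly, one has to be cautious to know which index is raised or
lowered. In the formulas above, this caution has been observed by leaving blank the
original position of the raised (or lowered) index. Differentiating Equation (28.8)
once more, we obtain the vector-valued 2-form

$$\begin{aligned}
d^2\mathbf{e}_i &= d\mathbf{e}_j\wedge\boldsymbol{\omega}^j{}_i + \mathbf{e}_j\otimes d\boldsymbol{\omega}^j{}_i = (\mathbf{e}_k\otimes\boldsymbol{\omega}^k{}_j)\wedge\boldsymbol{\omega}^j{}_i + \mathbf{e}_j\otimes d\boldsymbol{\omega}^j{}_i \\
&= \mathbf{e}_k\otimes(\boldsymbol{\omega}^k{}_j\wedge\boldsymbol{\omega}^j{}_i) + \mathbf{e}_j\otimes d\boldsymbol{\omega}^j{}_i = \mathbf{e}_j\otimes\left(d\boldsymbol{\omega}^j{}_i + \boldsymbol{\omega}^j{}_k\wedge\boldsymbol{\omega}^k{}_i\right).
\end{aligned} \tag{28.9}$$

The expression in parentheses is a 2-form, called the curvature two-form:

$$\boldsymbol{\theta}^j{}_i \equiv d\boldsymbol{\omega}^j{}_i + \boldsymbol{\omega}^j{}_k\wedge\boldsymbol{\omega}^k{}_i \quad\text{or}\quad \boldsymbol{\theta}_{ij} = d\boldsymbol{\omega}_{ij} + \boldsymbol{\omega}_{ik}\wedge\boldsymbol{\omega}^k{}_j. \tag{28.10}$$

With this notation, Equation (28.6) becomes

$$d^2\mathbf{v} = v^i\,d^2\mathbf{e}_i = v^i\,\mathbf{e}_j\otimes\boldsymbol{\theta}^j{}_i = \mathbf{e}_j\otimes\boldsymbol{\theta}^j{}_i\,v^i \equiv \mathbf{e}\otimes\boldsymbol{\theta}\mathbf{v}, \tag{28.11}$$

where the last expression is simply an abbreviation.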







connection 

coefficients 


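So far, we have been dealing with the exterior derivatives of the basis vectors
$\{\mathbf{e}_i\}$. What about the exterior derivatives of the dual basis vectors $\{\boldsymbol{\epsilon}^j\}$? They are
closely related to $\{d\mathbf{e}_i\}$, as the following argument shows. The first relation of
(28.5), as well as Equation (28.8) and the fact that $\boldsymbol{\epsilon}^j$ pairs up with $\mathbf{e}_k$, yield

$$\langle\mathbf{e}_i, d\boldsymbol{\epsilon}^j\rangle' = -\langle d\mathbf{e}_i, \boldsymbol{\epsilon}^j\rangle' = -\langle\mathbf{e}_k\otimes\boldsymbol{\omega}^k{}_i, \boldsymbol{\epsilon}^j\rangle' = -\boldsymbol{\omega}^k{}_i\langle\mathbf{e}_k, \boldsymbol{\epsilon}^j\rangle = -\boldsymbol{\omega}^j{}_i. \tag{28.12}$$

On the other hand, since $d\boldsymbol{\epsilon}^j$ is a 2-form, it can be written as

$$d\boldsymbol{\epsilon}^j = \boldsymbol{\gamma}^j{}_k\wedge\boldsymbol{\epsilon}^k = \gamma^j{}_{km}\,\boldsymbol{\epsilon}^m\wedge\boldsymbol{\epsilon}^k \quad\text{where}\quad \gamma^j{}_{km} = -\gamma^j{}_{mk}$$

and

$$\begin{aligned}
\langle\mathbf{e}_i, \gamma^j{}_{km}\,\boldsymbol{\epsilon}^m\wedge\boldsymbol{\epsilon}^k\rangle' &= \gamma^j{}_{km}\langle\mathbf{e}_i, \boldsymbol{\epsilon}^m\wedge\boldsymbol{\epsilon}^k\rangle' = -\tfrac12\gamma^j{}_{km}\, i_{\mathbf{e}_i}(\boldsymbol{\epsilon}^m\wedge\boldsymbol{\epsilon}^k) \\
&= -\tfrac12\gamma^j{}_{km}\left(\delta_{im}\boldsymbol{\epsilon}^k - \delta_{ik}\boldsymbol{\epsilon}^m\right) = \gamma^j{}_{im}\boldsymbol{\epsilon}^m = \boldsymbol{\gamma}^j{}_i,
\end{aligned} \tag{28.13}$$

where we used the second equation in (28.5). Comparing (28.13) with Equation
(28.12) yields $\boldsymbol{\gamma}^j{}_i = -\boldsymbol{\omega}^j{}_i$. We therefore have

$$d\boldsymbol{\epsilon}^j = -\boldsymbol{\omega}^j{}_k\wedge\boldsymbol{\epsilon}^k. \tag{28.14}$$

Equations (28.8) and (28.14) show that the $\{\boldsymbol{\omega}_{ij}\}$ give all the information about
how the bases $\{\mathbf{e}_i\}$ and $\{\boldsymbol{\epsilon}^j\}$ change with infinitesimal movement away from a point
$P$.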


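28.1.1. Box. If we can find the $\{\boldsymbol{\omega}_{ij}\}$, we will know the (local) geometry of
the manifold.


From the definition of $\boldsymbol{\omega}_{ij}$ in Equation (28.8), we have $\boldsymbol{\omega}_{ij} = \Gamma_{ijk}\,\boldsymbol{\epsilon}^k$. The
functions $\Gamma_{ijk}$ are called the connection coefficients. Because of (28.8), these
coefficients are antisymmetric in their first two indices. On the other hand, (28.14)
gives

$$d\boldsymbol{\epsilon}^k = -\boldsymbol{\omega}^k{}_j\wedge\boldsymbol{\epsilon}^j = -g^{ik}\boldsymbol{\omega}_{ij}\wedge\boldsymbol{\epsilon}^j = -g^{ik}\Gamma_{ijm}\,\boldsymbol{\epsilon}^m\wedge\boldsymbol{\epsilon}^j = \tfrac12 g^{ik}\left(\Gamma_{ijm} - \Gamma_{imj}\right)\boldsymbol{\epsilon}^j\wedge\boldsymbol{\epsilon}^m. \tag{28.15}$$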

If coordinate frames are used for basis vectors, so that $\boldsymbol{\epsilon}^k = dx^k$, the LHS of
(28.15) will be zero and the coefficients on the RHS of Equation (28.15) must
vanish, i.e., the connection coefficients are symmetric in their last two indices:

$$\Gamma_{ijm} = \Gamma_{imj} \quad\text{in coordinate frames.} \tag{28.16}$$







Riemannian and 
pseudo-Riemannian 
manifolds defined 


orthonormal frames 


structure equations,
integrability
condition, and
curvature matrix


Equation (28.15) shows further that by calculating $d\boldsymbol{\epsilon}^k$ we merely determine the
antisymmetric combination $\Gamma_{ijm} - \Gamma_{imj}$. However, the antisymmetry of $\Gamma_{ijm}$ in its
first two indices can be used to determine it completely. Let $C_{ijm} = \Gamma_{ijm} - \Gamma_{imj}$
be the coefficients that can be read off from (28.15). Then one can show that

$$\Gamma_{ijm} = \tfrac12\left(C_{ijm} + C_{jmi} - C_{mij}\right). \tag{28.17}$$

Once the $\Gamma$'s are determined by this relation, they can be used to write the $\boldsymbol{\omega}_{ij}$'s as a
linear combination of the dual basis vectors.
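Equation (28.17) can be checked numerically. The following is a minimal sketch, not part of the original text: it builds random coefficients that are antisymmetric in their first two indices, forms $C_{ijm} = \Gamma_{ijm} - \Gamma_{imj}$, and confirms that (28.17) recovers the original coefficients. The helper names (`random_connection`, `antisymmetric_part`, `recover_connection`) are hypothetical and chosen only for this illustration.

```python
import random

def random_connection(n, seed=0):
    """Random Gamma[i][j][m], antisymmetric in the first two indices."""
    rng = random.Random(seed)
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            for m in range(n):
                value = rng.uniform(-1.0, 1.0)
                gamma[i][j][m] = value
                gamma[j][i][m] = -value
    return gamma

def antisymmetric_part(gamma):
    """C_{ijm} = Gamma_{ijm} - Gamma_{imj}, the combination read off from (28.15)."""
    n = len(gamma)
    return [[[gamma[i][j][m] - gamma[i][m][j] for m in range(n)]
             for j in range(n)] for i in range(n)]

def recover_connection(c):
    """Equation (28.17): Gamma_{ijm} = (C_{ijm} + C_{jmi} - C_{mij}) / 2."""
    n = len(c)
    return [[[0.5 * (c[i][j][m] + c[j][m][i] - c[m][i][j]) for m in range(n)]
             for j in range(n)] for i in range(n)]

def max_difference(a, b):
    """Largest absolute difference between two rank-3 arrays."""
    n = len(a)
    return max(abs(a[i][j][m] - b[i][j][m])
               for i in range(n) for j in range(n) for m in range(n))

gamma = random_connection(3)
recovered = recover_connection(antisymmetric_part(gamma))
print(max_difference(gamma, recovered))  # rounding error only
```

The check relies only on the antisymmetry in the first two indices, which is the same input the argument in the text uses.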

28.2 Riemannian Manifolds 

As mentioned before, manifolds that possess a metric are important. The general 
theory of relativity, for example, is entirely based on the existence of a metric. In 
fact, it is the job of that theory to determine the metric of 4-dimensional space-time 
from a knowledge of the distribution of matter. 

28.2.1. Definition. A Riemannian manifold is a differentiable manifold $M$ with
a symmetric tensor field $\mathbf{g} \in \mathcal{T}^0_2(M)$, called the metric, such that at each point
$P \in M$, $\mathbf{g}|_P$ is a positive definite inner product. A manifold with an indefinite inner
product at each point is called a pseudo-Riemannian manifold.

With $\mathbf{g}$ defined on $M$, we can obtain orthonormal vectors at each point of $M$.
That is, we can construct orthonormal frames $\{\mathbf{e}_i\}$ such that

$$\mathbf{e}_i\cdot\mathbf{e}_j \equiv \mathbf{g}(\mathbf{e}_i, \mathbf{e}_j) = \eta_{ij} = \pm\delta_{ij}$$

at each point $P \in M$.

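28.2.1 Curvature via Connection

We can find a relation between the metric tensor and $\boldsymbol{\omega}_{ij}$ by taking the exterior
derivative of both sides of $g_{ij} = \mathbf{e}_i\cdot\mathbf{e}_j$ and using Equation (28.8):

$$\begin{aligned}
dg_{ij} &= (d\mathbf{e}_i)\cdot\mathbf{e}_j + \mathbf{e}_i\cdot(d\mathbf{e}_j) = (\mathbf{e}_k\otimes\boldsymbol{\omega}^k{}_i)\cdot\mathbf{e}_j + \mathbf{e}_i\cdot(\mathbf{e}_k\otimes\boldsymbol{\omega}^k{}_j) \\
&= \boldsymbol{\omega}^k{}_i\,(\mathbf{e}_k\cdot\mathbf{e}_j) + \boldsymbol{\omega}^k{}_j\,(\mathbf{e}_i\cdot\mathbf{e}_k) = \boldsymbol{\omega}^k{}_i\,g_{kj} + \boldsymbol{\omega}^k{}_j\,g_{ik} \equiv \boldsymbol{\omega}_{ji} + \boldsymbol{\omega}_{ij}.
\end{aligned}$$

In particular, if we work in an orthonormal basis, then $g_{ij} = \pm\delta_{ij}$ and the LHS
will be zero.$^3$ In such a case, we obtain the following antisymmetry condition for
the 1-forms $\boldsymbol{\omega}_{ij}$:

$$\boldsymbol{\omega}_{ij} + \boldsymbol{\omega}_{ji} = 0. \tag{28.18}$$

3 Orthonormality is not really required. All that is necessary is for $g_{ij}$ to be constant.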






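We now develop an algorithm to determine the local curvature of the manifold.
Choose an orthonormal basis and its dual and introduce the matrices

$$\mathbf{e} = (\mathbf{e}_1\ \cdots\ \mathbf{e}_m), \qquad
\boldsymbol{\epsilon} = \begin{pmatrix}\boldsymbol{\epsilon}^1 \\ \vdots \\ \boldsymbol{\epsilon}^m\end{pmatrix}, \qquad
\Omega = \begin{pmatrix}
0 & \boldsymbol{\omega}_{12} & \boldsymbol{\omega}_{13} & \cdots & \boldsymbol{\omega}_{1m} \\
-\boldsymbol{\omega}_{12} & 0 & \boldsymbol{\omega}_{23} & \cdots & \boldsymbol{\omega}_{2m} \\
-\boldsymbol{\omega}_{13} & -\boldsymbol{\omega}_{23} & 0 & \cdots & \boldsymbol{\omega}_{3m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-\boldsymbol{\omega}_{1m} & -\boldsymbol{\omega}_{2m} & -\boldsymbol{\omega}_{3m} & \cdots & 0
\end{pmatrix},$$

whose elements are one-forms, or vectors. Write Equations (28.8), (28.18), (28.14),
and (28.9) as

$$d\mathbf{e} = \mathbf{e}G\Omega, \qquad \Omega^t + \Omega = 0, \qquad d\boldsymbol{\epsilon} = -G\Omega\wedge\boldsymbol{\epsilon}, \qquad d^2\mathbf{e} = \mathbf{e}G\Theta, \quad\text{where}\quad \Theta = d\Omega + \Omega\wedge(G\Omega). \tag{28.19}$$

The matrix $G$ is the matrix of the metric with components $g_{ij} = \pm\delta_{ij}$. It is introduced
to raise the indices when the equations are written in component form.$^4$
The first two equations in (28.19) are called the structure equations, the third is
called the integrability condition, and $\Theta$ is called the curvature matrix.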


No great mind of the past has exerted a deeper influence on the 
mathematics of the twentieth century than Georg Friedrich 
Bernhard Riemann (1826-1866), the son of a poor country 
minister in northern Germany. He studied the works of Euler 
and Legendre while he was still in secondary school, and it is 
said that he mastered Legendre’s treatise on the theory of num¬ 
bers in less than a week. But he was shy and modest, with little 
awareness of his own extraordinary abilities, so at the age of 19 
he went to the University of Göttingen with the aim of pleasing
his father by studying theology and becoming a minister him¬ 
self. Fortunately, this worthy purpose soon stuck in his throat, 
and with his father’s willing permission he switched to mathematics. 

The presence of the legendary Gauss automatically made Göttingen the center of the
mathematical world. But Gauss was remote and unapproachable, particularly to beginning
students, and after only a year Riemann left this unsatisfying environment and went to the
University of Berlin. There he attracted the friendly interest of Dirichlet and Jacobi, and
learned a great deal from both men. Two years later he returned to Göttingen, where he
obtained his doctor's degree in 1851. During the next 8 years, despite debilitating poverty,
he created his greatest works. In 1854 he was appointed Privatdozent (unpaid lecturer),
which at that time was the necessary first step on the academic ladder. Gauss died in 1855,
and Dirichlet was called to Göttingen as his successor. Dirichlet helped Riemann in every
way he could, first with a small salary (about one-tenth of that paid to a full professor) and
then with a promotion to an assistant professorship. In 1859 he also died, and Riemann was



4 Strictly speaking, we should use $G^{-1}$ instead of $G$. But since $g_{ij} = \pm\delta_{ij}$, the two are identical.







Bianchi identity 


Riemann curvature 
tensor 


appointed as a full professor to replace him. Riemann’s years of poverty were over, but his 
health was broken. At the age of 39 he died of tuberculosis in Italy, on the last of several 
trips he undertook in order to escape the cold, wet climate of northern Germany. Riemann 
had a short life and published comparatively little, but his works permanently altered the 
course of mathematics in analysis, geometry, and number theory. 

It is said that the three greatest mathematicians of modern times are Euler, Gauss, and
Riemann. It is a curiosity of nature that these three names are among the most frequently
mentioned names in the physics literature as well. Aside from the indirect use of his name in
the application of complex analysis in physics, Riemannian geometry has become the most
essential building block of all theories of fundamental interactions, starting with gravity,
which Einstein formulated in this language in 1916. As part of the requirement to become
a Privatdozent, Riemann had to write a probationary essay and to present a trial lecture
to the faculty. It was the custom for the candidate to offer three titles, and the head of his
department usually accepted the first. However, Riemann rashly listed as his third topic
the foundations of geometry. Gauss, who had been turning this subject over in his mind
for 60 years, was naturally curious to see how this particular candidate's "gloriously fertile
originality" would cope with such a challenge, and to Riemann's dismay he designated this
as the subject of the lecture. Riemann quickly tore himself away from his other interests at
the time ("my investigations of the connection between electricity, magnetism, light, and
gravitation") and wrote his lecture in the next two months. The result was one of the great
classical masterpieces of mathematics, and probably the most important scientific lecture
ever given. It is recorded that even Gauss was surprised and enthusiastic.


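We can derive further integrability conditions. For instance, applying $d$ to the
third equation of (28.19) gives

$$0 = d^2\boldsymbol{\epsilon} = -G\,d\Omega\wedge\boldsymbol{\epsilon} + G\Omega\wedge d\boldsymbol{\epsilon} = -G\,d\Omega\wedge\boldsymbol{\epsilon} - (G\Omega)\wedge(G\Omega)\wedge\boldsymbol{\epsilon}.$$

Multiplying both sides on the left by $G = G^{-1}$, we obtain

$$0 = -[d\Omega + \Omega\wedge(G\Omega)]\wedge\boldsymbol{\epsilon} = -\Theta\wedge\boldsymbol{\epsilon}. \tag{28.20}$$

Similarly, the reader may show that

$$d\Theta = \Theta\wedge(G\Omega) - \Omega\wedge(G\Theta). \tag{28.21}$$

This is called the Bianchi identity.

The $ij$th element of the matrix $\Theta$ can be written as

$$\boldsymbol{\theta}_{ij} = \tfrac12 R_{ijkl}\,\boldsymbol{\epsilon}^k\wedge\boldsymbol{\epsilon}^l, \tag{28.22}$$

which defines the components $R_{ijkl}$ of the Riemann curvature tensor. The antisymmetry
of the matrix $\Theta$ (showing this is left as a problem for the reader) and
Equation (28.22) give

$$R_{ijkl} + R_{jikl} = 0, \qquad R_{ijkl} + R_{ijlk} = 0. \tag{28.23}$$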





Square brackets 
mean 

antisymmetrization. 


Christoffel symbol 


connection between 
infinitesimal 
displacement, arc 
length, and metric 
tensor 


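Similarly, the relation $\Theta\wedge\boldsymbol{\epsilon} = 0$ of Equation (28.20) can be shown to be equivalent
to

$$R_{ijkl} + R_{iklj} + R_{iljk} = 0 \quad\text{and}\quad R_{i[jkl]} = 0, \tag{28.24}$$

where


28.2.2. Box. The enclosure of indices in square brackets means complete
antisymmetrization of those indices.


When coordinate bases are used, $\boldsymbol{\omega}_{ij}$ is no longer antisymmetric, but we still
have

$$dg_{ij} = \boldsymbol{\omega}_{ij} + \boldsymbol{\omega}_{ji} = \Gamma_{ijk}\,dx^k + \Gamma_{jik}\,dx^k.$$

Since $dg_{ij} = (\partial g_{ij}/\partial x^k)\,dx^k$, we get

$$g_{ij,k} \equiv \frac{\partial g_{ij}}{\partial x^k} = \Gamma_{ijk} + \Gamma_{jik}. \tag{28.25}$$

Using Equations (28.16) and (28.25), we can readily show that

$$\Gamma_{ijk} = \tfrac12\left(g_{ij,k} + g_{ik,j} - g_{kj,i}\right) = \frac12\left(\frac{\partial g_{ij}}{\partial x^k} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{kj}}{\partial x^i}\right). \tag{28.26}$$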


This is the Christoffel symbol used in classical tensor analysis. Now we consider
the connection between an infinitesimal displacement and a metric. Let $P$ be a point
of $M$. Let $\gamma$ be a curve through $P$ such that $\gamma(c) = P$. For an infinitesimal number
$\delta u$, let $P' = \gamma(c + \delta u)$ be a point on $\gamma$ close to $P$. Since the $x^i$ are well-behaved
functions, $x^i(P') - x^i(P)$ are infinitesimal real numbers. Let $\xi^i = x^i(P') - x^i(P)$,
and construct the vector $\mathbf{v} = \xi^i\partial_i$, where $\{\partial_i\}$ consists of tangent vectors at $P$. We
call $\mathbf{v}$ the infinitesimal displacement at $P$. The length of this vector, $\mathbf{g}(\mathbf{v}, \mathbf{v})$, is
shown to be $g_{ij}\xi^i\xi^j$. This is called the arc length from $P$ to $P'$, and is naturally
written as $ds^2 = g_{ij}\xi^i\xi^j$. It is customary to write $dx^i$ (not a 1-form!) in place of
$\xi^i$:

$$ds^2 = g_{ij}\,dx^i\,dx^j, \tag{28.27}$$

where the $dx^i$ are infinitesimal real numbers.
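As an illustration of Equation (28.26), the sketch below estimates the Christoffel symbols $\Gamma_{ijk}$ for the spherical-surface arc length $ds^2 = a^2 d\theta^2 + a^2\sin^2\theta\,d\varphi^2$. It is not from the original text: the derivatives $g_{ij,k}$ are approximated by central differences, the radius $a = 2$ and the sample point are arbitrary choices, and the function names are hypothetical. For this metric one expects $\Gamma_{\varphi\varphi\theta} = -\Gamma_{\theta\varphi\varphi} = a^2\sin\theta\cos\theta$.

```python
import math

def sphere_metric(point, radius=2.0):
    """g_ij of a sphere of the given radius in coordinates (theta, phi)."""
    theta, _phi = point
    return [[radius ** 2, 0.0],
            [0.0, (radius * math.sin(theta)) ** 2]]

def metric_derivatives(metric, point, step=1e-5):
    """Return dg with dg[k][i][j] ~ d g_ij / d x^k (central differences)."""
    n = len(point)
    dg = []
    for k in range(n):
        forward = list(point)
        backward = list(point)
        forward[k] += step
        backward[k] -= step
        g_plus, g_minus = metric(forward), metric(backward)
        dg.append([[(g_plus[i][j] - g_minus[i][j]) / (2 * step)
                    for j in range(n)] for i in range(n)])
    return dg

def christoffel_first_kind(metric, point):
    """Gamma[i][j][k] = (g_{ij,k} + g_{ik,j} - g_{kj,i}) / 2, Equation (28.26)."""
    n = len(point)
    dg = metric_derivatives(metric, point)
    return [[[0.5 * (dg[k][i][j] + dg[j][i][k] - dg[i][k][j])
              for k in range(n)] for j in range(n)] for i in range(n)]

THETA, PHI = 0.7, 0.3
gamma = christoffel_first_kind(sphere_metric, (THETA, PHI))
expected = 4.0 * math.sin(THETA) * math.cos(THETA)  # a^2 sin(theta) cos(theta), a = 2

print(gamma[1][1][0], expected)    # Gamma_{phi phi theta}
print(gamma[0][1][1], -expected)   # Gamma_{theta phi phi}
```

The same output also satisfies (28.25) and the symmetry (28.16) in the last two indices, up to the finite-difference error.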


Elwin Bruno Christoffel (1829-1900) came from a family in the cloth trade. He attended 
an elementary school in Montjoie (which was renamed Monschau in 1918) but then spent a 
number of years being tutored at home in languages, mathematics, and classics. He attended 
secondary schools from 1844 until 1849. At first he studied at the Jesuit gymnasium in 






Cologne but moved to the Friedrich-Wilhelms Gymnasium in the same town for at least the 
three final years of his school education. He was awarded the final school certificate with a 


distinction in 1849. The next year he went to the University of Berlin and studied under a
number of distinguished mathematicians, including Dirichlet.

After one year of military service in the Guards Artillery Brigade, he returned to Berlin 
to study for his doctorate, which was awarded in 1856 with a dissertation on the motion of 
electricity in homogeneous bodies. His examiners included mathematicians and physicists, 
Kummer being one of the mathematics examiners. 

At this point Christoffel spent three years outside the aca¬ 
demic world. He returned to Montjoie, where his mother was 
in poor health, but read widely from the works of Dirichlet, 

Riemann, and Cauchy. It has been suggested that this period of 
academic isolation had a major effect on his personality and on 
his independent approach towards mathematics. It was during 
this time that he published his first two papers on numerical in¬ 
tegration, in 1858, in which he generalized Gauss’s method of 
quadrature and expressed the polynomials that are involved as 
a determinant. This is now called Christoffel’s theorem. 

In 1859 Christoffel took the qualifying examination to become
a university teacher and was appointed a lecturer at the University of Berlin. Four
years later, he was appointed to a chair at the Polytechnicum in Zurich, filling the post 
left vacant when Dedekind went to Brunswick. Christoffel was to have a huge influence on 
mathematics at the Polytechnicum, setting up an institute for mathematics and the natural 
sciences there. 

In 1868 Christoffel was offered the chair of mathematics at the Gewerbsakademie in 
Berlin, which is now the University of Technology of Berlin. However, after three years at 
the Gewerbsakademie in Berlin, Christoffel moved to the University of Strasbourg as the 
chair of mathematics, a post he held until he was forced to retire due to ill health in 1892. 

Some of Christoffel’s early work was on conformal mappings of a simply connected 
region bounded by polygons onto a circle. He also wrote important papers that contributed 
to the development of the tensor calculus of Gregorio Ricci-Curbastro and Tullio Levi-Civita. 
The Christoffel symbols that he introduced are fundamental in the study of tensor analysis. 
The Christoffel reduction theorem, so named by Klein, solves the local equivalence problem 
for two quadratic differential forms. The procedure Christoffel employed in his solution 
of the equivalence problem is what Ricci later called covariant differentiation; Christoffel 
also used the latter concept to define the basic Riemann - Christoffel curvature tensor. His 
approach allowed Ricci and Levi-Civita to develop a coordinate-free differential calculus 
which Einstein, with the help of Grossmann, turned into the tensor analysis, the mathematical 
foundation of general relativity. 




In applications, it is common to start with the metric tensor $\mathbf{g}$ given in terms
of coordinate differential forms:

$$\mathbf{g} = g_{ij}\,dx^i\otimes dx^j \quad\text{where}\quad g_{ij} = g_{ji} = \mathbf{g}(\partial_i, \partial_j). \tag{28.28}$$

Then the orthonormal bases $\{\mathbf{e}_i\}$ and $\{\boldsymbol{\epsilon}^i\}$ are constructed in terms of $\{\partial_i\}$ and
$\{dx^j\}$, respectively, and are utilized as illustrated in the following examples. The
equivalence of the arc length [Equation (28.27)] and the metric [Equation (28.28)]
is the reason why it is the arc length that is given in most practical problems.
Once the arc length is known, the metric $g_{ij}$ can be read off, and all the relevant
geometric quantities can be calculated from it.

Friedmann metric

Schwarzschild metric

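28.2.3. Example. Let us look at a few examples of arc lengths and the corresponding
metrics.

(a) For $ds^2 = dx^2 + dy^2 + dz^2$, $\mathbf{g}$ is the Euclidean metric of $\mathbb{R}^3$, with $g_{ij} = \delta_{ij}$.

(b) For $ds^2 = -dx^2 - dy^2 - dz^2 + dt^2$, $\mathbf{g}$ is the Minkowski (or Lorentz) metric of $\mathbb{R}^4$,
with $g_{ij} = \eta_{ij}$, where $\eta_{xx} = \eta_{yy} = \eta_{zz} = -\eta_{tt} = -1$ and $\eta_{ij} = 0$ for $i \neq j$.

(c) For $ds^2 = dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\varphi^2)$, the metric is the Euclidean metric given in
spherical coordinates in $\mathbb{R}^3$ with $g_{rr} = 1$, $g_{\theta\theta} = r^2$, $g_{\varphi\varphi} = r^2\sin^2\theta$, and all other
components zero.

(d) For $ds^2 = a^2 d\theta^2 + a^2\sin^2\theta\,d\varphi^2$, the metric is that of a two-dimensional spherical
surface, with $g_{\theta\theta} = a^2$, $g_{\varphi\varphi} = a^2\sin^2\theta$, and all other components zero.

(e) For

$$ds^2 = dt^2 - a^2(t)\left[d\chi^2 + \sin^2\chi\,(d\theta^2 + \sin^2\theta\,d\varphi^2)\right],$$

the metric is the Friedmann metric used in cosmology. Here $g_{tt} = 1$, $g_{\chi\chi} = -[a(t)]^2$,
$g_{\theta\theta} = -[a(t)]^2\sin^2\chi$, $g_{\varphi\varphi} = -[a(t)]^2\sin^2\chi\sin^2\theta$, and all other components are zero.

(f) For

$$ds^2 = \left(1 - \frac{2M}{r}\right)dt^2 - \left(1 - \frac{2M}{r}\right)^{-1}dr^2 - r^2(d\theta^2 + \sin^2\theta\,d\varphi^2),$$

the metric is the Schwarzschild metric with $g_{tt} = 1 - 2M/r$, $g_{rr} = -(1 - 2M/r)^{-1}$,
$g_{\theta\theta} = -r^2$, $g_{\varphi\varphi} = -r^2\sin^2\theta$, and all other components zero.

For each of the arc lengths above, we have an orthonormal basis of one-forms:

(a) $\mathbf{g} = \boldsymbol{\epsilon}^1\otimes\boldsymbol{\epsilon}^1 + \boldsymbol{\epsilon}^2\otimes\boldsymbol{\epsilon}^2 + \boldsymbol{\epsilon}^3\otimes\boldsymbol{\epsilon}^3$ with $\boldsymbol{\epsilon}^1 = dx$, $\boldsymbol{\epsilon}^2 = dy$, $\boldsymbol{\epsilon}^3 = dz$;

(b) $\mathbf{g} = -\boldsymbol{\epsilon}^1\otimes\boldsymbol{\epsilon}^1 - \boldsymbol{\epsilon}^2\otimes\boldsymbol{\epsilon}^2 - \boldsymbol{\epsilon}^3\otimes\boldsymbol{\epsilon}^3 + \boldsymbol{\epsilon}^0\otimes\boldsymbol{\epsilon}^0$ with $\boldsymbol{\epsilon}^1 = dx$, $\boldsymbol{\epsilon}^2 = dy$, $\boldsymbol{\epsilon}^3 = dz$,
$\boldsymbol{\epsilon}^0 = dt$;

(c) $\mathbf{g} = \boldsymbol{\epsilon}^r\otimes\boldsymbol{\epsilon}^r + \boldsymbol{\epsilon}^\theta\otimes\boldsymbol{\epsilon}^\theta + \boldsymbol{\epsilon}^\varphi\otimes\boldsymbol{\epsilon}^\varphi$ with $\boldsymbol{\epsilon}^r = dr$, $\boldsymbol{\epsilon}^\theta = r\,d\theta$, $\boldsymbol{\epsilon}^\varphi = r\sin\theta\,d\varphi$;

(d) $\mathbf{g} = \boldsymbol{\epsilon}^\theta\otimes\boldsymbol{\epsilon}^\theta + \boldsymbol{\epsilon}^\varphi\otimes\boldsymbol{\epsilon}^\varphi$ with $\boldsymbol{\epsilon}^\theta = a\,d\theta$, $\boldsymbol{\epsilon}^\varphi = a\sin\theta\,d\varphi$;

(e) $\mathbf{g} = \boldsymbol{\epsilon}^t\otimes\boldsymbol{\epsilon}^t - \boldsymbol{\epsilon}^\chi\otimes\boldsymbol{\epsilon}^\chi - \boldsymbol{\epsilon}^\theta\otimes\boldsymbol{\epsilon}^\theta - \boldsymbol{\epsilon}^\varphi\otimes\boldsymbol{\epsilon}^\varphi$ with $\boldsymbol{\epsilon}^t = dt$, $\boldsymbol{\epsilon}^\chi = a(t)\,d\chi$,
$\boldsymbol{\epsilon}^\theta = a(t)\sin\chi\,d\theta$, $\boldsymbol{\epsilon}^\varphi = a(t)\sin\chi\sin\theta\,d\varphi$;

(f) $\mathbf{g} = \boldsymbol{\epsilon}^t\otimes\boldsymbol{\epsilon}^t - \boldsymbol{\epsilon}^r\otimes\boldsymbol{\epsilon}^r - \boldsymbol{\epsilon}^\theta\otimes\boldsymbol{\epsilon}^\theta - \boldsymbol{\epsilon}^\varphi\otimes\boldsymbol{\epsilon}^\varphi$ with $\boldsymbol{\epsilon}^t = (1 - 2M/r)^{1/2}\,dt$,
$\boldsymbol{\epsilon}^r = (1 - 2M/r)^{-1/2}\,dr$, $\boldsymbol{\epsilon}^\theta = r\,d\theta$, $\boldsymbol{\epsilon}^\varphi = r\sin\theta\,d\varphi$. ■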
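The statement that an orthonormal set of one-forms reproduces the metric can be checked componentwise. The sketch below is an illustration added here, not part of the original text: it takes the Schwarzschild one-forms of item (f), written as rows of components in the coordinate basis $(dt, dr, d\theta, d\varphi)$, and rebuilds $g_{ij} = \sum_a \eta_a\,\epsilon^a{}_i\,\epsilon^a{}_j$ with signs $\eta = (+,-,-,-)$. The numerical values of $M$, $r$, and $\theta$ are arbitrary, and the function names are hypothetical.

```python
import math

def schwarzschild_coframe(r, theta, mass=1.0):
    """Rows: components of eps^t, eps^r, eps^theta, eps^phi in (dt, dr, dtheta, dphi)."""
    f = 1.0 - 2.0 * mass / r
    return [[math.sqrt(f), 0.0, 0.0, 0.0],
            [0.0, 1.0 / math.sqrt(f), 0.0, 0.0],
            [0.0, 0.0, r, 0.0],
            [0.0, 0.0, 0.0, r * math.sin(theta)]]

# Signs of the orthonormal terms in g = eps^t(x)eps^t - eps^r(x)eps^r - ...
SIGNS = [1.0, -1.0, -1.0, -1.0]

def metric_from_coframe(coframe, signs):
    """g_ij = sum_a signs[a] * eps^a_i * eps^a_j."""
    n = len(coframe)
    return [[sum(signs[a] * coframe[a][i] * coframe[a][j] for a in range(n))
             for j in range(n)] for i in range(n)]

R, THETA, MASS = 5.0, 1.1, 1.0
g = metric_from_coframe(schwarzschild_coframe(R, THETA, MASS), SIGNS)

print(g[0][0], 1.0 - 2.0 * MASS / R)             # g_tt
print(g[1][1], -1.0 / (1.0 - 2.0 * MASS / R))    # g_rr
print(g[2][2], -R ** 2)                          # g_theta theta
print(g[3][3], -(R * math.sin(THETA)) ** 2)      # g_phi phi
```

Because each one-form here is proportional to a single coordinate differential, the rebuilt metric is diagonal; for a non-diagonal coframe the same sum would also produce the off-diagonal components.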

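28.2.4. Example. In this example, we examine the curvilinear coordinates used in vector
analysis. Recall that in terms of these coordinates the displacement is given by $ds^2 =
h_1^2(dq_1)^2 + h_2^2(dq_2)^2 + h_3^2(dq_3)^2$. Therefore, the orthonormal one-forms are $\boldsymbol{\epsilon}^1 = h_1\,dq_1$,
$\boldsymbol{\epsilon}^2 = h_2\,dq_2$, $\boldsymbol{\epsilon}^3 = h_3\,dq_3$. We also note (see Problem 26.24) that

$$d{*}df = \left(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}\right)dx\wedge dy\wedge dz = \nabla^2 f\;dx\wedge dy\wedge dz. \tag{28.29}$$

We use this equation to find the Laplacian in terms of $q_1$, $q_2$, $q_3$:

$$df = \frac{\partial f}{\partial q_1}\,dq_1 + \frac{\partial f}{\partial q_2}\,dq_2 + \frac{\partial f}{\partial q_3}\,dq_3 = \left(\frac{1}{h_1}\frac{\partial f}{\partial q_1}\right)\boldsymbol{\epsilon}^1 + \left(\frac{1}{h_2}\frac{\partial f}{\partial q_2}\right)\boldsymbol{\epsilon}^2 + \left(\frac{1}{h_3}\frac{\partial f}{\partial q_3}\right)\boldsymbol{\epsilon}^3,$$

where we substituted orthonormal 1-forms so we could apply the Hodge star operator. It
follows now that

$$\begin{aligned}
{*}df &= \left(\frac{1}{h_1}\frac{\partial f}{\partial q_1}\right){*}\boldsymbol{\epsilon}^1 + \left(\frac{1}{h_2}\frac{\partial f}{\partial q_2}\right){*}\boldsymbol{\epsilon}^2 + \left(\frac{1}{h_3}\frac{\partial f}{\partial q_3}\right){*}\boldsymbol{\epsilon}^3 \\
&= \left(\frac{1}{h_1}\frac{\partial f}{\partial q_1}\right)\boldsymbol{\epsilon}^2\wedge\boldsymbol{\epsilon}^3 + \left(\frac{1}{h_2}\frac{\partial f}{\partial q_2}\right)\boldsymbol{\epsilon}^3\wedge\boldsymbol{\epsilon}^1 + \left(\frac{1}{h_3}\frac{\partial f}{\partial q_3}\right)\boldsymbol{\epsilon}^1\wedge\boldsymbol{\epsilon}^2 \\
&= \left(\frac{h_2h_3}{h_1}\frac{\partial f}{\partial q_1}\right)dq_2\wedge dq_3 + \left(\frac{h_1h_3}{h_2}\frac{\partial f}{\partial q_2}\right)dq_3\wedge dq_1 + \left(\frac{h_1h_2}{h_3}\frac{\partial f}{\partial q_3}\right)dq_1\wedge dq_2.
\end{aligned}$$

Differentiating once more, we get

$$\begin{aligned}
d{*}df &= \frac{\partial}{\partial q_1}\left(\frac{h_2h_3}{h_1}\frac{\partial f}{\partial q_1}\right)dq_1\wedge dq_2\wedge dq_3 + \frac{\partial}{\partial q_2}\left(\frac{h_1h_3}{h_2}\frac{\partial f}{\partial q_2}\right)dq_2\wedge dq_3\wedge dq_1 \\
&\quad + \frac{\partial}{\partial q_3}\left(\frac{h_1h_2}{h_3}\frac{\partial f}{\partial q_3}\right)dq_3\wedge dq_1\wedge dq_2 \\
&= \frac{1}{h_1h_2h_3}\left[\frac{\partial}{\partial q_1}\left(\frac{h_2h_3}{h_1}\frac{\partial f}{\partial q_1}\right) + \frac{\partial}{\partial q_2}\left(\frac{h_1h_3}{h_2}\frac{\partial f}{\partial q_2}\right) + \frac{\partial}{\partial q_3}\left(\frac{h_1h_2}{h_3}\frac{\partial f}{\partial q_3}\right)\right]\boldsymbol{\epsilon}^1\wedge\boldsymbol{\epsilon}^2\wedge\boldsymbol{\epsilon}^3.
\end{aligned} \tag{28.30}$$

Since $\{\boldsymbol{\epsilon}^1, \boldsymbol{\epsilon}^2, \boldsymbol{\epsilon}^3\}$ are orthonormal one-forms (as are $\{dx, dy, dz\}$), the volume elements
$\boldsymbol{\epsilon}^1\wedge\boldsymbol{\epsilon}^2\wedge\boldsymbol{\epsilon}^3$ and $dx\wedge dy\wedge dz$ are equal. Thus, we substitute the latter for the former in
(28.30), compare with (28.29), and conclude that

$$\nabla^2 f = \frac{1}{h_1h_2h_3}\left[\frac{\partial}{\partial q_1}\left(\frac{h_2h_3}{h_1}\frac{\partial f}{\partial q_1}\right) + \frac{\partial}{\partial q_2}\left(\frac{h_1h_3}{h_2}\frac{\partial f}{\partial q_2}\right) + \frac{\partial}{\partial q_3}\left(\frac{h_1h_2}{h_3}\frac{\partial f}{\partial q_3}\right)\right],$$

which is the result obtained in curvilinear vector analysis.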
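The curvilinear Laplacian can be sanity-checked numerically. The sketch below is an added illustration under stated assumptions: it evaluates the bracketed expression with nested central differences for spherical coordinates, where $h_1 = 1$, $h_2 = r$, $h_3 = r\sin\theta$, and tests it on functions whose Cartesian Laplacian is known ($r^2$ gives 6, and $z = r\cos\theta$ gives 0). The step size, the sample point, and the function names are arbitrary choices.

```python
import math

def partial(func, point, axis, step=1e-4):
    """Central-difference derivative of func along one coordinate."""
    forward = list(point)
    backward = list(point)
    forward[axis] += step
    backward[axis] -= step
    return (func(forward) - func(backward)) / (2 * step)

def laplacian(f, scale_factors, point):
    """(1/(h1 h2 h3)) * sum_i d/dq_i [ (h_j h_k / h_i) df/dq_i ]."""
    def flux(axis):
        def component(q):
            h = scale_factors(q)
            weight = h[0] * h[1] * h[2] / h[axis] ** 2  # equals h_j h_k / h_i
            return weight * partial(f, q, axis)
        return component

    h = scale_factors(point)
    total = sum(partial(flux(axis), point, axis) for axis in range(3))
    return total / (h[0] * h[1] * h[2])

def spherical_scale_factors(q):
    """Scale factors (h_r, h_theta, h_phi) = (1, r, r sin(theta))."""
    r, theta, _phi = q
    return (1.0, r, r * math.sin(theta))

POINT = [1.5, 0.8, 0.4]  # (r, theta, phi)
lap_r_squared = laplacian(lambda q: q[0] ** 2, spherical_scale_factors, POINT)
lap_z = laplacian(lambda q: q[0] * math.cos(q[1]), spherical_scale_factors, POINT)

print(lap_r_squared)  # close to 6
print(lap_z)          # close to 0
```

Other orthogonal systems can be tried by swapping in a different scale-factor function, for example cylindrical coordinates with $(1, \rho, 1)$.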


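28.2.5. Example. Let $M = \mathbb{R}^2$, and suppose that the arc length is given by $ds^2 =
(dx^2 + dy^2)/y^2$. We can write the metric as $\mathbf{g} = \boldsymbol{\epsilon}^1\otimes\boldsymbol{\epsilon}^1 + \boldsymbol{\epsilon}^2\otimes\boldsymbol{\epsilon}^2$ if we define

$$\boldsymbol{\epsilon}^1 = \frac{dx}{y} \quad\text{and}\quad \boldsymbol{\epsilon}^2 = \frac{dy}{y}.$$

The dual vectors $\{\boldsymbol{\epsilon}^1, \boldsymbol{\epsilon}^2\}$ are orthonormal one-forms and $G = 1$, so we need not worry
about raising and lowering indices. Inspection of the definition of $\boldsymbol{\epsilon}^1$ and $\boldsymbol{\epsilon}^2$, along with the
fact that $dx(\partial_x) = dy(\partial_y) = 1$ and $dx(\partial_y) = dy(\partial_x) = 0$, immediately gives $\mathbf{e}_1 = y\,\partial_x$
and $\mathbf{e}_2 = y\,\partial_y$.

To find the curvature tensor, we take the exterior derivative of the $\boldsymbol{\epsilon}^i$'s:

$$d\boldsymbol{\epsilon}^1 = d\left(\frac{dx}{y}\right) = -\frac{1}{y^2}\,dy\wedge dx = \boldsymbol{\epsilon}^1\wedge\boldsymbol{\epsilon}^2, \qquad d\boldsymbol{\epsilon}^2 = 0. \tag{28.31}$$

From these equations, the antisymmetry of the $\boldsymbol{\omega}$'s, and Equation (28.14), we can read off
$\boldsymbol{\omega}_{ij}$. They are $\boldsymbol{\omega}_{11} = \boldsymbol{\omega}_{22} = 0$ and $\boldsymbol{\omega}_{12} = -\boldsymbol{\omega}_{21} = -\boldsymbol{\epsilon}^1$. Thus, the matrix $\Omega$ is

$$\Omega = \begin{pmatrix} 0 & \boldsymbol{\omega}_{12} \\ -\boldsymbol{\omega}_{12} & 0 \end{pmatrix} = \begin{pmatrix} 0 & -\boldsymbol{\epsilon}^1 \\ \boldsymbol{\epsilon}^1 & 0 \end{pmatrix},$$

which gives

$$d\Omega = \begin{pmatrix} 0 & -d\boldsymbol{\epsilon}^1 \\ d\boldsymbol{\epsilon}^1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -\boldsymbol{\epsilon}^1\wedge\boldsymbol{\epsilon}^2 \\ \boldsymbol{\epsilon}^1\wedge\boldsymbol{\epsilon}^2 & 0 \end{pmatrix},$$

and

$$\Omega\wedge\Omega = \begin{pmatrix} 0 & -\boldsymbol{\epsilon}^1 \\ \boldsymbol{\epsilon}^1 & 0 \end{pmatrix}\wedge\begin{pmatrix} 0 & -\boldsymbol{\epsilon}^1 \\ \boldsymbol{\epsilon}^1 & 0 \end{pmatrix} = 0.$$

Therefore, the curvature matrix is

$$\Theta = d\Omega = \begin{pmatrix} 0 & -\boldsymbol{\epsilon}^1\wedge\boldsymbol{\epsilon}^2 \\ \boldsymbol{\epsilon}^1\wedge\boldsymbol{\epsilon}^2 & 0 \end{pmatrix} \equiv \begin{pmatrix} \boldsymbol{\theta}_{11} & \boldsymbol{\theta}_{12} \\ \boldsymbol{\theta}_{21} & \boldsymbol{\theta}_{22} \end{pmatrix}.$$

This shows that the only nonzero independent component of the Riemann curvature tensor
is $R_{1212} = -1$. ■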

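28.2.6. Example. For a spherical surface of radius $a$, the element of length is

$$ds^2 = a^2\,d\theta^2 + a^2\sin^2\theta\,d\varphi^2.$$

The orthonormal forms are $\boldsymbol{\epsilon}^\theta = a\,d\theta$ and $\boldsymbol{\epsilon}^\varphi = a\sin\theta\,d\varphi$, and we have

$$G = 1, \qquad d\boldsymbol{\epsilon}^\theta = 0, \qquad d\boldsymbol{\epsilon}^\varphi = a\cos\theta\,d\theta\wedge d\varphi = \frac{1}{a}\cot\theta\;\boldsymbol{\epsilon}^\theta\wedge\boldsymbol{\epsilon}^\varphi.$$

The matrix $\Omega$ can now be read off:$^5$

$$\Omega = \begin{pmatrix} 0 & -\dfrac{\cot\theta}{a}\,\boldsymbol{\epsilon}^\varphi \\[2mm] \dfrac{\cot\theta}{a}\,\boldsymbol{\epsilon}^\varphi & 0 \end{pmatrix}.$$

A straightforward exterior differentiation yields

$$d\Omega = \frac{1}{a^2}\begin{pmatrix} 0 & \boldsymbol{\epsilon}^\theta\wedge\boldsymbol{\epsilon}^\varphi \\ -\boldsymbol{\epsilon}^\theta\wedge\boldsymbol{\epsilon}^\varphi & 0 \end{pmatrix}.$$

Similarly, $\Omega\wedge\Omega = 0$. Therefore, the curvature matrix is

$$\Theta = d\Omega = \frac{1}{a^2}\begin{pmatrix} 0 & \boldsymbol{\epsilon}^\theta\wedge\boldsymbol{\epsilon}^\varphi \\ -\boldsymbol{\epsilon}^\theta\wedge\boldsymbol{\epsilon}^\varphi & 0 \end{pmatrix}.$$

The only independent component of the Riemann curvature tensor is $R_{\theta\varphi\theta\varphi} = 1/a^2$, which
is constant, as expected for a spherical surface.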
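The last two examples follow the same recipe, which can be sketched numerically for any two-dimensional arc length of the form $ds^2 = E^2\,dx^2 + G^2\,dy^2$ (here $G$ denotes the second coefficient function, not the metric matrix of (28.19)). This is an added illustration, not the author's method verbatim: with $\boldsymbol{\epsilon}^1 = E\,dx$ and $\boldsymbol{\epsilon}^2 = G\,dy$, solving (28.14) with $\boldsymbol{\omega}_{12} = -\boldsymbol{\omega}_{21}$ gives $\boldsymbol{\omega}_{12} = (E_{,y}/G)\,dx - (G_{,x}/E)\,dy$, and since $\Omega\wedge\Omega = 0$ in two dimensions, $\boldsymbol{\theta}_{12} = d\boldsymbol{\omega}_{12} = R_{1212}\,\boldsymbol{\epsilon}^1\wedge\boldsymbol{\epsilon}^2$. Derivatives are taken by central differences; the function names, step size, and sample points are assumptions of this sketch.

```python
import math

def partial(func, x, y, axis, step=1e-3):
    """Central-difference derivative of func(x, y) along x (axis 0) or y (axis 1)."""
    if axis == 0:
        return (func(x + step, y) - func(x - step, y)) / (2 * step)
    return (func(x, y + step) - func(x, y - step)) / (2 * step)

def curvature_component(E, G, x, y):
    """R_1212 for the orthonormal one-forms eps^1 = E dx, eps^2 = G dy."""
    # omega_12 = a dx + b dy, obtained from d(eps^i) = -omega^i_j ^ eps^j
    def a(u, v):
        return partial(E, u, v, 1) / G(u, v)

    def b(u, v):
        return -partial(G, u, v, 0) / E(u, v)

    # d(omega_12) = (db/dx - da/dy) dx ^ dy = R_1212 eps^1 ^ eps^2
    d_omega = partial(b, x, y, 0) - partial(a, x, y, 1)
    return d_omega / (E(x, y) * G(x, y))

# Example 28.2.5: ds^2 = (dx^2 + dy^2) / y^2
half_plane = curvature_component(lambda x, y: 1.0 / y,
                                 lambda x, y: 1.0 / y, 0.3, 2.0)

# Example 28.2.6 with x = theta, y = phi and radius a = 3
RADIUS = 3.0
sphere = curvature_component(lambda th, ph: RADIUS,
                             lambda th, ph: RADIUS * math.sin(th), 1.0, 0.5)

print(half_plane)  # close to -1
print(sphere)      # close to 1/9
```

Both values are independent of the sample point, which reflects the constant curvature found in the two examples.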




What is a flat 
manifold? 


It is clear that when the $g_{ij}$ in the expression for the line element are all constants
for all points in the manifold, then $\boldsymbol{\epsilon}^i$ will be proportional to $dx^i$ and $d\boldsymbol{\epsilon}^j = 0$, for all
$j$. This immediately tells us that $\Omega = 0$, and therefore $\Theta = 0$; that is, the manifold
has no curvature. We call such a manifold flat. Thus, for $ds^2 = dx^2 + dy^2 + dz^2$,
the space is flat. However, arc lengths of a flat space come in various guises with
nontrivial coefficients. Does the curvature matrix $\Theta$ recognize the flat arc length,
or is it possible to fool it into believing that it is privileged with a curvature when in
reality the curvature is still zero? The following example shows that the curvature
matrix can detect flatness no matter how disguised the line element is!

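28.2.7. Example. In spherical coordinates, the line element (arc length) of the flat Euclidean
space $\mathbb{R}^3$ is $ds^2 = dr^2 + r^2\,d\theta^2 + r^2\sin^2\theta\,d\varphi^2$. To calculate the curvature matrix,
we first need an orthonormal set of one-forms. These are immediately obtained from the
expression above:

$$\boldsymbol{\epsilon}^r = dr, \qquad \boldsymbol{\epsilon}^\theta = r\,d\theta, \qquad \boldsymbol{\epsilon}^\varphi = r\sin\theta\,d\varphi, \qquad G = 1.$$

Taking the exterior derivatives of these one-forms, we obtain

$$\begin{aligned}
d\boldsymbol{\epsilon}^r &= d^2 r = 0, \\
d\boldsymbol{\epsilon}^\theta &= d(r\,d\theta) = dr\wedge d\theta + r\,d^2\theta = \boldsymbol{\epsilon}^r\wedge\left(\frac{\boldsymbol{\epsilon}^\theta}{r}\right) = \frac{1}{r}\,\boldsymbol{\epsilon}^r\wedge\boldsymbol{\epsilon}^\theta, \\
d\boldsymbol{\epsilon}^\varphi &= d(r\sin\theta)\wedge d\varphi = \sin\theta\,dr\wedge d\varphi + r\cos\theta\,d\theta\wedge d\varphi \\
&= \sin\theta\;\boldsymbol{\epsilon}^r\wedge\left(\frac{\boldsymbol{\epsilon}^\varphi}{r\sin\theta}\right) + r\cos\theta\left(\frac{\boldsymbol{\epsilon}^\theta}{r}\right)\wedge\left(\frac{\boldsymbol{\epsilon}^\varphi}{r\sin\theta}\right) = \frac{1}{r}\,\boldsymbol{\epsilon}^r\wedge\boldsymbol{\epsilon}^\varphi + \frac{\cot\theta}{r}\,\boldsymbol{\epsilon}^\theta\wedge\boldsymbol{\epsilon}^\varphi.
\end{aligned}$$

We can now use Equation (28.14) to find the matrix of one-forms $\Omega$. In calculating the
elements of $\Omega$, we remember that it is a skew-symmetric matrix, so all diagonal elements
are zero. We also note that $d\boldsymbol{\epsilon}^r = 0$ does not imply that the $\boldsymbol{\omega}_{rj}$ vanish. Keeping these facts in
mind, we can easily obtain $\Omega$ (the calculation is left as a problem for the reader):

$$\Omega = \begin{pmatrix}
0 & -\dfrac{1}{r}\,\boldsymbol{\epsilon}^\theta & -\dfrac{1}{r}\,\boldsymbol{\epsilon}^\varphi \\[2mm]
\dfrac{1}{r}\,\boldsymbol{\epsilon}^\theta & 0 & -\dfrac{\cot\theta}{r}\,\boldsymbol{\epsilon}^\varphi \\[2mm]
\dfrac{1}{r}\,\boldsymbol{\epsilon}^\varphi & \dfrac{\cot\theta}{r}\,\boldsymbol{\epsilon}^\varphi & 0
\end{pmatrix}.$$

The exterior derivative of this matrix is found to be

$$d\Omega = \begin{pmatrix}
0 & 0 & -\dfrac{\cot\theta}{r^2}\,\boldsymbol{\epsilon}^\theta\wedge\boldsymbol{\epsilon}^\varphi \\[2mm]
0 & 0 & \dfrac{1}{r^2}\,\boldsymbol{\epsilon}^\theta\wedge\boldsymbol{\epsilon}^\varphi \\[2mm]
\dfrac{\cot\theta}{r^2}\,\boldsymbol{\epsilon}^\theta\wedge\boldsymbol{\epsilon}^\varphi & -\dfrac{1}{r^2}\,\boldsymbol{\epsilon}^\theta\wedge\boldsymbol{\epsilon}^\varphi & 0
\end{pmatrix},$$

which is precisely (the negative of) the exterior product $\Omega\wedge\Omega$, as the reader may wish to
verify. Thus, $\Theta = d\Omega + \Omega\wedge\Omega = 0$, and the space is indeed flat! ■

5 Note that $d\boldsymbol{\epsilon}^\theta = 0$ does not imply $\boldsymbol{\omega}_{12} = 0$.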

In all the foregoing examples, the curvature was calculated intrinsically. We
never had to leave the space and go to a higher dimension to "see" the curvature.
For example, in the case of the sphere, the only information we had was the line
element in terms of the coordinates on the sphere. We never had to resort to any
three-dimensional analysis to discover a globe embedded in the Euclidean $\mathbb{R}^3$. As
mentioned earlier, if a space has line elements with constant $g_{ij}$, then the Riemann
curvature vanishes trivially. We have also seen examples in which the components
of a metric tensor were by no means trivial, but $\Theta$ was smart enough to detect the
flatness in disguise. Under what conditions can we choose coordinate systems in
terms of which the line elements have $g_{ij} = \pm\delta_{ij}$? To answer this question we
need the following lemma (proved in [Flan 89, pp. 135-136]):

28.2.8. Lemma. If $\Omega$ is a matrix of 1-forms such that $d\Omega + \Omega\wedge(G\Omega) = 0$, then
there exists an orthogonal matrix $A$ such that $dA = AG\Omega$.

The question raised above is intimately related to the connection between coordinate
and orthonormal frames. We have seen the usefulness of both. Coordinate
frames, due to the existence of the related coordinate functions, are useful for
many analytical calculations, for example in Hamiltonian dynamics. Orthonormal
frames are useful because of the simplicity of expressions inherent in all orthonormal
vectors. Furthermore, we saw how curvature was easily calculated once we
constructed orthonormal dual frames. Naturally, we would like to have both. Is
it possible to construct frames that are both coordinate and orthonormal? The
following theorem answers this question:

28.2.9. Theorem. Let $M$ be a Riemannian manifold. Then $M$ is flat, i.e., $\Theta = 0$,
if and only if there exists a local coordinate system $\{x^i\}$ for which $\{\partial_i\}$ is an
orthonormal basis.

Proof. The existence of orthonormal coordinate frames implies that $\{dx^i\}$ are
orthonormal. Thus, we can use them to find the curvature. But since $d(dx^i) = 0$
for all $i$, it follows from Equation (28.12) that $\boldsymbol{\omega}^i{}_j = 0$ and $\Omega = 0$. So the curvature
must vanish. Conversely, suppose that $\Theta = 0$. Then by Lemma 28.2.8, there exists
an orthogonal matrix $A$ such that $dA = AG\Omega$. Now we define the one-form column
matrix $\boldsymbol{\tau}$ by $\boldsymbol{\tau} = A\boldsymbol{\epsilon}$, where $\boldsymbol{\epsilon}$ is the one-form column matrix of Equation (28.14).
Then, using (28.19), we have

$$d\boldsymbol{\tau} = d(A\boldsymbol{\epsilon}) = dA\wedge\boldsymbol{\epsilon} + A\,d\boldsymbol{\epsilon} = (AG\Omega)\wedge\boldsymbol{\epsilon} - A(G\Omega\wedge\boldsymbol{\epsilon}) = 0.$$

Thus, $d\tau^i = 0$ for all $i$. By Theorem 26.5.14 there must exist zero-forms (functions)
$x^i$ such that $\tau^i = dx^i$. These $x^i$ are the coordinates we are after. The basis
$\{\partial_i\}$ is obtained using the inverse of $A$ (see the discussion following Proposition
25.1.1). Since $A$ is orthogonal, both $\{dx^i\}$ and $\{\partial_i\}$ are orthonormal bases. □






28.3 Covariant Derivative and Geodesics 

The essence of all geometries is the straight line. The familiar Euclidean geometry
is developed entirely based on a number of postulates concerning certain attributes
and properties of straight lines. From the physical standpoint, straight lines are
those trajectories on which "free" particles, including light, travel. If geometry
is the basis of physical theories (general theory of relativity, and, to a lesser degree,
electromagnetism and the nuclear interactions), knowledge of straight lines will
be crucial.

28.3.1 Covariant Derivative 

Straight lines are characterized by the “least amount of bending.” In flat space, this 
involves zero bending; but if space is curved, the bending cannot be eliminated. 
The bending of space is gauged by a test vector as it moves (infinitesimally) along 
some trajectory. The infinitesimal character of any trajectory is encapsulated in the 
vector tangent to it at the point of interest. Thus the concept of straight line is tied 
to the way one vector (the test vector) changes along a second vector (the tangent 
vector). 

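covariant derivative
of a vector

28.3.1. Definition. Let $\mathbf{u}, \mathbf{v} \in \mathfrak{X}(M)$ be vector fields on $M$. The covariant derivative
of $\mathbf{v}$ with respect to (or along) $\mathbf{u}$, denoted by $\nabla_{\mathbf{u}}\mathbf{v}$, is defined as

$$\nabla_{\mathbf{u}}\mathbf{v} \equiv \langle d\mathbf{v}, \mathbf{u}\rangle. \tag{28.32}$$

So, $\nabla_{\mathbf{u}} : \mathfrak{X}(M) \to \mathfrak{X}(M)$, i.e., $\nabla_{\mathbf{u}}$ maps vector fields to vector fields.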

Certain properties of the covariant derivative follow from this definition. In 
most books on differential geometry and relativity, these properties are used to de¬ 
fine the covariant derivative. We collect these properties in the following theorem. 

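Properties of
covariant derivative

28.3.2. Theorem. Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be vector fields, and $f$ and $h$ functions on a
manifold $M$. Then the covariant derivative has the following properties:

1. $\nabla_{\mathbf{u}}\mathbf{v} - \nabla_{\mathbf{v}}\mathbf{u} = [\mathbf{u}, \mathbf{v}]$.

2. $\nabla_{\mathbf{u}}(f\mathbf{v}) = \mathbf{u}(f)\,\mathbf{v} + f\,\nabla_{\mathbf{u}}\mathbf{v}$.

3. $\nabla_{\mathbf{u}}(\mathbf{v} + \mathbf{w}) = \nabla_{\mathbf{u}}\mathbf{v} + \nabla_{\mathbf{u}}\mathbf{w}$.

4. $\nabla_{f\mathbf{u} + h\mathbf{w}}(\mathbf{v}) = f\,\nabla_{\mathbf{u}}\mathbf{v} + h\,\nabla_{\mathbf{w}}\mathbf{v}$.

Proof. We shall prove the first property, which happens to be the hardest. The rest
are easy consequences of the definition and the linearity of pairing. To prove the
first property, choose a basis$^6$ $\{\mathbf{e}_i\}$ and its dual $\{\boldsymbol{\epsilon}^j\}$ and write $\mathbf{v} = v^i\mathbf{e}_i$, $\mathbf{u} = u^i\mathbf{e}_i$,
and

$$d\mathbf{v} = dv^i\,\mathbf{e}_i + v^i\,d\mathbf{e}_i, \qquad d\mathbf{u} = du^i\,\mathbf{e}_i + u^i\,d\mathbf{e}_i.$$

Then we have

$$\begin{aligned}
\langle d\mathbf{v}, \mathbf{u}\rangle &= \langle dv^i, \mathbf{u}\rangle\,\mathbf{e}_i + v^i\langle d\mathbf{e}_i, \mathbf{u}\rangle = [\mathbf{u}(v^i)]\,\mathbf{e}_i + v^i u^j\langle d\mathbf{e}_i, \mathbf{e}_j\rangle, \\
\langle d\mathbf{u}, \mathbf{v}\rangle &= \langle du^i, \mathbf{v}\rangle\,\mathbf{e}_i + u^i\langle d\mathbf{e}_i, \mathbf{v}\rangle = [\mathbf{v}(u^i)]\,\mathbf{e}_i + u^i v^j\langle d\mathbf{e}_i, \mathbf{e}_j\rangle,
\end{aligned}$$

and

$$\langle d\mathbf{v}, \mathbf{u}\rangle - \langle d\mathbf{u}, \mathbf{v}\rangle = [\mathbf{u}(v^i) - \mathbf{v}(u^i)]\,\mathbf{e}_i + v^i u^j\left(\langle d\mathbf{e}_i, \mathbf{e}_j\rangle - \langle d\mathbf{e}_j, \mathbf{e}_i\rangle\right). \tag{28.33}$$

Let us evaluate the term in parentheses. Using Equation (28.8), we have

$$\begin{aligned}
\langle d\mathbf{e}_i, \mathbf{e}_j\rangle - \langle d\mathbf{e}_j, \mathbf{e}_i\rangle &= \langle\mathbf{e}_k\otimes\boldsymbol{\omega}^k{}_i, \mathbf{e}_j\rangle - \langle\mathbf{e}_k\otimes\boldsymbol{\omega}^k{}_j, \mathbf{e}_i\rangle \\
&= \mathbf{e}_k\left(\langle\boldsymbol{\omega}^k{}_i, \mathbf{e}_j\rangle - \langle\boldsymbol{\omega}^k{}_j, \mathbf{e}_i\rangle\right) = \mathbf{e}_k\left(\Gamma^k{}_{ij} - \Gamma^k{}_{ji}\right).
\end{aligned}$$

The basis chosen above was general. However, we are free to choose any convenient
basis to prove a vector or tensor identity. If we choose the basis to be a coordinate
frame, then by Equation (28.16), $\Gamma^k{}_{ij} - \Gamma^k{}_{ji} = 0$. Furthermore,

$$[\mathbf{u}(v^i) - \mathbf{v}(u^i)]\,\mathbf{e}_i = [\mathbf{u}, \mathbf{v}] \quad\text{in a coordinate frame.}$$

Substituting these two relations in Equation (28.33), we obtain the first part of the
theorem. □

6 Eventually, we shall take the basis to be a coordinate frame. But for notational convenience, we first work in a general basis.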


Tullio Levi-Civita (1873-1941), the son of Giacomo Levi-Civita, a lawyer who from
1908 was a senator, was an outstanding student at the liceo in Padua. In 1890 he
enrolled in the Faculty of Mathematics of the University of Padua. Giuseppe
Veronese and Gregorio Ricci-Curbastro were among his teachers. He received his
diploma in 1894 and in 1895 became resident professor at the teachers' college
annexed to the Faculty of Science at Pavia. From 1897 to 1918 Levi-Civita taught
rational mechanics at the University of Padua. His years in Padua (where in 1914
he married a pupil, Libera Trevisani) were scientifically the most fruitful of his
career. In 1918 he became professor of higher analysis at Rome and, in 1920, of
rational mechanics. In 1938, struck by the fascist racial laws against Jews, he was
forced to give up teaching.

The breadth of his scientific interests, his scruples regarding the fulfillment of his
academic responsibilities, and his affection for young people made Levi-Civita the
leader of a flourishing school of mathematicians.




Levi-Civita's approximately 200 memoirs in pure and applied mathematics deal with
analytical mechanics, celestial mechanics, hydrodynamics, elasticity, electromagnetism,
and atomic physics. His most important contribution to science was rooted in the memoir
“Sulle trasformazioni delle equazioni dinamiche” (1896), which was characterized by the
use of the methods of absolute differential calculus that Ricci had applied only to
differential geometry. In the “Méthodes de calcul différentiel absolu et leurs applications,”
written with Ricci and published in 1900 in Mathematische Annalen, there is a complete
exposition of the new calculus, which consists of a particular algorithm designed to express
geometric and physical laws in Euclidean and non-Euclidean spaces, particularly in Riemannian
curved spaces. The memoir concerns a very general but laborious type of calculus that
made it possible to deal with many difficult problems, including, according to Einstein, the
formulation of the general theory of relativity.

Although Levi-Civita had expressed certain reservations concerning relativity in the
first years after its formulation (1905), he gradually came to accept the new views. His
own original research culminated in 1917 in the introduction of the concept of parallel
transport in curved spaces. With this new concept, absolute differential calculus, having
absorbed other techniques, became tensor calculus, now the essential instrument of the
unitary relativistic theories of gravitation and electromagnetism.

In his memoirs of 1903-1916 Levi-Civita contributed to celestial mechanics in the study
of the three-body problem: the determination of the motion of three bodies, considered as
reduced to their centers of mass and subject to mutual Newtonian attraction. In 1914-1916
he succeeded in eliminating the singularities present at the points of possible collisions, past
or future. His research in relativity led Levi-Civita to mathematical problems suggested by
atomic physics, which in the 1920s was developing outside the traditional framework: the
general theory of adiabatic invariants, the motion of a body of variable mass, the extension
of the Maxwellian distribution to a system of corpuscles, and the Schrödinger equation.


The covariant differentiation of vectors can be extended to arbitrary tensors
once it is defined for 1-forms. For this, we make the extra assumption that $\nabla_{\mathbf{u}}$ can
be “pushed” inside a pairing, and that when acting on a tensor product, it obeys
the product rule of differentiation, i.e., $\nabla_{\mathbf{u}}$ acts as a derivation on the algebra of
tensors. Let $\boldsymbol{\omega}$ be a 1-form and $\mathbf{v}$ a vector. Then

\[ \nabla_{\mathbf{u}}\langle \boldsymbol{\omega}, \mathbf{v}\rangle = \langle \nabla_{\mathbf{u}}\boldsymbol{\omega}, \mathbf{v}\rangle + \langle \boldsymbol{\omega}, \nabla_{\mathbf{u}}\mathbf{v}\rangle \tag{28.34} \]

defines the covariant derivative of $\boldsymbol{\omega}$. Using Equation (28.34) the reader may readily
show the primary covariant derivative relation:

\[ \nabla_{\mathbf{e}_i}\boldsymbol{\epsilon}^j = -\Gamma^j_{ki}\,\boldsymbol{\epsilon}^k. \tag{28.35} \]

Since an arbitrary tensor of a given kind can be expressed as a linear combination
of tensor products of vectors and 1-forms, our knowledge of the action of $\nabla_{\mathbf{u}}$
on functions (coefficients of expansion), vectors, and 1-forms, plus the assumed
derivation property of $\nabla_{\mathbf{u}}$, is enough to uniquely define the action of $\nabla_{\mathbf{u}}$ on any
tensor. Equation (28.34) shows that the covariant derivative of a 1-form with respect
to a vector is another 1-form, because when it pairs up with a vector it gives a
number. We have already pointed out that the covariant derivative of a vector with
respect to another vector is a third vector. We therefore conclude that


28.3.3. Box. The covariant derivative of a tensor of a given kind is another 
tensor of the same kind. 


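28.3.4. Example. Using the definition of the covariant derivative, the reader may check
that

\[ \nabla_{\mathbf{e}_i}\mathbf{e}_j = \Gamma^k_{ji}\,\mathbf{e}_k. \qquad \text{Note the change in order of indices!} \tag{28.36} \]

Now consider two bases $\{\mathbf{e}_i\}$ and $\{\mathbf{e}_{i'}\}$. Write the primed basis in terms of the other:
$\mathbf{e}_{i'} = R^j_{i'}\,\mathbf{e}_j$. Then

\[ \nabla_{\mathbf{e}_{j'}}\mathbf{e}_{i'} = R^l_{j'}\bigl[R^j_{i'}\,\nabla_{\mathbf{e}_l}(\mathbf{e}_j) + \mathbf{e}_l(R^j_{i'})\,\mathbf{e}_j\bigr] = R^l_{j'}R^j_{i'}\Gamma^m_{jl}\,\mathbf{e}_m + R^l_{j'}\,\mathbf{e}_l(R^m_{i'})\,\mathbf{e}_m. \]

Connection coefficients are not tensors!

Writing $\Gamma^{k'}_{i'j'}\mathbf{e}_{k'} = \Gamma^{k'}_{i'j'}R^m_{k'}\mathbf{e}_m$ on the LHS, equating the components on both sides, and
multiplying both sides by the inverse of the transformation matrix $R$, we obtain

\[ \Gamma^{k'}_{i'j'} = \underbrace{R^{k'}_m R^l_{j'} R^j_{i'}\,\Gamma^m_{jl}}_{\text{how a (1,2)-tensor transforms}} + \underbrace{R^{k'}_m R^l_{j'}\,\mathbf{e}_l(R^m_{i'})}_{\text{nontensorial term}}, \tag{28.37} \]

where $R^{k'}_m \equiv (R^{-1})^{k'}_m$. Equation (28.37) shows that the connection coefficients are not
tensors. ■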
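The nontensorial term of Equation (28.37) can be seen at work in a small numerical sketch. We assume the Euclidean plane, where the Cartesian coordinate frame has vanishing connection coefficients, so that the polar-frame coefficients come entirely from the second term, which for coordinate frames reduces to $(\partial x'^{k}/\partial x^m)\,\partial^2 x^m/\partial x'^{i}\partial x'^{j}$. The function names and the finite-difference step below are our own illustrative choices, not part of the text.

```python
import math

def cartesian(r, phi):
    """Cartesian coordinates (x, y) of the point with polar coordinates (r, phi)."""
    return (r * math.cos(phi), r * math.sin(phi))

def second_derivative(m, i, j, q, h=1e-4):
    """Central-difference estimate of d^2 x^m / dq^i dq^j at q = (r, phi)."""
    def shifted(di, dj):
        p = list(q)
        p[i] += di
        p[j] += dj
        return cartesian(*p)[m]
    return (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)

def inverse_jacobian(r, phi):
    """Rows (dr/dx, dr/dy) and (dphi/dx, dphi/dy): the inverse of R^m_{i'}."""
    return [[math.cos(phi), math.sin(phi)],
            [-math.sin(phi) / r, math.cos(phi) / r]]

def polar_connection(k, i, j, r, phi):
    """Gamma^k_{ij} in the polar frame (index 0 = r, 1 = phi), obtained from the
    nontensorial term alone because the Cartesian coefficients are zero."""
    rinv = inverse_jacobian(r, phi)
    return sum(rinv[k][m] * second_derivative(m, i, j, (r, phi)) for m in range(2))
```

Under these assumptions, `polar_connection(0, 1, 1, r, phi)` should approximate $-r$ and `polar_connection(1, 0, 1, r, phi)` should approximate $1/r$: nonzero coefficients arise from identically vanishing ones, which no tensor transformation law could produce.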


28.3.5. Example. Equation (28.36) connects $\Gamma^k_{ij}$ with the structure constants of the Lie
algebra of vector fields on a manifold. To see this connection, use the first property of the
covariant derivative in Theorem 28.3.2 and Equation (28.36) to obtain

\[ [\mathbf{e}_i, \mathbf{e}_j] = \nabla_{\mathbf{e}_i}\mathbf{e}_j - \nabla_{\mathbf{e}_j}\mathbf{e}_i = \bigl(\Gamma^k_{ji} - \Gamma^k_{ij}\bigr)\mathbf{e}_k. \]

It follows from this equation that

\[ c^k_{ij} = \Gamma^k_{ji} - \Gamma^k_{ij}. \tag{28.38} \]

In particular, in a coordinate frame, $c^k_{ij} = 0$ and the connection coefficients are symmetric
in their lower indices, a result we obtained earlier. ■




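gradient operator for tensors

Let $\mathbf{T}$ be a tensor of type $(r, s)$. It is convenient to think of $\nabla_{\mathbf{u}}\mathbf{T}$ as the contraction
of another tensor $\mathbf{S} \equiv \nabla\mathbf{T}$ with $\mathbf{u}$. Then $\nabla$ may naturally be treated as a linear
operator that maps $T^r_s(M)$, the bundle of tensor fields of type $(r, s)$, to $T^r_{s+1}(M)$, the
bundle of tensor fields of type $(r, s+1)$. One then writes $\nabla : T^r_s(M) \to T^r_{s+1}(M)$
and calls $\nabla$ the generalized gradient operator. If

\[ \mathbf{T} = T^{i_1 \dots i_r}_{j_1 \dots j_s}\,\mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{j_s}, \]

then

\[ \nabla\mathbf{T} = T^{i_1 \dots i_r}_{j_1 \dots j_s;k}\,\mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{j_s} \otimes \boldsymbol{\epsilon}^k \tag{28.39} \]

and, with $\mathbf{u} = u^k\mathbf{e}_k$,

\[ \nabla_{\mathbf{u}}\mathbf{T} = T^{i_1 \dots i_r}_{j_1 \dots j_s;k}\,u^k\,\mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{j_s}. \tag{28.40} \]

Using these relations, we can calculate the components of the covariant derivative
of a general tensor. It is clear that if we use $\mathbf{e}_k$ instead of $\mathbf{u}$, we obtain the $k$th
component of the covariant derivative. So, on the one hand, we have

\[ \nabla_{\mathbf{e}_k}\mathbf{T} = T^{i_1 \dots i_r}_{j_1 \dots j_s;k}\,\mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{j_s}, \tag{28.41} \]

and on the other hand,

\begin{align*}
\nabla_{\mathbf{e}_k}\mathbf{T} &= \nabla_{\mathbf{e}_k}\bigl(T^{i_1 \dots i_r}_{j_1 \dots j_s}\,\mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{j_s}\bigr) \\
&= T^{i_1 \dots i_r}_{j_1 \dots j_s,k}\,\mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{j_s} \\
&\quad + \sum_{m=1}^{r} T^{i_1 \dots i_r}_{j_1 \dots j_s}\,\mathbf{e}_{i_1} \otimes \cdots \otimes \underbrace{\nabla_{\mathbf{e}_k}\mathbf{e}_{i_m}}_{\Gamma^n_{i_m k}\mathbf{e}_n} \otimes \cdots \otimes \mathbf{e}_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{j_s} \\
&\quad + \sum_{m=1}^{s} T^{i_1 \dots i_r}_{j_1 \dots j_s}\,\mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \cdots \otimes \underbrace{\nabla_{\mathbf{e}_k}\boldsymbol{\epsilon}^{j_m}}_{\text{by (28.35)}} \otimes \cdots \otimes \boldsymbol{\epsilon}^{j_s}, \tag{28.42}
\end{align*}

where $T^{i_1 \dots i_r}_{j_1 \dots j_s,k} \equiv \mathbf{e}_k\bigl(T^{i_1 \dots i_r}_{j_1 \dots j_s}\bigr)$. Equating the components of Equations (28.41) and
(28.42) yields

\[ T^{i_1 \dots i_r}_{j_1 \dots j_s;k} = T^{i_1 \dots i_r}_{j_1 \dots j_s,k} + \sum_{m=1}^{r} T^{i_1 \dots i_{m-1}\,n\,i_{m+1} \dots i_r}_{j_1 \dots j_s}\,\Gamma^{i_m}_{nk} - \sum_{m=1}^{s} T^{i_1 \dots i_r}_{j_1 \dots j_{m-1}\,n\,j_{m+1} \dots j_s}\,\Gamma^{n}_{j_m k}, \tag{28.43} \]

where only the sum over the subindex $m$ has been explicitly displayed; the (hidden)
sum over repeated indices is, as always, understood.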




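derivation of the relation between the Lie and the covariant derivatives

There is a useful relation between the covariant and the Lie derivative that we
derive now. First, let $\mathbf{T}$ be of type $(2, 0)$ and write it in some frame as $\mathbf{T} = T^{ij}\,\mathbf{e}_i \otimes \mathbf{e}_j$.
Apply the covariant derivative with respect to $\mathbf{u}$ to both sides to obtain

\[ \nabla_{\mathbf{u}}\mathbf{T} = \mathbf{u}(T^{ij})\,\mathbf{e}_i \otimes \mathbf{e}_j + T^{ij}(\nabla_{\mathbf{u}}\mathbf{e}_i) \otimes \mathbf{e}_j + T^{ij}\,\mathbf{e}_i \otimes (\nabla_{\mathbf{u}}\mathbf{e}_j). \]

Similarly,

\[ L_{\mathbf{u}}\mathbf{T} = \mathbf{u}(T^{ij})\,\mathbf{e}_i \otimes \mathbf{e}_j + T^{ij}(L_{\mathbf{u}}\mathbf{e}_i) \otimes \mathbf{e}_j + T^{ij}\,\mathbf{e}_i \otimes (L_{\mathbf{u}}\mathbf{e}_j). \]

Now use $L_{\mathbf{u}}\mathbf{e}_j = [\mathbf{u}, \mathbf{e}_j] = \nabla_{\mathbf{u}}\mathbf{e}_j - \nabla_{\mathbf{e}_j}\mathbf{u}$ to get

\[ L_{\mathbf{u}}\mathbf{T} = \nabla_{\mathbf{u}}\mathbf{T} - T^{ij}\bigl[(\nabla_{\mathbf{e}_i}\mathbf{u}) \otimes \mathbf{e}_j + \mathbf{e}_i \otimes (\nabla_{\mathbf{e}_j}\mathbf{u})\bigr]. \tag{28.44} \]

On the other hand, if we apply $\nabla_{\mathbf{u}}$ and $L_{\mathbf{u}}$ to both sides of $\delta^i_j = \langle \boldsymbol{\epsilon}^i, \mathbf{e}_j\rangle$ and
use $[\mathbf{u}, \mathbf{e}_i] = \nabla_{\mathbf{u}}\mathbf{e}_i - \nabla_{\mathbf{e}_i}\mathbf{u}$, we obtain

\[ \nabla_{\mathbf{u}}\boldsymbol{\epsilon}^i = L_{\mathbf{u}}\boldsymbol{\epsilon}^i - (\nabla_{\mathbf{e}_j}\mathbf{u})^i\,\boldsymbol{\epsilon}^j. \]

It follows that for $\mathbf{T} = T_{ij}\,\boldsymbol{\epsilon}^i \otimes \boldsymbol{\epsilon}^j$, we have

\[ L_{\mathbf{u}}\mathbf{T} = \nabla_{\mathbf{u}}\mathbf{T} + T_{ij}\bigl[(\nabla_{\mathbf{e}_k}\mathbf{u})^i\,\boldsymbol{\epsilon}^k \otimes \boldsymbol{\epsilon}^j + (\nabla_{\mathbf{e}_k}\mathbf{u})^j\,\boldsymbol{\epsilon}^i \otimes \boldsymbol{\epsilon}^k\bigr]. \tag{28.45} \]

One can use Equations (28.44) and (28.45) to generalize to a tensor of type $(r, s)$.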
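If $\mathbf{u}$ happens to be tangent to a curve $t \mapsto \gamma(t)$, Equation (28.40) is written as

\[ \nabla_{\mathbf{u}}\mathbf{T} = \frac{DT^{i_1 \dots i_r}_{j_1 \dots j_s}}{dt}\,\mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{j_s}, \tag{28.46} \]

where $DT^{i_1 \dots i_r}_{j_1 \dots j_s}/dt \equiv T^{i_1 \dots i_r}_{j_1 \dots j_s;k}\,u^k$. In a coordinate frame, with $u^i = \dot{x}^i = dx^i/dt$,
Equations (28.40) and (28.43) give

\[ \frac{DT^{i_1 \dots i_r}_{j_1 \dots j_s}}{dt} = T^{i_1 \dots i_r}_{j_1 \dots j_s;k}\,\frac{dx^k}{dt} = \frac{dT^{i_1 \dots i_r}_{j_1 \dots j_s}}{dt} + \sum_{m=1}^{r} T^{i_1 \dots i_{m-1}\,n\,i_{m+1} \dots i_r}_{j_1 \dots j_s}\,\Gamma^{i_m}_{nk}\,\frac{dx^k}{dt} - \sum_{m=1}^{s} T^{i_1 \dots i_r}_{j_1 \dots j_{m-1}\,n\,j_{m+1} \dots j_s}\,\Gamma^{n}_{j_m k}\,\frac{dx^k}{dt}. \tag{28.47} \]

For the case of a vector (28.47) becomes

\[ \frac{Dv^k}{dt} = \frac{dv^k}{dt} + v^j\,\Gamma^k_{ji}\,\frac{dx^i}{dt}. \tag{28.48} \]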


This is an important equation, to which we shall return shortly. 

With the generalized gradient operator defined, we can construct the divergence
of a tensor just as in vector analysis. Given a vector, the divergence operator $\nabla\cdot$
acts on it and gives a scalar, or, in the language of tensor analysis, it lowers the
upper indices by 1. This takes place by differentiating components and contracting
the upper index with the newly introduced index of differentiation. The divergence
of an arbitrary tensor is defined in precisely the same way:

divergence of a tensor field

28.3.6. Definition. Given a tensor field $\mathbf{T}$, define its divergence $\nabla\cdot\mathbf{T}$ to be the
tensor obtained from $\nabla\mathbf{T}$ by contracting the last upper index with the covariant
derivative index. In components,

\[ (\nabla\cdot\mathbf{T})^{i_1 \dots i_{r-1}}_{j_1 \dots j_s} = T^{i_1 \dots i_{r-1}k}_{j_1 \dots j_s;k}. \]
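As a quick illustration of Definition 28.3.6, consider a vector field on the Euclidean plane in the polar coordinate frame. The connection coefficients quoted in the comments are the standard ones for that frame and are assumed here rather than derived in this section; the components $v^r$ and $v^\varphi$ are coordinate-frame (not orthonormal) components.

```latex
% Assumed nonzero connection coefficients of the plane in the polar
% coordinate frame (x^1, x^2) = (r, \varphi):
%   \Gamma^{r}_{\varphi\varphi} = -r, \qquad
%   \Gamma^{\varphi}_{r\varphi} = \Gamma^{\varphi}_{\varphi r} = 1/r .
\begin{align*}
v^{i}{}_{;k} &= v^{i}{}_{,k} + v^{n}\,\Gamma^{i}_{nk}
  && \text{(Equation (28.43) with } r = 1,\ s = 0\text{)} \\
\nabla\cdot\mathbf{v} = v^{k}{}_{;k}
  &= \partial_r v^{r} + \partial_\varphi v^{\varphi} + v^{n}\,\Gamma^{k}_{nk} \\
  &= \partial_r v^{r} + \partial_\varphi v^{\varphi} + \frac{v^{r}}{r}
   = \frac{1}{r}\,\partial_r\!\left(r\,v^{r}\right) + \partial_\varphi v^{\varphi}.
\end{align*}
```

The only contribution to $\Gamma^k_{nk}$ is $\Gamma^\varphi_{r\varphi} = 1/r$, which reproduces the familiar polar form of the divergence once $v^\varphi$ is converted to the orthonormal component $r v^\varphi$.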


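The covariant derivative replaces the ordinary derivative when functions are
generalized to tensor fields, and in many respects it is very similar to ordinary
derivatives. The aspect of the covariant derivative that is in contrast to the ordinary
derivative, namely, its lack of commutativity, is related to an important geometrical
object that we encountered before: curvature. Let us find this relation. Start with
an ordinary 1-form $\boldsymbol{\omega} = \omega_i\boldsymbol{\epsilon}^i$ and note that

\[ \langle d\boldsymbol{\omega}, \mathbf{u} \wedge \mathbf{v}\rangle = \langle d\omega_i \wedge \boldsymbol{\epsilon}^i + \omega_i\,d\boldsymbol{\epsilon}^i, \mathbf{u} \wedge \mathbf{v}\rangle = \langle d\omega_i \wedge \boldsymbol{\epsilon}^i, \mathbf{u} \wedge \mathbf{v}\rangle + \langle \omega_i\,d\boldsymbol{\epsilon}^i, \mathbf{u} \wedge \mathbf{v}\rangle, \tag{28.49} \]

where $\mathbf{u}$ and $\mathbf{v}$ are arbitrary vectors. For the first term of Equation (28.49), we
obtain

\[ \langle d\omega_i \wedge \boldsymbol{\epsilon}^i, \mathbf{u} \wedge \mathbf{v}\rangle = \langle d\omega_i, \mathbf{u}\rangle v^i - \langle d\omega_i, \mathbf{v}\rangle u^i = v^i\,\mathbf{u}(\omega_i) - u^i\,\mathbf{v}(\omega_i). \]

Now we use

\[ \mathbf{u}(\langle \mathbf{v}, \boldsymbol{\omega}\rangle) = \mathbf{u}(v^i\omega_i) = \omega_i\,\mathbf{u}(v^i) + v^i\,\mathbf{u}(\omega_i), \]

\[ \mathbf{v}(\langle \mathbf{u}, \boldsymbol{\omega}\rangle) = \mathbf{v}(u^i\omega_i) = \omega_i\,\mathbf{v}(u^i) + u^i\,\mathbf{v}(\omega_i), \]

to get

\[ \langle d\omega_i \wedge \boldsymbol{\epsilon}^i, \mathbf{u} \wedge \mathbf{v}\rangle = \mathbf{u}(\langle \mathbf{v}, \boldsymbol{\omega}\rangle) - \mathbf{v}(\langle \mathbf{u}, \boldsymbol{\omega}\rangle) + \omega_i[\mathbf{v}(u^i) - \mathbf{u}(v^i)]. \tag{28.50} \]

With $d\boldsymbol{\epsilon}^i = -\boldsymbol{\omega}^i{}_j \wedge \boldsymbol{\epsilon}^j = -\Gamma^i_{jk}\,\boldsymbol{\epsilon}^k \wedge \boldsymbol{\epsilon}^j$, the second term of (28.49) becomes

\begin{align*}
\langle \omega_i\,d\boldsymbol{\epsilon}^i, \mathbf{u} \wedge \mathbf{v}\rangle &= -\omega_i\Gamma^i_{jk}\langle \boldsymbol{\epsilon}^k \wedge \boldsymbol{\epsilon}^j, \mathbf{u} \wedge \mathbf{v}\rangle = -\omega_i\Gamma^i_{jk}(u^k v^j - u^j v^k) \\
&= -\omega_i u^k v^j \underbrace{(\Gamma^i_{jk} - \Gamma^i_{kj})}_{\langle \boldsymbol{\epsilon}^i, [\mathbf{e}_k, \mathbf{e}_j]\rangle} = -u^k v^j\langle \boldsymbol{\omega}, [\mathbf{e}_k, \mathbf{e}_j]\rangle.
\end{align*}

We also note that

\begin{align*}
[\mathbf{u}, \mathbf{v}] &= [u^k\mathbf{e}_k, v^j\mathbf{e}_j] = u^k\mathbf{e}_k(v^j\mathbf{e}_j) - v^j\mathbf{e}_j(u^k\mathbf{e}_k) \\
&= [\mathbf{u}(v^j)]\,\mathbf{e}_j + u^k v^j\,\mathbf{e}_k\mathbf{e}_j - [\mathbf{v}(u^k)]\,\mathbf{e}_k - u^k v^j\,\mathbf{e}_j\mathbf{e}_k \\
&= [\mathbf{u}(v^j) - \mathbf{v}(u^j)]\,\mathbf{e}_j + u^k v^j[\mathbf{e}_k, \mathbf{e}_j].
\end{align*}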



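Riemann curvature “tensor” is indeed a tensor in the sense of Box 26.4.11!

It follows that

\[ \langle \boldsymbol{\omega}, [\mathbf{u}, \mathbf{v}]\rangle = \omega_j[\mathbf{u}(v^j) - \mathbf{v}(u^j)] + u^k v^j\langle \boldsymbol{\omega}, [\mathbf{e}_k, \mathbf{e}_j]\rangle, \tag{28.51} \]

and the second term of (28.49) becomes

\[ \langle \omega_i\,d\boldsymbol{\epsilon}^i, \mathbf{u} \wedge \mathbf{v}\rangle = -\langle \boldsymbol{\omega}, [\mathbf{u}, \mathbf{v}]\rangle + \omega_j[\mathbf{u}(v^j) - \mathbf{v}(u^j)]. \tag{28.52} \]

Combining Equations (28.49), (28.50), and (28.52), we obtain

\[ \langle d\boldsymbol{\omega}, \mathbf{u} \wedge \mathbf{v}\rangle = \mathbf{u}(\langle \mathbf{v}, \boldsymbol{\omega}\rangle) - \mathbf{v}(\langle \mathbf{u}, \boldsymbol{\omega}\rangle) - \langle \boldsymbol{\omega}, [\mathbf{u}, \mathbf{v}]\rangle. \tag{28.53} \]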

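Equation (28.53) is the basis of our generalization involving the covariant
derivative: We replace $\boldsymbol{\omega}$ with a vector-valued 1-form $\mathbf{T}$, and the derivatives $\mathbf{u}(\cdots)$
and $\mathbf{v}(\cdots)$ with $\nabla_{\mathbf{u}}(\cdots)$ and $\nabla_{\mathbf{v}}(\cdots)$. The result will be

\[ \langle d\mathbf{T}, \mathbf{u} \wedge \mathbf{v}\rangle = \nabla_{\mathbf{u}}(\langle \mathbf{v}, \mathbf{T}\rangle) - \nabla_{\mathbf{v}}(\langle \mathbf{u}, \mathbf{T}\rangle) - \langle \mathbf{T}, [\mathbf{u}, \mathbf{v}]\rangle. \tag{28.54} \]

Choose a vector $\mathbf{w}$ and replace $\mathbf{T}$ with $d\mathbf{w}$ to obtain

\begin{align*}
\langle d^2\mathbf{w}, \mathbf{u} \wedge \mathbf{v}\rangle &= \nabla_{\mathbf{u}}(\underbrace{\langle \mathbf{v}, d\mathbf{w}\rangle}_{\equiv \nabla_{\mathbf{v}}\mathbf{w}}) - \nabla_{\mathbf{v}}(\underbrace{\langle \mathbf{u}, d\mathbf{w}\rangle}_{= \nabla_{\mathbf{u}}\mathbf{w}}) - \underbrace{\langle d\mathbf{w}, [\mathbf{u}, \mathbf{v}]\rangle}_{= \nabla_{[\mathbf{u},\mathbf{v}]}\mathbf{w}} \\
&= \nabla_{\mathbf{u}}\nabla_{\mathbf{v}}\mathbf{w} - \nabla_{\mathbf{v}}\nabla_{\mathbf{u}}\mathbf{w} - \nabla_{[\mathbf{u},\mathbf{v}]}\mathbf{w} \equiv \mathbf{R}(\mathbf{u}, \mathbf{v})\mathbf{w}, \tag{28.55}
\end{align*}

where we have introduced the linear operator $\mathbf{R}(\mathbf{u}, \mathbf{v}) : \mathfrak{X}(M) \to \mathfrak{X}(M)$ in the last
line. We rewrite the definition of this operator as

\[ \mathbf{R}(\mathbf{u}, \mathbf{v}) = \nabla_{\mathbf{u}}\nabla_{\mathbf{v}} - \nabla_{\mathbf{v}}\nabla_{\mathbf{u}} - \nabla_{[\mathbf{u},\mathbf{v}]}. \tag{28.56} \]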


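Equations (28.55) and (28.56) can be used to derive the following identity (see
Problem 28.11):

\[ \mathbf{R}(\mathbf{u}, \mathbf{v})\mathbf{w} + \mathbf{R}(\mathbf{w}, \mathbf{u})\mathbf{v} + \mathbf{R}(\mathbf{v}, \mathbf{w})\mathbf{u} = 0. \tag{28.57} \]

Define the map $\mathbf{R} : \mathfrak{X}^*(M) \times \mathfrak{X}(M) \times \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathbb{R}$ by

\[ \mathbf{R}(\boldsymbol{\omega}, \mathbf{w}, \mathbf{u}, \mathbf{v}) \equiv \langle \boldsymbol{\omega}, \mathbf{R}(\mathbf{u}, \mathbf{v})\mathbf{w}\rangle. \tag{28.58} \]

The covariant derivatives in the definition of $\mathbf{R}(\mathbf{u}, \mathbf{v})$ may give the impression that
$\mathbf{R}$ will differentiate whatever appears to its right. It is a remarkable property of
$\mathbf{R}$ that this will not happen: That $\mathbf{R}$ is a tensor of type $(1, 3)$ [see Box 26.4.11],
the Riemann curvature tensor, follows from Equations (28.7) and (28.55). The
reader may verify that the components of this tensor are precisely those introduced
in Equation (28.22). In fact, it is now more appropriate to change $\boldsymbol{\Omega}^i{}_j$ to $\mathbf{R}^i{}_j$ and
write

\[ \mathbf{R}^i{}_j = \tfrac{1}{2} R^i{}_{jkl}\,\boldsymbol{\epsilon}^k \wedge \boldsymbol{\epsilon}^l. \tag{28.59} \]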




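From Equation (28.57) follows the cyclic property of the lower indices of the
components of $\mathbf{R}$ as given in Equation (28.24).

The components of $\mathbf{R}$ can be conveniently evaluated in terms of connection
coefficients:

\begin{align*}
R^i{}_{jkl} &= \mathbf{R}(\boldsymbol{\epsilon}^i, \mathbf{e}_j, \mathbf{e}_k, \mathbf{e}_l) = \langle \boldsymbol{\epsilon}^i, \mathbf{R}(\mathbf{e}_k, \mathbf{e}_l)\mathbf{e}_j\rangle \\
&= \langle \boldsymbol{\epsilon}^i, \nabla_{\mathbf{e}_k}\nabla_{\mathbf{e}_l}\mathbf{e}_j - \nabla_{\mathbf{e}_l}\nabla_{\mathbf{e}_k}\mathbf{e}_j - \nabla_{[\mathbf{e}_k, \mathbf{e}_l]}\mathbf{e}_j\rangle \tag{28.60} \\
&= \langle \boldsymbol{\epsilon}^i, \nabla_{\mathbf{e}_k}\nabla_{\mathbf{e}_l}\mathbf{e}_j\rangle - \langle \boldsymbol{\epsilon}^i, \nabla_{\mathbf{e}_l}\nabla_{\mathbf{e}_k}\mathbf{e}_j\rangle - \langle \boldsymbol{\epsilon}^i, \nabla_{[\mathbf{e}_k, \mathbf{e}_l]}\mathbf{e}_j\rangle.
\end{align*}

We calculate each pairing separately. The vector in the first pairing is

\[ \nabla_{\mathbf{e}_k}\nabla_{\mathbf{e}_l}\mathbf{e}_j = \nabla_{\mathbf{e}_k}(\Gamma^m_{jl}\,\mathbf{e}_m) = \Gamma^m_{jl,k}\,\mathbf{e}_m + \Gamma^m_{jl}\Gamma^n_{mk}\,\mathbf{e}_n. \]

Similarly, the second vector can be expressed as

\[ \nabla_{\mathbf{e}_l}\nabla_{\mathbf{e}_k}\mathbf{e}_j = \Gamma^m_{jk,l}\,\mathbf{e}_m + \Gamma^m_{jk}\Gamma^n_{ml}\,\mathbf{e}_n. \]

For the third vector, we use the definition of the structure constants and write

\[ \nabla_{[\mathbf{e}_k, \mathbf{e}_l]}\mathbf{e}_j = \nabla_{c^m_{kl}\mathbf{e}_m}\mathbf{e}_j = c^m_{kl}\,\nabla_{\mathbf{e}_m}\mathbf{e}_j = c^m_{kl}\,\Gamma^n_{jm}\,\mathbf{e}_n. \]

Substituting these in Equation (28.60), we obtain

\[ R^i{}_{jkl} = \Gamma^i_{jl,k} - \Gamma^i_{jk,l} + \Gamma^m_{jl}\Gamma^i_{mk} - \Gamma^m_{jk}\Gamma^i_{ml} - c^m_{kl}\,\Gamma^i_{jm}. \tag{28.61} \]

If we use coordinate frames, the structure constants are zero, and we get

\[ R^i{}_{jkl} = \frac{\partial\Gamma^i_{jl}}{\partial x^k} - \frac{\partial\Gamma^i_{jk}}{\partial x^l} + \Gamma^m_{jl}\Gamma^i_{mk} - \Gamma^m_{jk}\Gamma^i_{ml}. \tag{28.62} \]

If the manifold has a metric (Riemannian or pseudo-Riemannian manifold), then
a combination of (28.26) and (28.62) gives the curvature in terms of the metric
tensor.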
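To see how a metric determines the curvature in practice, here is a minimal numerical sketch for the half-plane metric $ds^2 = (dx^2 + dy^2)/y^2$ of Example 28.2.5. It assumes that Equation (28.26) is the usual expression of the connection coefficients through first derivatives of the metric (written out in the docstring below), and it replaces all derivatives by central differences; the function names and step sizes are illustrative choices of ours.

```python
def metric(x, y):
    """Metric g_ij of the half-plane y > 0 with ds^2 = (dx^2 + dy^2) / y^2."""
    return [[1.0 / y**2, 0.0], [0.0, 1.0 / y**2]]

def inverse_metric(x, y):
    """Inverse metric g^ij of the same space."""
    return [[y**2, 0.0], [0.0, y**2]]

def d_metric(m, j, k, p, h=1e-5):
    """Central difference for d g_{jk} / d x^m at the point p."""
    plus, minus = list(p), list(p)
    plus[m] += h
    minus[m] -= h
    return (metric(*plus)[j][k] - metric(*minus)[j][k]) / (2 * h)

def christoffel(i, j, k, p):
    """Gamma^i_{jk} = (1/2) g^{im} (d_j g_{mk} + d_k g_{mj} - d_m g_{jk})."""
    ginv = inverse_metric(*p)
    return 0.5 * sum(
        ginv[i][m] * (d_metric(j, m, k, p) + d_metric(k, m, j, p) - d_metric(m, j, k, p))
        for m in range(2)
    )

def d_christoffel(n, i, j, k, p, h=1e-4):
    """Central difference for d Gamma^i_{jk} / d x^n at the point p."""
    plus, minus = list(p), list(p)
    plus[n] += h
    minus[n] -= h
    return (christoffel(i, j, k, plus) - christoffel(i, j, k, minus)) / (2 * h)

def riemann(i, j, k, l, p):
    """R^i_{jkl} in a coordinate frame, following the pattern of Equation (28.62)."""
    value = d_christoffel(k, i, j, l, p) - d_christoffel(l, i, j, k, p)
    for m in range(2):
        value += christoffel(m, j, l, p) * christoffel(i, m, k, p)
        value -= christoffel(m, j, k, p) * christoffel(i, m, l, p)
    return value

def gaussian_curvature(p):
    """K = R_{0101} / det(g), with the first index lowered by the metric."""
    g = metric(*p)
    r_0101 = sum(g[0][m] * riemann(m, 1, 0, 1, p) for m in range(2))
    return r_0101 / (g[0][0] * g[1][1] - g[0][1] * g[1][0])
```

With these assumptions, `gaussian_curvature((0.3, 1.5))` should come out close to $-1$ at any point of the half-plane, reflecting the constant negative curvature of this space.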


28.3.2 Geodesics 

parallel transport of vectors and tensors along a vector
geodesics defined
geodesic equation

The reader may recall from elementary physics courses that in a flat space, one
is allowed$^7$ to move a vector about as long as it is kept parallel to itself. In any
kind of parallel displacement of a vector, one moves the (tail of the) vector along
some curve. This curve is in our subconscious in flat space, because it plays no
role in such a displacement; all curves give the same end result. However, in
curved spaces, different curves “parallel transport” a vector differently, so that the
end result will be different. Each curve honestly “thinks” that it is indeed parallel
transporting the vector, i.e., that it is not changing the direction of the vector as
the latter moves along the former. No change means zero derivative; and since the
concept of derivative is local, let us replace the curve with its tangent $\mathbf{u}$:

28.3.7. Definition. A vector $\mathbf{v}$ is said to be parallel transported along $\mathbf{u}$ if $\nabla_{\mathbf{u}}\mathbf{v} = 0$.
Similarly, a tensor $\mathbf{T}$ is said to be parallel transported along $\mathbf{u}$ if $\nabla_{\mathbf{u}}\mathbf{T} = 0$.

7 Except in situations where the position of the vector is important, as in torques and angular momenta.

If the manifold $M$ has a metric $\mathbf{g}$, then it is desirable for the parallel
transportation not to affect the metric, i.e., that the angle between any two vectors remain
the same after transportation. Therefore, we demand that $\mathbf{g}$ be constant for parallel
transportation along any vector, i.e.,

\[ \nabla_{\mathbf{u}}\mathbf{g} = 0 \;\; \forall\,\mathbf{u} \quad\Rightarrow\quad \nabla\mathbf{g} = 0, \quad\text{or}\quad g_{ij;k} = 0 \;\; \forall\, i, j, k. \tag{28.63} \]

A statement equivalent to this equation is that 


28.3.8. Box. The operation of raising and lowering of indices commutes 
with the operation of covariant differentiation. 


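Consider two vectors $\mathbf{v}$ and $\mathbf{w}$. If $\mathbf{g}$ satisfies Equation (28.63), then

\begin{align*}
\nabla_{\mathbf{u}}(\mathbf{v} \cdot \mathbf{w}) &= \nabla_{\mathbf{u}}[\mathbf{g}(\mathbf{v}, \mathbf{w})] = \nabla_{\mathbf{u}}\langle \mathbf{g}, \mathbf{v} \otimes \mathbf{w}\rangle \\
&= \langle \nabla_{\mathbf{u}}\mathbf{g}, \mathbf{v} \otimes \mathbf{w}\rangle + \langle \mathbf{g}, (\nabla_{\mathbf{u}}\mathbf{v}) \otimes \mathbf{w}\rangle + \langle \mathbf{g}, \mathbf{v} \otimes (\nabla_{\mathbf{u}}\mathbf{w})\rangle \\
&= \mathbf{g}(\nabla_{\mathbf{u}}\mathbf{v}, \mathbf{w}) + \mathbf{g}(\mathbf{v}, \nabla_{\mathbf{u}}\mathbf{w}),
\end{align*}

or

\[ \nabla_{\mathbf{u}}(\mathbf{v} \cdot \mathbf{w}) = (\nabla_{\mathbf{u}}\mathbf{v}) \cdot \mathbf{w} + \mathbf{v} \cdot (\nabla_{\mathbf{u}}\mathbf{w}). \tag{28.64} \]

In particular, if $\mathbf{v}$ and $\mathbf{w}$ are parallel transported along $\mathbf{u}$, their dot product will not
change. In the flat space of a large sheet of paper, construction of a straight line
in a given direction starting at a given point $P_0$ is done by laying down the end
of a vector (a straight edge) at $P_0$ pointing in the given direction, connecting $P_0$
to a neighboring point $P_1$ along the vector, moving the vector parallel to itself to
$P_1$, connecting $P_1$ to a neighboring point $P_2$, and continuing the process. In the
language of the machinery of the covariant derivative, we might say that a straight
line is constructed by transporting the tangent vector parallel to itself.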
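The dependence of parallel transport on the curve, and the invariance of the dot product expressed by Equation (28.64), can both be illustrated numerically. The sketch below transports a vector once around the circle of latitude $\theta = \theta_0$ on the unit sphere by integrating $Dv^k/dt = 0$ [Equation (28.48)] with a classical fourth-order Runge-Kutta scheme. The sphere's connection coefficients $\Gamma^\theta_{\varphi\varphi} = -\sin\theta\cos\theta$ and $\Gamma^\varphi_{\theta\varphi} = \cot\theta$ are standard values assumed here, and the function names are ours.

```python
import math

def transport_along_latitude(theta0, v_theta, v_phi, steps=20000):
    """Parallel transport (v^theta, v^phi) once around the circle theta = theta0
    of the unit sphere, integrating Dv^k/dt = 0 with classical RK4 (t = phi)."""
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(v):
        vt, vp = v
        # dv^k/dt = -Gamma^k_{ji} v^j dx^i/dt with dx/dt = (0, 1)
        return (s * c * vp, -(c / s) * vt)

    h = 2 * math.pi / steps
    v = (v_theta, v_phi)
    for _ in range(steps):
        k1 = rhs(v)
        k2 = rhs((v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = rhs((v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
             v[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)
    return v

def norm(theta0, v):
    """Length of v measured with the sphere metric diag(1, sin^2 theta)."""
    return math.sqrt(v[0] ** 2 + (math.sin(theta0) * v[1]) ** 2)
```

For $\theta_0 = \pi/3$ the transported vector should return rotated by $2\pi\cos\theta_0 = \pi$, i.e., reversed, while its length stays the same; on the equator ($\theta_0 = \pi/2$), which is a geodesic, it should return unchanged.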

28.3.9. Definition. Let $M$ be a manifold and $\gamma : [a, b] \to M$ a curve. Then $\gamma$
is called a geodesic of $M$ if the tangent vector $\mathbf{u}$ at every point of $\gamma$ is parallel
transported along itself: $\nabla_{\mathbf{u}}\mathbf{u} = 0$.

It follows from Equation (28.48), with $v^k = u^k = dx^k/dt$, that a geodesic
curve satisfies the following DE:

\[ \frac{d^2x^k}{dt^2} + \Gamma^k_{ji}\,\frac{dx^i}{dt}\frac{dx^j}{dt} = 0. \tag{28.65} \]

This second-order DE, called the geodesic equation, will have a unique solution
if $x^i(a)$ and $\dot{x}^i(a)$, i.e., the initial point and the initial direction, are given. Thus,


28.3.10. Box. Through a given point and in a given direction passes only
one geodesic curve.


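28.3.11. Example. Let us determine the geodesics of the space whose arc length is given
by $ds^2 = (dx^2 + dy^2)/y^2$ (see Example 28.2.5). With $x = x^1$ and $y = x^2$, we recognize
the metric tensor as

\[ g_{11} = g_{22} = \frac{1}{y^2}, \qquad g_{12} = g_{21} = 0, \qquad g^{11} = g^{22} = y^2, \qquad g^{12} = g^{21} = 0. \]

Using Equation (28.26), we can readily calculate the nonzero connection coefficients:

\[ \Gamma_{112} = \Gamma_{121} = -\Gamma_{211} = \Gamma_{222} = -\frac{1}{y^3}. \]

The geodesic equation for the first coordinate is

\[ \frac{d^2x^1}{dt^2} + \Gamma^1_{ij}\,\frac{dx^i}{dt}\frac{dx^j}{dt} = 0, \]

or

\[ \frac{d^2x}{dt^2} + \Gamma^1_{11}\left(\frac{dx}{dt}\right)^2 + 2\Gamma^1_{12}\,\frac{dx}{dt}\frac{dy}{dt} + \Gamma^1_{22}\left(\frac{dy}{dt}\right)^2 = 0. \]

To find the connection coefficients with raised indices, we multiply those with all indices
lowered by the inverse of the metric tensor. For instance,

\[ \Gamma^1_{12} = g^{1i}\Gamma_{i12} = g^{11}\Gamma_{112} + \underbrace{g^{12}}_{=0}\Gamma_{212} = y^2\left(-\frac{1}{y^3}\right) = -\frac{1}{y}. \]

Similarly, $\Gamma^1_{11} = 0 = \Gamma^1_{22}$, and the geodesic equation for the first coordinate becomes

\[ \frac{d^2x}{dt^2} - \frac{2}{y}\,\frac{dx}{dt}\frac{dy}{dt} = 0. \tag{28.66} \]

For the second coordinate, we need $\Gamma^2_{11}$, $\Gamma^2_{12}$, and $\Gamma^2_{22}$. These can be readily evaluated
as above, with the result

\[ \Gamma^2_{11} = -\Gamma^2_{22} = \frac{1}{y}, \qquad \Gamma^2_{12} = 0, \]

yielding the geodesic equation for the second coordinate

\[ \frac{d^2y}{dt^2} + \frac{1}{y}\left(\frac{dx}{dt}\right)^2 - \frac{1}{y}\left(\frac{dy}{dt}\right)^2 = 0. \tag{28.67} \]

With $\dot{x} \equiv dx/dt$, Equation (28.66) can be written as

\[ \frac{d\dot{x}}{dt} = 2\,\frac{\dot{x}}{y}\,\frac{dy}{dt} \quad\Rightarrow\quad \frac{d\dot{x}}{\dot{x}} = 2\,\frac{dy}{y} \quad\Rightarrow\quad \dot{x} = Cy^2. \]

Using the chain rule and the notation $y' \equiv dy/dx$, we obtain

\[ \frac{dy}{dt} = y'\dot{x} = Cy^2y', \]

\[ \frac{d^2y}{dt^2} = C\left(2yy'\,\frac{dy}{dt} + y^2\,\frac{dy'}{dx}\,\frac{dx}{dt}\right) = C^2\bigl(2y^3y'^2 + y^4y''\bigr). \]

Substituting in Equation (28.67) yields

\[ y^3y'^2 + y^4y'' + y^3 = 0 \quad\Rightarrow\quad (y')^2 + yy'' + 1 = 0 \quad\Rightarrow\quad \frac{d}{dx}(yy') + 1 = 0. \]

It follows that $yy' = -x + A$ and $x^2 + y^2 = 2Ax + B$. Thus, the geodesics are circles with
arbitrary radii whose centers lie on the $x$-axis. ■

isometry defined
Isometries of a manifold form a subgroup of Diff(M).
Killing vector field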
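The conclusion of Example 28.3.11 can be checked by integrating Equations (28.66) and (28.67) directly. The sketch below uses a classical fourth-order Runge-Kutta step; the initial data $(x, y, \dot{x}, \dot{y}) = (0, 1, 1, 0)$ are our own illustrative choice, for which $A = 0$, $B = 1$, and $C = 1$, so the geodesic should be the unit circle centered at the origin, traversed as $x = \tanh t$, $y = 1/\cosh t$.

```python
import math

def geodesic_rhs(state):
    """Right-hand side of Equations (28.66) and (28.67) as a first-order system."""
    x, y, vx, vy = state
    return (vx, vy, 2.0 * vx * vy / y, (vy * vy - vx * vx) / y)

def rk4_step(f, state, h):
    """One classical Runge-Kutta step of size h for the system state' = f(state)."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(state, t_end, steps):
    """Return the list of states visited from t = 0 to t = t_end."""
    h = t_end / steps
    path = [state]
    for _ in range(steps):
        state = rk4_step(geodesic_rhs, state, h)
        path.append(state)
    return path

def exact_point(t):
    """Closed-form geodesic through (0, 1) with initial velocity (1, 0)."""
    return (math.tanh(t), 1.0 / math.cosh(t))

path = integrate((0.0, 1.0, 1.0, 0.0), t_end=2.0, steps=2000)
```

Along the computed path, $x^2 + y^2$ should remain equal to 1 and $\dot{x}/y^2$ should remain equal to $C = 1$, up to integration error.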


28.4 Isometries and Killing Vector Fields 

Whenever a structure is defined on a given family of sets, mappings between such
sets that preserve that structure become the “natural” mappings. Thus, the natural
maps among vector spaces are linear maps, among groups are homomorphisms,
among algebras are algebra homomorphisms, and among manifolds are smooth
maps. The introduction of a metric on a manifold makes certain maps more
privileged than others.

28.4.1. Definition. Let $M$ and $N$ be Riemannian manifolds with metrics $\mathbf{g}_M$ and
$\mathbf{g}_N$, respectively. The smooth map $\psi : M \to N$ is called isometric at $P \in M$ if
$\mathbf{g}_M(\mathbf{X}, \mathbf{Y}) = \mathbf{g}_N(\psi_*\mathbf{X}, \psi_*\mathbf{Y})$ for all $\mathbf{X}, \mathbf{Y} \in T_P(M)$. An isometry of $M$ to $N$ is a
bijective smooth map that is isometric at every point of $M$, in which case we have
$\psi^*\mathbf{g}_N = \mathbf{g}_M$.

Of special interest are isometries $\psi : M \to M$ of a single manifold. These
happen to be a subgroup of Diff($M$), the group of diffeomorphisms of $M$. In the
subgroup of isometries, we concentrate on the one-parameter groups of
transformations. These define (and are defined by) certain vector fields:

28.4.2. Definition. Let $\mathbf{X} \in \mathfrak{X}(M)$ be a vector field with integral curve $F_t$. Then
$\mathbf{X}$ is called a Killing vector field if $F_t$ is an isometry of $M$.





The following proposition follows immediately from the definition of the Lie
derivative [Equation (26.29)].

28.4.3. Proposition. A vector field $\mathbf{X} \in \mathfrak{X}(M)$ is a Killing vector field if and only
if $L_{\mathbf{X}}\mathbf{g} = 0$.

Choosing a coordinate system $\{x^i\}$, we write $\mathbf{g} = g_{ij}\,dx^i \otimes dx^j$ and conclude
that $\mathbf{X} = X^i\partial_i$ is a Killing vector field if and only if

\[ 0 = L_{\mathbf{X}}(g_{ij}\,dx^i \otimes dx^j) = \mathbf{X}(g_{ij})\,dx^i \otimes dx^j + g_{ij}(L_{\mathbf{X}}dx^i) \otimes dx^j + g_{ij}\,dx^i \otimes (L_{\mathbf{X}}dx^j). \]

Killing equation

Using Equation (26.33) for the 1-form $dx^k$, we obtain the Killing equation

\[ X^k\partial_k g_{ij} + \partial_i X^k\,g_{kj} + \partial_j X^k\,g_{ki} = 0. \tag{28.68} \]

If in Equation (28.45) we replace $\mathbf{T}$ with $\mathbf{g}$ and $\mathbf{u}$ with $\mathbf{X}$, where $\mathbf{X}$ is a Killing
vector field, we obtain

\[ 0 = 0 + g_{ij}\bigl[(\nabla_{\mathbf{e}_k}\mathbf{X})^i\,\boldsymbol{\epsilon}^k \otimes \boldsymbol{\epsilon}^j + (\nabla_{\mathbf{e}_k}\mathbf{X})^j\,\boldsymbol{\epsilon}^i \otimes \boldsymbol{\epsilon}^k\bigr], \tag{28.69} \]

where we have assumed that the covariant derivative is compatible with the metric
tensor.

Killing equation in terms of covariant derivative

The reader may check that Equation (28.69) leads to

\[ X_{j;k} + X_{k;j} = 0. \tag{28.70} \]

This is another form of the Killing equation.

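28.4.4. Proposition. If $\mathbf{X}$ is a Killing vector field, then its inner product with the
tangent to any geodesic is constant along that geodesic, i.e., if $\mathbf{u}$ is such a tangent,
then $\nabla_{\mathbf{u}}[\mathbf{g}(\mathbf{u}, \mathbf{X})] = 0$.

Proof. We write the desired covariant derivative in component form,

\begin{align*}
\nabla_{\mathbf{u}}[\mathbf{g}(\mathbf{u}, \mathbf{X})] = u^k\bigl(g_{ij}u^iX^j\bigr)_{;k} &= u^k\underbrace{g_{ij;k}}_{=0}u^iX^j + \underbrace{u^k g_{ij}\,u^i{}_{;k}}_{=0}X^j + u^k g_{ij}\,u^i X^j{}_{;k} \\
&= \tfrac{1}{2}\underbrace{(X_{i;k} + X_{k;i})}_{=0 \text{ by (28.70)}}u^iu^k,
\end{align*}

where the first term vanishes by assumption of the compatibility of the metric and
the covariant derivative, and the second term by the geodesic equation. □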
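Proposition 28.4.4 lends itself to a simple numerical check. The sketch below integrates a geodesic of the unit sphere and monitors $\mathbf{g}(\mathbf{u}, \mathbf{X})$ for two fields that are assumed to be Killing (they are rotation generators): $\partial_\varphi$ and $-\cos\varphi\,\partial_\theta + \cot\theta\sin\varphi\,\partial_\varphi$. The sphere's geodesic equations, written out in the code, follow from Equation (28.65) with the standard connection coefficients of the sphere, which we assume; the initial data and all names are illustrative.

```python
import math

def sphere_geodesic_rhs(state):
    """Geodesic equations of the unit sphere as a first-order system in
    (theta, phi, dtheta/dt, dphi/dt)."""
    theta, phi, dtheta, dphi = state
    return (dtheta, dphi,
            math.sin(theta) * math.cos(theta) * dphi ** 2,
            -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi)

def rk4_step(f, state, h):
    """One classical Runge-Kutta step of size h."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def conserved(state):
    """g(u, X) for X = d/dphi and X = -cos(phi) d/dtheta + cot(theta) sin(phi) d/dphi."""
    theta, phi, dtheta, dphi = state
    u_theta = dtheta                        # lowered component u_theta
    u_phi = math.sin(theta) ** 2 * dphi     # lowered component u_phi
    along_z = u_phi
    along_x = (-math.cos(phi) * u_theta
               + math.cos(theta) / math.sin(theta) * math.sin(phi) * u_phi)
    return (along_z, along_x)

def max_drift(state, t_end=3.0, steps=3000):
    """Largest change of either quantity along the integrated geodesic."""
    h = t_end / steps
    start = conserved(state)
    drift = 0.0
    for _ in range(steps):
        state = rk4_step(sphere_geodesic_rhs, state, h)
        now = conserved(state)
        drift = max(drift, abs(now[0] - start[0]), abs(now[1] - start[1]))
    return drift
```

For a geodesic that stays away from the poles, such as the one starting at `(1.0, 0.0, 0.3, 0.8)`, `max_drift` should be at the level of the integration error, consistent with both inner products being constants of the motion.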


28.4.5. Example. In a flat $m$-dimensional manifold we can choose an orthonormal
coordinate frame (Theorem 28.2.9), so that the Killing equation becomes

\[ \partial_i X_j + \partial_j X_i = 0. \tag{28.71} \]

Setting $i = j$, we see that $\partial_j X_j = 0$ (no sum). Differentiating Equation (28.71) with
respect to $x^i$, we obtain

\[ \partial_i^2 X_j + \partial_j\partial_i X_i = 0 \quad\Rightarrow\quad \partial_i^2 X_j = 0 \quad \forall\, i, j. \]

Therefore, $X_j$ is linear in $x^i$, i.e., $X_j = a_{jk}x^k + b_j$ with $a_{jk}$ and $b_j$ arbitrary. Inserting
this in (28.71), we get $a_{ij} + a_{ji} = 0$. The Killing vector is then

\[ \mathbf{X} = (a^i{}_j x^j + b^i)\,\partial_i = \tfrac{1}{2}a^{ij}(x_j\partial_i - x_i\partial_j) + b^i\partial_i. \]

maximally symmetric spaces

The first term is clearly the generator of a rotation and the second term that of translation.
Altogether, there are $m(m-1)/2$ rotations and $m$ translations. So the total number of
independent Killing vectors is $m(m+1)/2$. Manifolds that have this many Killing vectors
are called maximally symmetric spaces. ■

Using Equations (26.34) and (26.35) one can show that the set of Killing vector
fields on a manifold forms a vector subspace of $\mathfrak{X}(M)$, and that if $\mathbf{X}$ and $\mathbf{Y}$ are Killing
vector fields, then so is $[\mathbf{X}, \mathbf{Y}]$. Thus,


28.4.6. Box. The set of Killing vector fields forms a Lie algebra. 


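28.4.7. Example. From $\mathbf{g} = d\theta \otimes d\theta + \sin^2\theta\,d\varphi \otimes d\varphi$, the metric of the unit sphere $S^2$,
one writes the Killing equations

\begin{align*}
\partial_\theta X_\theta + \partial_\theta X_\theta &= 0, \\
\partial_\varphi X_\varphi + \partial_\varphi X_\varphi + 2\sin\theta\cos\theta\,X_\theta &= 0, \tag{28.72} \\
\partial_\theta X_\varphi + \partial_\varphi X_\theta - 2\cot\theta\,X_\varphi &= 0.
\end{align*}

The first equation implies that $X_\theta = f(\varphi)$, a function of $\varphi$ only. Substitution in the second
equation yields

\[ X_\varphi = -\tfrac{1}{2}F(\varphi)\sin 2\theta + g(\theta), \quad\text{where}\quad f(\varphi) = \frac{dF}{d\varphi}. \]

Inserting this in the third equation of (28.72), we obtain

\[ -F(\varphi)\cos 2\theta + \frac{dg}{d\theta} + \frac{df}{d\varphi} + 2\cot\theta\bigl[\tfrac{1}{2}F(\varphi)\sin 2\theta - g(\theta)\bigr] = 0, \]

or

\[ \left[\frac{dg}{d\theta} - 2\cot\theta\,g(\theta)\right] + \left[\frac{df}{d\varphi} + F(\varphi)\right] = 0. \]

For this equation to hold for all $\theta$ and $\varphi$, we must have

\[ \frac{dg}{d\theta} - 2\cot\theta\,g(\theta) = C = -\frac{df}{d\varphi} - F(\varphi), \]

where $C$ is a constant. This gives

\[ g(\theta) = (C_1 - C\cot\theta)\sin^2\theta \quad\text{and}\quad f(\varphi) = X_\theta = A\sin\varphi + B\cos\varphi \]

with

\[ X_\varphi = (A\cos\varphi - B\sin\varphi)\sin\theta\cos\theta + C_1\sin^2\theta. \]

A general Killing vector field is thus given by

\[ \mathbf{X} = X^\theta\partial_\theta + X^\varphi\partial_\varphi = A\,\mathbf{L}_y - B\,\mathbf{L}_x + C_1\mathbf{L}_z, \]

where

\[ \mathbf{L}_x = -\cos\varphi\,\partial_\theta + \cot\theta\sin\varphi\,\partial_\varphi, \qquad \mathbf{L}_y = \sin\varphi\,\partial_\theta + \cot\theta\cos\varphi\,\partial_\varphi, \qquad \mathbf{L}_z = \partial_\varphi \]

are the generators of $SO(3)$. ■

discussion of conformal transformations and conformal Killing vector fields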
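The solution found in Example 28.4.7 can be substituted back into Equations (28.72) numerically. In the sketch below the constants $A$, $B$, $C_1$ are arbitrary illustrative values, the partial derivatives are approximated by central differences, and the function names are ours; the three returned residuals correspond to the three lines of (28.72).

```python
import math

A, B, C1 = 0.7, -1.3, 0.4  # arbitrary illustrative constants

def X_theta(theta, phi):
    """Covariant component X_theta of the general Killing field."""
    return A * math.sin(phi) + B * math.cos(phi)

def X_phi(theta, phi):
    """Covariant component X_phi of the general Killing field."""
    return ((A * math.cos(phi) - B * math.sin(phi)) * math.sin(theta) * math.cos(theta)
            + C1 * math.sin(theta) ** 2)

def partial(f, var, theta, phi, h=1e-5):
    """Central-difference partial derivative of f with respect to 'theta' or 'phi'."""
    if var == "theta":
        return (f(theta + h, phi) - f(theta - h, phi)) / (2 * h)
    return (f(theta, phi + h) - f(theta, phi - h)) / (2 * h)

def killing_residuals(theta, phi):
    """Left-hand sides of the three Killing equations (28.72) at (theta, phi)."""
    r1 = 2 * partial(X_theta, "theta", theta, phi)
    r2 = (2 * partial(X_phi, "phi", theta, phi)
          + 2 * math.sin(theta) * math.cos(theta) * X_theta(theta, phi))
    r3 = (partial(X_phi, "theta", theta, phi) + partial(X_theta, "phi", theta, phi)
          - 2 * (math.cos(theta) / math.sin(theta)) * X_phi(theta, phi))
    return (r1, r2, r3)
```

At any point away from the poles, for instance `killing_residuals(0.9, 0.4)`, all three residuals should vanish to within the finite-difference error.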

Sometimes it is useful to relax the complete invariance of the metric tensor
under the diffeomorphism of a manifold induced by a vector field and allow a
change of scale in the metric. More precisely, we consider vector fields $\mathbf{X}$ whose
flow $F_t$ changes the metric of $M$ only by a scale factor, so that $F_t^*\mathbf{g} = e^{-\phi}\mathbf{g}$,
where $\phi$ is a real-valued function on $M$ that is also dependent on the parameter $t$.
Such a transformation keeps angles unchanged but rescales all lengths. In analogy
with those of the complex plane with the same property, we call such
transformations conformal transformations. A vector field that generates a
conformal transformation will satisfy

\[ X^k\partial_k g_{ij} + \partial_i X^k\,g_{kj} + \partial_j X^k\,g_{ki} = -\psi\,g_{ij}, \qquad \psi = \left.\frac{\partial\phi}{\partial t}\right|_{t=0}, \tag{28.73} \]

and is called a conformal Killing vector field.

We now specialize to a flat $m$-dimensional manifold and choose an orthonormal
coordinate frame (Theorem 28.2.9). Then Equation (28.73) becomes

\[ \partial_i X_j + \partial_j X_i = -\psi\,g_{ij}. \tag{28.74} \]

Remember Einstein's summation convention!

Multiply both sides by $g^{ij}$ and sum over $i$ and $j$ to obtain

\[ 2\,\partial^i X_i = -m\psi \quad\Rightarrow\quad \partial^i X_i = -\frac{m}{2}\,\psi, \qquad m = \dim M. \]

Apply $\partial^i$ to both sides of Equation (28.74) and sum over $i$. This yields

\[ \partial^i\partial_i X_j + \partial_j\partial^i X_i = -\partial_j\psi \quad\Rightarrow\quad \partial^i\partial_i X_j = \tfrac{1}{2}(m-2)\,\partial_j\psi. \tag{28.75} \]

Differentiate both sides of the second equation in (28.75) with respect to $x^k$ and
symmetrize the result in $j$ and $k$ to obtain

\[ (m-2)\,\partial_j\partial_k\psi = \partial^i\partial_i(\partial_k X_j + \partial_j X_k) = -g_{jk}\,\partial^i\partial_i\psi. \]

Raising the index $j$ and contracting it with $k$ gives $\partial^i\partial_i\psi = 0$ if $m \neq 1$. It follows
that

\[ (m-2)\,\partial_j\partial_k\psi = 0. \tag{28.76} \]

Equations (28.74), (28.75), and (28.76) determine both $\psi$ and $X_j$ if $m \neq 2$.
It follows from Equation (28.76) that $\psi$ is linear in $x$, and, consequently [from
(28.74)], that $X_i$ is at most quadratic in $x$. The most general solution to Equation
(28.74) satisfying (28.75) and (28.76) is

\[ X^i(x) = b^i + s\,x^i + a^i{}_k x^k + c^i x^k x_k - 2c_k x^k x^i, \tag{28.77} \]


where $a_{ij} = -a_{ji}$ and indices of the constants are raised and lowered as usual.

conformal group
dilatation
inversion, or special conformal transformation

Equation (28.77) gives the generators of an $[(m+1)(m+2)/2]$-parameter group
of transformations on $\mathbb{R}^m$, $m \neq 2$, called the conformal group. The reader should
note that translations (represented by the parameters $b^i$) and rotations (represented
by the parameters $a^{ij}$) are included in this group. The other finite (as opposed to
infinitesimal) transformations of coordinates can be obtained by using Equation
(27.25). For example, the finite transformation generated by the parameter $s$ is
given by the solution to the DE $dx'^i/dt = x'^i$, which is $x'^i = e^t x^i$, or, renaming the
parameter, $x'^i = e^s x^i$, and is called a dilatation of coordinates. Similarly, the finite
transformation generated by the parameter $c_i$ is given by the solution to the DE
$dx'^j/dc_i = \delta^{ij}x'^k x'_k - 2x'^i x'^j$, or

\[ x'^i = \frac{x^i + c^i x^2}{1 + 2\,c \cdot x + c^2 x^2}, \qquad\text{where}\quad c \cdot x \equiv c^k x_k, \quad c^2 \equiv c \cdot c, \quad x^2 \equiv x \cdot x, \]

which is called inversion, or the special conformal transformation. Equations
(28.75) and (28.76) place no restriction on $\psi$ (and therefore on $X_i$) when $m = 2$.
This means that


28.4.8. Box. The conformal group is infinite-dimensional for $\mathbb{R}^2$.


In fact, we encountered the conformal transformations of $\mathbb{R}^2$ in the context of
complex analysis, where we showed that any (therefore, infinitely many) analytic
function is a conformal transformation of $\mathbb{C} \cong \mathbb{R}^2$. The conformal group of $\mathbb{R}^2$
has important applications in string theory and statistical mechanics, but we shall
not pursue them here.
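As a sanity check on the flat-space conformal Killing fields, one can verify numerically that a field of the form (28.77) satisfies Equation (28.74). The sketch below works in Euclidean $\mathbb{R}^m$, where upper and lower indices coincide, and assumes that the scale function is $\psi = -(2/m)\,\partial^i X_i$, which for (28.77) works out by hand to $4\,c \cdot x - 2s$. All parameter values used with it are arbitrary illustrations, and the function names are ours.

```python
def conformal_field(x, b, s, a, c):
    """Components X_i(x) of Equation (28.77) in Euclidean R^m (indices raised and
    lowered with the identity, so upper and lower components coincide)."""
    m = len(x)
    x2 = sum(xi * xi for xi in x)
    cx = sum(ci * xi for ci, xi in zip(c, x))
    return [b[i] + s * x[i] + sum(a[i][k] * x[k] for k in range(m))
            + c[i] * x2 - 2.0 * cx * x[i] for i in range(m)]

def jacobian(x, params, h=1e-5):
    """J[i][j] = d X_i / d x^j by central differences."""
    m = len(x)
    J = [[0.0] * m for _ in range(m)]
    for j in range(m):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = conformal_field(xp, *params), conformal_field(xm, *params)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def psi_expected(x, s, c):
    """psi = -(2/m) d^i X_i, which for (28.77) works out to 4 c.x - 2 s."""
    return 4.0 * sum(ci * xi for ci, xi in zip(c, x)) - 2.0 * s

def conformal_residual(x, params):
    """Largest |d_i X_j + d_j X_i + psi delta_ij| over all index pairs."""
    J = jacobian(x, params)
    psi = psi_expected(x, params[1], params[3])
    m = len(x)
    return max(abs(J[i][j] + J[j][i] + (psi if i == j else 0.0))
               for i in range(m) for j in range(m))
```

With an antisymmetric matrix `a`, `conformal_residual` should vanish up to rounding for any choice of `b`, `s`, and `c`; the expected $\psi$ is linear in $x$, in agreement with Equation (28.76).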



28.5 Geodesic Deviation and Curvature

Geodesics are the straight lines of general manifolds on which, for example, free
particles move. If $\mathbf{u}$ represents the tangent to a given geodesic, one can say that
$\nabla_{\mathbf{u}}\mathbf{u} = 0$ is the equation of motion of a free particle. In flat spaces, the relative
velocity of any pair of free particles will not change, so that their relative
acceleration is always zero. In general, however, due to the effects of curvature, we expect
a nonzero acceleration. Let us elaborate on this.

Consider some region of the manifold through whose points geodesics can
be drawn in various directions. Concentrate on one geodesic and its neighboring
geodesics. Let $t$ designate the parameter that locates points of the geodesic. Let $s$
be a continuous parameter that labels different geodesics (see Figure 28.1). One
can connect the points on some neighboring geodesics corresponding to the same
value of $t$ and obtain a curve parametrized by $s$. The collection of all geodesics
that pass through all points of this curve form a two-dimensional submanifold with
coordinates $t$ and $s$. Each such geodesic is thus described by the geodesic equation
$\nabla_{\mathbf{u}}\mathbf{u} = 0$ with $\mathbf{u} = \partial/\partial t$ (because $t$ is a coordinate). Furthermore, as we hop
from one geodesic to its neighbor, the geodesic equation does not change; i.e., the
geodesic equation is independent of $s$. Translated into the language of calculus,
this means that differentiation of the geodesic equation with respect to $s$ will give
zero. Translated into the higher-level language of tensor analysis, it means that
covariant differentiation of the geodesic equation will yield zero. We write this
final translation as $\nabla_{\mathbf{n}}(\nabla_{\mathbf{u}}\mathbf{u}) = 0$ where $\mathbf{n} \equiv \partial/\partial s$. This can also be written as

\[ 0 = \nabla_{\mathbf{n}}\nabla_{\mathbf{u}}\mathbf{u} = \nabla_{\mathbf{u}}\nabla_{\mathbf{n}}\mathbf{u} + [\nabla_{\mathbf{n}}, \nabla_{\mathbf{u}}]\mathbf{u} = \nabla_{\mathbf{u}}(\nabla_{\mathbf{u}}\mathbf{n} + [\mathbf{n}, \mathbf{u}]) + [\nabla_{\mathbf{n}}, \nabla_{\mathbf{u}}]\mathbf{u}, \]

where we have used the first property of Theorem 28.3.2. Using the fact that $\mathbf{n}$ and
$\mathbf{u}$ are coordinate frames, we conclude that $[\mathbf{n}, \mathbf{u}] = 0$, which in conjunction with
Equation (28.56) yields

\[ \nabla_{\mathbf{u}}\nabla_{\mathbf{u}}\mathbf{n} + \mathbf{R}(\mathbf{n}, \mathbf{u})\mathbf{u} = 0. \tag{28.78} \]

relative acceleration

The first term can be interpreted as the relative acceleration of two geodesic curves
(or free particles), because $\nabla_{\mathbf{u}}$ is the generalization of the derivative with respect
to $t$, and $\nabla_{\mathbf{u}}\mathbf{n}$ is interpreted as relative velocity. In a flat manifold, the relative
acceleration for any pair of free particles is zero. When curvature is present, it
produces a nonzero relative acceleration.

equation of geodesic deviation

By writing $\mathbf{u} = u^i\partial_i$ and $\mathbf{n} = n^k\partial_k$ and substituting in Equation (28.78), we
arrive at the equation of geodesic deviation in coordinate form:

\[ 0 = \frac{d^2n^i}{dt^2} + \frac{d}{dt}\bigl(u^j n^k\,\Gamma^i_{jk}\bigr) + u^j\,\Gamma^i_{jm}\left(\frac{dn^m}{dt} + u^l n^k\,\Gamma^m_{lk}\right) + R^i{}_{jkl}\,u^j n^k u^l, \tag{28.79} \]

where we have used the fact that $u^i\partial_i f = df/dt$ for any function $f$ defined on the
manifold.
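A short worked case may help fix the signs in Equation (28.78). We take the unit sphere, the family of meridians $\varphi = s$ parametrized by arc length $t = \theta$, and assume the standard connection coefficient $\Gamma^\varphi_{\theta\varphi} = \cot\theta$ together with the pattern of Equation (28.62); none of this is worked out in the text itself.

```latex
% Unit sphere, meridians \varphi = s, arc length t = \theta:
%   u = \partial_\theta (geodesic tangent),  n = \partial_\varphi (connecting vector).
% Assumed coefficients: \Gamma^{\varphi}_{\theta\varphi} = \cot\theta, \Gamma^{\varphi}_{\theta\theta} = 0.
\begin{align*}
\nabla_{\mathbf{u}}\mathbf{n} &= \nabla_{\partial_\theta}\partial_\varphi
   = \Gamma^{\varphi}_{\varphi\theta}\,\partial_\varphi = \cot\theta\,\partial_\varphi, \\
\nabla_{\mathbf{u}}\nabla_{\mathbf{u}}\mathbf{n}
  &= \partial_\theta(\cot\theta)\,\partial_\varphi + \cot\theta\,\nabla_{\partial_\theta}\partial_\varphi
   = \left(-\csc^2\theta + \cot^2\theta\right)\partial_\varphi = -\,\mathbf{n}, \\
R^{\varphi}{}_{\theta\varphi\theta}
  &= \partial_\varphi\Gamma^{\varphi}_{\theta\theta} - \partial_\theta\Gamma^{\varphi}_{\theta\varphi}
   + \Gamma^{m}_{\theta\theta}\Gamma^{\varphi}_{m\varphi} - \Gamma^{m}_{\theta\varphi}\Gamma^{\varphi}_{m\theta}
   = \csc^2\theta - \cot^2\theta = 1, \\
\mathbf{R}(\mathbf{n},\mathbf{u})\mathbf{u} &= R^{\varphi}{}_{\theta\varphi\theta}\,\partial_\varphi = \mathbf{n}
  \quad\Longrightarrow\quad
  \nabla_{\mathbf{u}}\nabla_{\mathbf{u}}\mathbf{n} + \mathbf{R}(\mathbf{n},\mathbf{u})\mathbf{u} = 0 .
\end{align*}
```

The length of $\mathbf{n}$ is $\sin\theta$, the separation of neighboring meridians, and it obeys $d^2(\sin\theta)/d\theta^2 = -\sin\theta$: positive curvature makes initially diverging geodesics accelerate toward each other.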


Figure 28.1 A region of the manifold and the two-dimensional surface defined by s and t.

The chain that connects relative acceleration to curvature has another link that 
connects the latter two to gravity. From a Newtonian standpoint, gravity is the only 
force that accelerates all objects at the same rate (equivalence principle). From a 
geometric standpoint, this property allows one to include gravity in the structure 
of space-time: An object in free fall is considered “free，” and its path, a geodesic. 
Locally, this is in fact a better picture of reality, because inside a laboratory in 
free fall (such as a space shuttle in orbit around earth) one actually verifies the 
first law of motion on all objects floating in midair. One need not include an 
external phenomenon called gravity. Gravity becomes part of the fabric of space- 
time. But how does gravity manifest itself? Is there any observable effect that can 
indicate the presence of gravity, or by a mere transfer to a freely falling frame have 
we been able to completely eliminate gravity? The second alternative would be 
strange indeed, because the source of gravity is matter, and if we eliminate gravity 
completely, we have to eliminate matter as well! If the gravitational field were 
homogeneous, one could eliminate it — and the matter that produces it as well, 
but no such gravitational field exists. The inhomogeneity of gravitational fields 
has indeed an observable effect. Consider two test particles that are falling freely 
toward the source of gravity on two different radii. As they get closer and closer to 
the center, their relative distance—in fact, their relative velocity—changes: They 
experience a relative acceleration. Since, as we saw in Equation (28.78), relative
acceleration is related to curvature, we conclude that


28.5.1. Box. Gravity manifests itself by giving space-time a curvature. 





This is Einstein’s interpretation of gravity. From a Newtonian standpoint, the 
relative acceleration is caused by the inhomogeneity of the gravitational field. 
Such inhomogeneity (in the field of the Moon and the Sun) is responsible for
the tides. That is why the curvature term in Equation (28.78) is also called the
tide-producing gravitational force. 


28.5.1 Riemann Normal Coordinates 


Starting with a point P of an n-dimensional manifold M on which a covariant
derivative is defined, we can construct a unique geodesic in every direction, i.e.,
for every vector in $T_P(M)$. By parallel transportation of the tangent vectors at P,
we can construct a vector field in a neighborhood of P: The value of the vector
field at Q (assumed to be close enough to P) is the tangent at Q on the geodesic
starting at P and passing through Q. 8 The vector field so obtained makes it possible
to define an exponential map from the tangent space to the manifold. In fact, the
integral curve $\exp(t\mathbf{X})$ of any tangent vector $\mathbf{X}$ in $T_P(M)$ is simply the geodesic
associated with the vector.

The uniqueness of the geodesics establishes a bijection (in fact, a diffeomorphism)
between a neighborhood of the origin of $T_P(M)$ and a neighborhood of P
in M. This diffeomorphism can be used to assign coordinates to all points in the
vicinity of P. Recall that a coordinate is a smooth bijection from M to $\mathbb{R}^n$. Now
choose a basis for $T_P(M)$ and associate the components of $t\mathbf{X}$ in this basis to the
points on the geodesic $\exp(t\mathbf{X})$. Specifically, if $\{a^i\}_{i=1}^n$ are the components of $\mathbf{X}$
in the chosen basis, then

$$x^i(t) = a^i t, \qquad i = 1, 2, \dots, n,$$

are the so-called Riemann normal coordinates (RNCs) of points on the geodesic
of $\mathbf{X}$. The geodesic equations in these coordinates become

$$\Gamma^k_{ij}\,a^i a^j = 0 \;\Rightarrow\; \Gamma^k_{ij} = 0$$

because $\Gamma^k_{ij}$ is symmetric in i and j.

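28.5.2. Proposition. The connection coefficients at a point $P \in M$ vanish in the
Riemann normal coordinates at P.

Using Equation (28.43), we immediately obtain the following:

28.5.3. Corollary. Let $\mathbf{T}$ be a tensor field on M with components $T^{i_1 \dots i_r}_{j_1 \dots j_s}$ with
respect to a Riemann normal coordinate system $\{x^i\}$ at P. Then

$$T^{i_1 \dots i_r}_{j_1 \dots j_s;k}\Big|_P = \frac{\partial T^{i_1 \dots i_r}_{j_1 \dots j_s}}{\partial x^k}\bigg|_P.$$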



8 We are assuming that through any two neighboring points one can always draw a geodesic. For a proof see [Koba 63,
pp. 172-175].




Riemann normal coordinates are very useful in establishing tensor equations. 
This is because 


28.5.4. Box. Two tensors are identical if and only if their components are 
the same in any coordinate frame. 

Therefore, to show that two tensor fields are equal, we pick an arbitrary point
in M, erect a set of RNCs, and show that the components of the tensors are equal.
Since the connection coefficients vanish in an RNC system, and covariant derivatives
are the same as ordinary derivatives, tensor manipulations can be simplified
considerably. For example, the components of the curvature tensor in RNCs are


$$R^i{}_{jkl} = \frac{\partial \Gamma^i_{jl}}{\partial x^k} - \frac{\partial \Gamma^i_{jk}}{\partial x^l}. \tag{28.80}$$

This is not a tensor relation: the RHS is not a tensor in a general coordinate system.
However, if we establish a relation involving the components of the curvature tensor
alone, then that relation will hold in all coordinates, i.e., it is a tensor relation. For
instance, from the equation above one immediately obtains

$$R^i{}_{jkl} + R^i{}_{ljk} + R^i{}_{klj} = 0.$$

Since this involves only a tensor, it must hold in all coordinate frames. This is the 
Bianchi identity of Equation (28.24). 

28.5.5. Example. Differentiate Equation (28.62) with respect to $x^m$ and evaluate the
result in RNC to get

$$R^i{}_{jkl;m} = R^i{}_{jkl,m} = \Gamma^i_{jl,km} - \Gamma^i_{jk,lm}.$$

From this relation and $\Gamma^i_{jk,lm} = \Gamma^i_{jk,ml}$ we obtain the second Bianchi identity:

$$R^i{}_{jkl;m} + R^i{}_{jmk;l} + R^i{}_{jlm;k} = 0 \quad\Longleftrightarrow\quad R^i{}_{j[kl;m]} = 0. \tag{28.81}$$

In Einstein’s general relativity, this identity is the analogue of Maxwell’s pair of
homogeneous equations: $F_{\alpha\beta,\gamma} + F_{\gamma\alpha,\beta} + F_{\beta\gamma,\alpha} = 0$.


28.5.2 Newtonian Gravity 

The equivalence principle, relating gravity with the curvature of space-time, is
not unique to Einstein. What is unique to him is combining that principle with
the assumption of local Lorentz geometry, i.e., local validity of special relativity.
Cartan also used the equivalence principle to reformulate Newtonian gravity in the
language of geometry. Rewrite Newton’s second law of motion as

$$\mathbf{F} = m\mathbf{a} \;\Rightarrow\; \mathbf{a} = -\nabla\Phi \;\Rightarrow\; \frac{\partial^2 x^j}{\partial t^2} + \frac{\partial \Phi}{\partial x^j} = 0,$$





where $\Phi$ is the gravitational potential (potential energy per unit mass). The Newtonian
universal time is a parameter that has two degrees of freedom: Its origin
and its unit of measurement are arbitrary. Thus, one can change $t$ to $t = a\tau + b$
without changing the physics of gravity. Taking this freedom into account, one can
write

$$\frac{d^2 t}{d\tau^2} = 0, \qquad \frac{d^2 x^j}{d\tau^2} + \frac{\partial \Phi}{\partial x^j}\left(\frac{dt}{d\tau}\right)^2 = 0. \tag{28.82}$$


Comparing this with the geodesic equation, we can read off the nonzero connection 
coefficients: 


$$\Gamma^j_{00} = \frac{\partial \Phi}{\partial x^j}, \qquad j = 1, 2, 3. \tag{28.83}$$


Inserting these in Equation (28.62), we find the nonzero components of Riemann 
curvature tensor: 


$$R^j{}_{0k0} = -R^j{}_{00k} = \frac{\partial^2 \Phi}{\partial x^j\,\partial x^k}. \tag{28.84}$$


Contraction of the two nonzero indices leads to the Laplacian of gravitational
potential:

$$R^j{}_{0j0} = \nabla^2\Phi.$$

Therefore, the Poisson equation for gravitational potential can be written in terms
of the curvature tensor:

$$R_{00} \equiv R^j{}_{0j0} = 4\pi G\rho, \tag{28.85}$$

where we have introduced the Ricci tensor, defined as

$$R_{ik} = R^j{}_{ijk}. \tag{28.86}$$

Equations (28.83), (28.84), and (28.85) plus the law of geodesic motion describe
the full content of Newtonian gravitational theory in the geometric language of 
tensors. 9 

It is instructive to discover the relation between curvature and gravity directly 
from the equation of geodesic deviation as applied to Newtonian gravity. The 
geodesic equation is the equation of motion: 

$$\frac{\partial^2 x^j}{\partial t^2} + \frac{\partial \Phi}{\partial x^j} = 0.$$

9 The classic and comprehensive book Gravitation, by Misner, Thorne, and Wheeler, has a thorough discussion of Newtonian
gravity in the language of geometry in Chapter 12 and is highly recommended. 





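Differentiate this equation with respect to the parameter s, noting that $\partial/\partial s = n^i\,\partial/\partial x^i$:

$$\frac{\partial}{\partial s}\left(\frac{\partial^2 x^j}{\partial t^2}\right) + \frac{\partial}{\partial s}\left(\frac{\partial \Phi}{\partial x^j}\right) = 0 \;\Rightarrow\; \frac{\partial^2}{\partial t^2}\left(\frac{\partial x^j}{\partial s}\right) + n^i\frac{\partial^2 \Phi}{\partial x^i\,\partial x^j} = 0.$$

Now note that $\partial x^j/\partial s = n^i\,\partial x^j/\partial x^i = n^j$. So, we obtain

$$\frac{\partial^2 n^j}{\partial t^2} + n^i\frac{\partial^2 \Phi}{\partial x^i\,\partial x^j} = 0.$$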



This is equivalent to Equation (28.78), and one recognizes the second term as the 
tide-producing (or the curvature) term. 
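
As a numerical illustration of the tide-producing term, the following sketch estimates the matrix $\partial^2\Phi/\partial x^i\partial x^j$ by finite differences for the potential of a point mass, $\Phi = -GM/r$. The point-mass potential, the value GM = 1, the step size h, and all function names are assumptions made for this illustration only; they are not taken from the text.

```python
import math

GM = 1.0  # G*M of the source in arbitrary units (illustrative assumption)


def potential(x, y, z):
    """Newtonian potential Phi = -GM/r of a point mass at the origin."""
    return -GM / math.sqrt(x * x + y * y + z * z)


def tidal_tensor(point, h=1e-3):
    """Central-difference estimate of d^2 Phi / dx^j dx^k at `point`."""
    def shifted(j, sj, k, sk):
        p = list(point)
        p[j] += sj * h
        p[k] += sk * h
        return potential(*p)

    hess = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        for k in range(3):
            hess[j][k] = (shifted(j, 1, k, 1) - shifted(j, 1, k, -1)
                          - shifted(j, -1, k, 1) + shifted(j, -1, k, -1)) / (4 * h * h)
    return hess


def relative_acceleration(point, n):
    """Right-hand side of d^2 n^j/dt^2 = -n^i d^2 Phi/(dx^i dx^j)."""
    hess = tidal_tensor(point)
    return [-sum(n[i] * hess[i][j] for i in range(3)) for j in range(3)]


if __name__ == "__main__":
    P = (2.0, 0.0, 0.0)
    H = tidal_tensor(P)
    print("trace (Laplacian of Phi):", H[0][0] + H[1][1] + H[2][2])
    print("radial separation:       ", relative_acceleration(P, (1.0, 0.0, 0.0)))
    print("transverse separation:   ", relative_acceleration(P, (0.0, 1.0, 0.0)))
```

Outside the source the trace should vanish up to discretization error, in line with the Poisson equation (28.85) for zero density, while neighboring particles are pulled apart along the radial direction and pushed together transversely.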


28.6 General Theory of Relativity 

No treatment of Riemannian geometry is complete without a discussion of the 
general theory of relativity. That is why we shall devote this last section of the 
current chapter to a brief exposition of this theory. 

We have seen that Newtonian gravity can be translated into the language of differential
geometry by identifying the gravitational tidal effects with the curvature of
space-time. This straightforward interpretation of Newtonian gravity, in particular
the retention of the Euclidean metric and the universality of time, leads to no new
physical effect. Furthermore, it is inconsistent with the special theory of relativity,
which mixes space and time coordinates via Lorentz transformations. Einstein’s
general theory of relativity (GTR) combines the equivalence principle (that freely
falling objects move on geodesics) with the local validity of the special theory of
relativity (that the metric of space-time reduces to the Lorentz-Minkowski metric
of the special theory of relativity).

28.6.1 Einstein’s Equation 

Since the metric plays a central role in general relativity, let us study its effect on 
the curvature tensor. The reader may verify that (Problem 28.23) 

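$$0 = \mathbf{R}(\mathbf{u}, \mathbf{v})(\mathbf{w}\cdot\mathbf{x}) = [\mathbf{R}(\mathbf{u}, \mathbf{v})\mathbf{w}]\cdot\mathbf{x} + \mathbf{w}\cdot[\mathbf{R}(\mathbf{u}, \mathbf{v})\mathbf{x}].$$

Using this, one can show that

$$R_{ijkl} = -R_{jikl}, \quad\text{where}\quad R_{ijkl} \equiv g_{im}R^m{}_{jkl}, \tag{28.87}$$

which is the first equation of (28.23). Equations (28.23) and (28.24) form a complete
set of symmetries of the Riemann curvature tensor. Other symmetries that
follow from them are

$$R_{ijkl} = R_{klij} \quad\text{and}\quad R_{[ijkl]} = 0. \tag{28.88}$$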




An important tensor that can be constructed out of the Riemann curvature tensor
and the metric tensor is the Einstein tensor $\mathbf{G}$, which is related to the Ricci tensor
defined in Equation (28.86). To derive the Einstein tensor, first note that the Ricci
tensor is symmetric in its indices (see Problem 28.25):

$$R_{ij} = R_{ji}. \tag{28.89}$$

Next, define the curvature scalar as

$$R \equiv R^i{}_i = g^{ij}R_{ij}. \tag{28.90}$$

Now, contract i with m in Equation (28.81) and use the antisymmetry of the
Riemann tensor in its last two indices to obtain

$$R^i{}_{jkl;i} + R_{jk;l} - R_{jl;k} = 0.$$

Finally, contract j and l and use the antisymmetry of the Riemann tensor in its first
as well as its last two indices to get $2R^i{}_{k;i} - R_{;k} = 0$, or $R^i{}_{k;i} = \tfrac{1}{2}R_{;k}$.

Summarizing the foregoing discussion, we write

$$\nabla\cdot\mathbf{G} = 0, \quad\text{where}\quad G_{ij} \equiv R_{ij} - \tfrac{1}{2}g_{ij}R. \tag{28.91}$$


Karl Schwarzschild (1873-1916) was the eldest of five sons 
and one daughter born to Moses Martin Schwarzschild and his
wife, Henrietta Sabel. His father was a prosperous member of 
the business community in Frankfurt, with Jewish forbears in 
that city traced back to the sixteenth century. 

From his mother, a vivacious, warm person, Karl undoubtedly
inherited his happy, outgoing personality, and from his father,
a capacity for sustained hard work. His childhood was spent
in comfortable circumstances among a large circle of relatives,
whose interests included art and music; he was the first to
become a scientist.

After attending a Jewish primary school, Schwarzschild entered the municipal gymnasium
in Frankfurt at the age of eleven. His curiosity about the heavens was first manifested
then: He saved his allowance and bought lenses to make a telescope. Indulging this interest,
his father introduced him to a friend, J. Epstein, a mathematician who had a private
observatory. With Epstein’s son (later professor of mathematics at the University of Strasbourg),
Schwarzschild learned to use a telescope and studied mathematics of a more advanced type
than he was getting in school. His precocious mastery of celestial mechanics resulted in
two papers on double star orbits, written when he was barely sixteen.

In 1891 Schwarzschild began two years of study at the University of Strasbourg, where
Ernst Becker, director of the observatory, guided the development of his skills in practical
astronomy, skills that later were to form a solid underpinning for his masterful mathematical
abilities.







At age twenty Schwarzschild went to the University of Munich. Three years later, 
in 1896, he obtained his Ph.D., summa cum laude. His dissertation was an application
of Poincaré’s theory of stable configurations in rotating bodies to several astronomical
problems, including tidal deformation in satellites and the validity of Laplace’s suggestion 
as to how the solar system had originated. Before graduating, Schwarzschild also found 
time to do some practical work with Michelson's interferometer. 

At a meeting of the German Astronomical Society in Heidelberg in 1900 he discussed 
the possibility that space was non-Euclidean. In the same year he published a paper giving 
a lower limit for the radius of curvature of space as 2500 light years. From 1901 until 1909 
he was professor at Göttingen, where he collaborated with Klein, Hilbert, and Minkowski,
publishing on electrodynamics and geometrical optics. In 1906, he studied the transport of 
energy through a star by radiation. 

From Göttingen he went to Potsdam, but in 1914 he volunteered for military service. He
served in Belgium, France, and Russia. While in Russia he wrote two papers on Einstein’s 
relativity theory and one on Planck’s quantum theory. The quantum theory paper explained 
that the Stark effect could be proved from the postulates of quantum theory. 

Schwarzschild’s relativity papers give the first exact solution of Einstein’s general gravitational
equations, giving an understanding of the geometry of space near a point mass. He
also made the first study of black holes, showing that bodies of sufficiently large mass would 
have an escape velocity exceeding the speed of light and so could not be seen. However, he 
contracted an illness while in Russia and died soon after returning home. 


The automatic vanishing of the divergence of the symmetric Einstein tensor has 
an important consequence in the field equation of GTR. It is reminiscent of a similar 
situation in electromagnetism, in which the vanishing of the divergence of the fields 
leads to the conservation of the electric charge, the source of electromagnetic 
fields. 10 

Just as Maxwell’s equations are a generalization of the static electricity of 
Coulomb to a dynamical theory, Einstein’s GTR is the generalization of Newtonian 
static gravity to a dynamical theory. As this generalization ought to agree with the 
successes of the Newtonian gravity, Equation (28.91) must agree with (28.85). The 
bold step taken by Einstein was to generalize this relation involving only a single 
component of the Ricci tensor to a full tensor equation. The natural tensor to be 
used as the source of gravitation is the stress energy tensor 11 

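$$T^{\mu\nu} = (\rho + p)u^\mu u^\nu + p g^{\mu\nu}, \quad\text{or}\quad \mathbf{T} = (\rho + p)\,\mathbf{u}\otimes\mathbf{u} + p\,\mathbf{g},$$

where the source is treated as a fluid with density $\rho$, four-velocity $\mathbf{u}$, and pressure
p. So, Einstein suggested the equation $\mathbf{G} = \kappa\mathbf{T}$ as the generalization of Newton’s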


10 It was Maxwell’s discovery of the inconsistency of the pre-Maxwellian equations of electromagnetism with charge conservation
that prompted him to change not only the fourth equation (to make the entire set of equations consistent with the charge
conservation), but also the course of human history.

11 In GTR, it is customary to use the convention that Greek indices run from 0 to 3, i.e., they include both space and time, while
Latin indices encompass only the space components. 








universal law of gravitation. Note that $\nabla\cdot\mathbf{G} = 0$ automatically guarantees mass-energy
conservation as in Maxwell’s theory of electromagnetism. Problem 28.27
calculates $\kappa$ to be $8\pi$ in units in which the universal gravitational constant and the
speed of light are set equal to unity. We therefore have

$$\mathbf{G} = 8\pi\mathbf{T}, \quad\text{or}\quad \mathbf{R} - \tfrac{1}{2}R\,\mathbf{g} = 8\pi\left[(\rho + p)\,\mathbf{u}\otimes\mathbf{u} + p\,\mathbf{g}\right]. \tag{28.92}$$

This is Einstein’s equation of the general theory of relativity.

The Einstein tensor $\mathbf{G}$ is nearly the only symmetric second-rank tensor made
out of the Riemann and metric tensors that is divergence free. The only other
tensor with the same properties is $\mathbf{G} + \Lambda\mathbf{g}$, where $\Lambda$ is the so-called cosmological
constant (see Problem 28.28). When in 1917, Einstein applied his GTR to the
universe itself, he found that the universe ought to be expanding. Being a firm
believer in Nature, he changed his equation to $\mathbf{G} + \Lambda\mathbf{g} = 8\pi\mathbf{T}$ to suppress the
unobserved prediction of the expansion of the universe. Later, when the expansion
was observed by Hubble, Einstein referred to this mutilation of his GTR as “the
biggest blunder of my life.”


Aleksandr Aleksandrovich Friedmann (1888-1925) was born
into a musical family, his father, Aleksandr Friedmann, being
a composer and his mother, Ludmila Vojacka, the daughter of
the Czech composer Hynek Vojacek.

In 1906 Friedmann graduated from the gymnasium with 
the gold medal and immediately enrolled in the mathematics 
section of the department of physics and mathematics of St.
Petersburg University. While still a student, he wrote a number of
unpublished scientific papers, one of which was awarded a gold 
medal by the department. After graduation from the university 
in 1910, Friedmann was retained in the department to prepare 
for the teaching profession. 

In the fall of 1914, Friedmann volunteered for service in an aviation detachment, in 
which he worked, first on the northern front and later on other fronts, to organize aerologic 
and aeronavigational services. While at the front, Friedmann often participated in military 
flights as an aircraft observer. In the summer of 1917 he was appointed a section chief in 
Russia’s first factory for the manufacture of measuring instruments used in aviation; he later 
became director of the factory. Friedmann had to relinquish this post because of the onset 
of heart disease. From 1918 until 1920, he was professor in the department of theoretical 
mechanics of Perm University. 

In 1920 he returned to Petrograd and worked at the main physics observatory of the 
Academy of Sciences, first as head of the mathematics department and later, shortly before 
his death, as director of the observatory. Friedmann’s scientific activity was concentrated 
in the areas of theoretical meteorology and hydromechanics, where he demonstrated his 
mathematical talent and his unwavering striving for, and ability to attain, the concrete, practical
application of solutions to theoretical problems. 

Friedmann made a valuable contribution to Einstein's general theory of relativity. As 
always, his interest was not limited simply to familiarizing himself with this new field 














of science but led to his own remarkable investigations. Friedmann’s work on the theory 
of relativity dealt with the cosmological problem. In his paper “Über die Krümmung des
Raumes” (1922), he outlined the fundamental ideas of his cosmology: the supposition 
concerning the homogeneity of the distribution of matter in space and the consequent 
homogeneity and isotropy of space-time. This theory is especially important because it 
leads to a sufficiently correct explanation of the fundamental phenomenon known as the 
“red shift.” Einstein himself thought that the cosmological solution to the equations of a 
field had to be static and had to lead to a closed model of the universe. Friedmann discarded 
both conditions and arrived at an independent solution. 

Friedmann’s interest in the theory of relativity was by no means a passing fancy. In 
the last years of his life, together with V. K. Frederiks, he began work on a multivolume 
text on modern physics. The first book, The World as Space and Time, is devoted to the
theory of relativity, knowledge of which Friedmann considered one of the cornerstones of 
an education in physics. 

In addition to his scientific work, Friedmann taught courses in higher mathematics and 
theoretical mechanics at various colleges in Petrograd. He found time to create new and 
original courses, brilliant in their form and exceedingly varied in their content. Friedmann’s
unique course in theoretical mechanics combined mathematical precision and logical continuity
with original procedural and physical trends.

Friedmann died of typhoid fever at the age of thirty-seven. In 1931, he was posthumously 
awarded the Lenin Prize for his outstanding scientific work. 


28.6.2 Static Spherically Symmetric Solutions 

The general theory of relativity as given in Equation (28.92) has been strikingly
successful in predicting the spacetime 12 structure of our universe. It predicts the
expansion of the universe, and by time-reversed extrapolation, the big bang cosmology;
it predicts the existence of black holes and other final products of stellar
collapse; and on a less grandiose scale, it explains the small precession of Mercury,
the bending of light in the gravitational field of the Sun, and the gravitational
redshift. We shall not discuss the solution of Einstein’s equation in any detail.
However, due to its simplicity and its use of geometric arguments, we shall consider
the solution to Einstein’s equation exterior to a static spherically symmetric
distribution of mass.

Let us first translate the two adjectives used in the last sentence into a geometric 
language. Take static first. We call a phenomenon “static” if at different instants 
it “looks the same.” Thus, a static solution of Einstein’s equation is a spacetime
manifold that “looks the same” for all time. In the language of geometry “looks the
same” means isometric, because metric is the essence of the geometry of space-time.
In Euclidean physics, time can be thought of as an axis at each point (moment)


12 The reader may be surprised to see the two words “space” and “time” juxtaposed with no hyphen; but this is common practice
in relativity. 










of which one can assign a three-dimensional space corresponding to the “spatial 
universe” at that moment. In the general theory of relativity, space and time can 
be mixed, but the character of time as a parameter remains unaltered. Therefore, 
instead of an axis (a straight line), we pick a curve, a parametric map from the
real line to the manifold of space-time. This curve must be timelike, so that locally,
when curvature is ignored and special relativity becomes a good approximation, 
we do not violate causality. The curve must also have the property that at each point 
of it, the space-time manifold has the same metric. Moreover, we need to demand 
that at each point of this curve, the spatial part of the space-time is orthogonal to 
the curve. 




28.6.1. Definition. A spacetime is stationary if there exists a one-parameter group
of isometries $F_t$, called time translation isometries, whose Killing vector fields $\boldsymbol{\xi}$
are timelike for all t: $g(\boldsymbol{\xi}, \boldsymbol{\xi}) > 0$. If in addition, there exists a spacelike hypersurface
$\Sigma$ that is orthogonal to orbits (curves) of the isometries, we say that the
spacetime is static.

We can simplify the solution to Einstein’s equation by invoking the symmetry
of spacetime discussed above in our choice of coordinates. Let P be a point of the
spacetime manifold located in a neighborhood of some spacelike hypersurface $\Sigma$ as
shown in Figure 28.2. Through P passes a single orbit of the isometry, which starts
at a point Q of $\Sigma$. Let t, the so-called Killing parameter, stand for the parameter
corresponding to the point P with t = 0 being the parameter of Q. On the spacelike
hypersurface $\Sigma$, choose arbitrary coordinates $\{x^i\}$ for Q. Assign the coordinates
$(t = x^0, x^1, x^2, x^3)$ to P. Since $F_t$ does not change the metric, $\Sigma_t$, the
translation of $\Sigma$ by $F_t$, is also orthogonal to the orbit of the isometry. Moreover,
the components of the metric in this coordinate system cannot be dependent on the
Killing parameter t. Thus, in this coordinate system, the spacetime metric takes
the form

$$g = g_{00}\,dt\otimes dt - \sum_{i,j=1}^{3} g_{ij}\,dx^i\otimes dx^j. \tag{28.93}$$


28.6.2. Definition. A spacetime is spherically symmetric if its isometry group
contains a subgroup isomorphic to SO(3) and the orbits of this group are
two-dimensional spheres.

In other words, if we think of isometries as the action of some abstract group,
then this group must contain SO(3) as a subgroup. Since SO(3) is isomorphic to
the group of rotations, we conclude that the metric should be rotationally invariant.
The time-translation Killing vector field $\boldsymbol{\xi}$ must be orthogonal to the orbits of
SO(3), because otherwise the generators of SO(3) can change the projection of $\boldsymbol{\xi}$
on the spheres and destroy the rotational invariance. Therefore, the 2-dimensional
spheres must lie entirely in the hypersurfaces $\Sigma_t$. Now, we can write down a


Figure 28.2 The coordinates appropriate for a stationary spacetime.


static spherically symmetric metric in terms of appropriate coordinates as follows.
Choose the spherical coordinates $(\theta, \varphi)$ for the 2-spheres, and write the metric of
this sphere as

$$g_2 = r^2\,d\theta\otimes d\theta + r^2\sin^2\theta\,d\varphi\otimes d\varphi,$$

where r is the “radius” of the 2-sphere. Choose the third spatial coordinate to
be orthogonal to this sphere, i.e., r. Rotational symmetry now implies that the
components of the metric must be independent of $\theta$ and $\varphi$. The final form of the
metric based entirely on the assumed symmetries is

$$g = f(r)\,dt\otimes dt - h(r)\,dr\otimes dr - r^2\left(d\theta\otimes d\theta + \sin^2\theta\,d\varphi\otimes d\varphi\right). \tag{28.94}$$

We have reduced the problem of finding ten unknown functions $\{g_{\mu\nu}\}_{\mu,\nu=0}^{3}$ of four
variables to that of two functions f and h of one variable. The remaining
task is to calculate the Ricci tensor corresponding to Equation (28.94), substitute
it in Einstein’s equation (with the RHS equal to zero), and solve the resulting
differential equation for f and h. We shall not pursue this further here, and we
refer the reader to textbooks on general relativity (see, for example, [Wald 84,
pp. 121-124]). The final result is the so-called Schwarzschild solution, which is

$$g = \left(1 - \frac{2M}{r}\right)dt\otimes dt - \left(1 - \frac{2M}{r}\right)^{-1}dr\otimes dr - r^2\left(d\theta\otimes d\theta + \sin^2\theta\,d\varphi\otimes d\varphi\right), \tag{28.95}$$


where M is the total mass of the gravitating body, and the natural units of GTR, 
in which G = 1 = c, have been used. 












A remarkable feature of the Schwarzschild solution is that the metric components
have singularities at r = 2M and at r = 0. It turns out that the first singularity
is due to the choice of coordinates (analogous to the singularity at $r = 0$, $\theta = 0, \pi$,
and $\varphi = 0$ in spherical coordinates of $\mathbb{R}^3$), while the second is a true singularity
of spacetime. The first singularity occurs at the so-called Schwarzschild radius
whose numerical value is given by

$$r_S = \frac{2GM}{c^2} \approx 3\,\frac{M}{M_\odot}\ \text{km},$$

where $M_\odot = 2\times 10^{30}$ kg is the mass of the Sun. Therefore, for an ordinary body
such as the Earth, planets, and a typical star, the Schwarzschild radius is well inside 
the body where the Schwarzschild solution is not applicable. 
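
The order of magnitude quoted above can be checked with a few lines of arithmetic. The sketch below evaluates $r_S = 2GM/c^2$ in SI units; the rounded values of G and c and the Earth mass used here are assumptions supplied for illustration (the solar mass is the value quoted in the text), and the function name is hypothetical.

```python
G = 6.674e-11      # gravitational constant in m^3 kg^-1 s^-2 (rounded, assumed)
C = 2.998e8        # speed of light in m/s (rounded, assumed)
M_SUN = 2.0e30     # solar mass in kg, as quoted in the text
M_EARTH = 5.97e24  # Earth mass in kg (assumed value for illustration)


def schwarzschild_radius_km(mass_kg):
    """Schwarzschild radius r_S = 2GM/c^2, converted from meters to kilometers."""
    return 2.0 * G * mass_kg / C**2 / 1000.0


if __name__ == "__main__":
    print("Sun:   r_S =", schwarzschild_radius_km(M_SUN), "km")
    print("Earth: r_S =", schwarzschild_radius_km(M_EARTH) * 1e6, "mm")
```

With these inputs the Sun comes out at about 3 km and the Earth at roughly 9 mm, both far smaller than the bodies themselves.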

If we relax the assumption of staticity, we get the following theorem (for a
proof, see [Misn 73, p. 843]): 

28.6.3. Theorem. (Birkhoff’s theorem) The Schwarzschild solution is the only 
spherically symmetric solution of Einstein's equation in vacuum. 

A corollary to this theorem is that 


28.6.4. Box. All spherically symmetric solutions of Einstein’s equation in
vacuum are static. 


This is analogous to the fact that the Coulomb solution is the only spherically 
symmetric solution to Maxwell’s equations in vacuum. It can be interpreted as the 
statement that in gravity, as in electromagnetism, there is no monopole “radiation.” 

28.6.3 Schwarzschild Geodesics 

With the metric available to us, we can, in principle, solve the geodesic equations 
[Equation (28.65)] and obtain the trajectories of freely falling particles. However, 
a more elegant way is to make further use of the symmetries to eliminate variables. 
In particular, Proposition 28.4.4 is extremely useful in this endeavor. Consider first
$g(\partial_\theta, \mathbf{u})$ where $\mathbf{u}$ is the 4-velocity (tangent to the geodesic). In the coordinates we
are using, this yields

$$g(\partial_\theta, \mathbf{u}) = g(\partial_\theta, \dot{x}^\mu\partial_\mu) = \dot{x}^\mu \underbrace{g(\partial_\theta, \partial_\mu)}_{r^2\delta_{\theta\mu}} = r^2\dot\theta.$$

This quantity (because of Proposition 28.4.4 and the fact that $\partial_\theta$ is a Killing vector
field) is a constant of the motion, and its initial value will be an attribute of the
particle during its entire motion. We assign zero to this constant, i.e., we assume that
initially $\dot\theta = 0$. This is possible, because by rotating our spacetime (an allowed
operation due to rotational symmetry) we take the equatorial plane $\theta = \pi/2$ to
be the initial plane of the motion. Then the motion will be confined to this plane,
because $\dot\theta = 0$ for all time.

For the parameter of the geodesic equation, choose proper time $\tau$ if the geodesic
is timelike (massive particles), and any (affine) parameter if the geodesic is null
(massless particles such as photons). Then $g_{\mu\nu}\dot{x}^\mu\dot{x}^\nu = \kappa$, where

$$\kappa = \begin{cases} 1 & \text{for timelike geodesics,} \\ 0 & \text{for null geodesics.} \end{cases} \tag{28.96}$$

In terms of our chosen coordinates (with $\theta = \pi/2$), we have

$$\kappa = g_{\mu\nu}\dot{x}^\mu\dot{x}^\nu = (1 - 2M/r)\,\dot{t}^2 - (1 - 2M/r)^{-1}\,\dot{r}^2 - r^2\dot\varphi^2. \tag{28.97}$$


Next, we apply Proposition 28.4.4 to the time translation Killing vector and
write

$$E = g(\boldsymbol{\xi}, \mathbf{u}) = (1 - 2M/r)\,\dot{t}, \tag{28.98}$$

where E is a constant of the motion and $\boldsymbol{\xi} = \partial_t$. In the case of massive particles,
as $r \to \infty$, i.e., as we approach special relativity, E becomes $\dot{t}$, which is the rest
energy of a particle of unit mass. 13 Therefore, it is natural to interpret E for finite
r as the total energy (including gravitational potential energy) per unit mass of a
particle on a geodesic.

Finally, the other rotational Killing vector field $\partial_\varphi$ gives another constant of
motion,

$$L \equiv g(\partial_\varphi, \mathbf{u}) = r^2\dot\varphi, \tag{28.99}$$

which can be interpreted as the angular momentum of the particle. This reduces 
to Kepler’s second law: Equal areas are swept out in equal times, in the limit of 
Newtonian (or weak) gravity. However, in strong gravitational fields, spacetime is 
not Euclidean, and Equation (28.99) cannot be interpreted as “areas swept out.” 
Nevertheless, it is interesting that the “form” of Kepler’s second law does not 
change even in strong fields. 

Solving for $\dot{t}$ and $\dot\varphi$ from (28.98) and (28.99) and inserting the result in (28.97),
we obtain

$$\frac{1}{2}\dot{r}^2 + \frac{1}{2}\left(1 - \frac{2M}{r}\right)\left(\frac{L^2}{r^2} + \kappa\right) = \frac{1}{2}E^2. \tag{28.100}$$
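
The algebra leading to Equation (28.100) can be spot-checked numerically. The sketch below picks arbitrary values of $r$, $\dot r$, and $\dot\varphi$ outside the Schwarzschild radius, solves (28.97) for $\dot t$, forms E and L from (28.98) and (28.99), and compares the two sides of (28.100). The function name and the sample numbers are illustrative assumptions only.

```python
import math


def radial_energy_sides(M, r, r_dot, phi_dot, kappa=1.0):
    """Return (lhs, rhs) of Eq. (28.100) for a state satisfying Eq. (28.97).

    Units with G = c = 1; requires r > 2M so that 1 - 2M/r is positive.
    """
    f = 1.0 - 2.0 * M / r
    # Solve (28.97) for t_dot, taking the positive (future-directed) root.
    t_dot = math.sqrt((kappa + r_dot**2 / f + r**2 * phi_dot**2) / f)
    E = f * t_dot          # Eq. (28.98)
    L = r**2 * phi_dot     # Eq. (28.99)
    lhs = 0.5 * r_dot**2 + 0.5 * f * (L**2 / r**2 + kappa)
    rhs = 0.5 * E**2
    return lhs, rhs


if __name__ == "__main__":
    for kappa in (1.0, 0.0):  # timelike and null cases
        lhs, rhs = radial_energy_sides(M=1.0, r=10.0, r_dot=0.1, phi_dot=0.02, kappa=kappa)
        print("kappa =", kappa, " lhs - rhs =", lhs - rhs)
```

The difference of the two sides should vanish to rounding error for both the timelike and the null case.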

It follows from this equation that the radial motion of a particle on a geodesic is the
same as that of a unit mass particle of energy $E^2/2$ in ordinary one-dimensional


13 Recall that the 4-momentum of special relativity is $p^\mu = m\dot{x}^\mu$.











Figure 28.3 The effective potential V(r) for a massive particle with $L^2 = 5M^2$.


nonrelativistic mechanics moving in an effective potential 


$$V(r) = \frac{1}{2}\left(1 - \frac{2M}{r}\right)\left(\frac{L^2}{r^2} + \kappa\right) = \frac{1}{2}\kappa - \kappa\frac{M}{r} + \frac{L^2}{2r^2} - \frac{ML^2}{r^3}. \tag{28.101}$$

Once we solve Equation (28.100) for the radial motion in this effective potential,
we can find the angular motion and the time coordinate change from (28.98)
and (28.99). The new feature provided by GTR is that in the radial equation of
motion, in addition to the “Newtonian term” $-\kappa M/r$ and the “centrifugal barrier”
$L^2/2r^2$, we have the new term $-ML^2/r^3$, which, a small correction for large r,
will dominate over the centrifugal barrier term for small r.

Let us consider first the massive particle case, $\kappa = 1$. The extrema of the
effective potential are given by

$$Mr^2 - L^2 r + 3ML^2 = 0 \;\Rightarrow\; R_\pm = \frac{L^2 \pm \sqrt{L^4 - 12L^2M^2}}{2M}. \tag{28.102}$$


Thus, if $L^2 < 12M^2$, no extrema exist (see Figure 28.3), and a particle heading 
toward the center of attraction will fall directly to the Schwarzschild radius $r = 2M$, 
the zero of the effective potential, and finally into the spacetime singularity 
$r = 0$. For $L^2 > 12M^2$, the reader may check that $R_+$ is a minimum of $V(r)$, while 
$R_-$ is a maximum (Figure 28.4). It follows that stable (unstable) circular orbits 
exist at the radius $r = R_+$ ($r = R_-$). In the Newtonian limit of $M \ll L$, we get 
$R_+ \approx L^2/M$, which agrees with the calculation in Newtonian gravity (Problem 
28.30). Furthermore, Equation (28.102) puts a restriction of $R_+ > 6M$ on $R_+$ and 
$3M < R_- < 6M$ on $R_-$. This places the planets of the Sun safely in the region of 
stable circular orbits. 
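
The statements about the extrema can be checked numerically. The following is only a sketch in units with $G = c = 1$ and $M = 1$; the function names `V`, `d2V`, and `extrema` are illustrative choices, and the value $L^2 = 20M^2$ is taken from the caption of Figure 28.4.

```python
import math

def V(r, M=1.0, L2=20.0, kappa=1.0):
    """Effective potential of Eq. (28.101), in units with G = c = 1."""
    return 0.5 * (1.0 - 2.0 * M / r) * (L2 / r**2 + kappa)

def d2V(r, M=1.0, L2=20.0):
    """Second derivative of V for kappa = 1, differentiated by hand."""
    return -2.0 * M / r**3 + 3.0 * L2 / r**4 - 12.0 * M * L2 / r**5

def extrema(M=1.0, L2=20.0):
    """Return (R_minus, R_plus) from Eq. (28.102), or None if L^2 < 12 M^2."""
    disc = L2**2 - 12.0 * L2 * M**2
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return (L2 - root) / (2.0 * M), (L2 + root) / (2.0 * M)

R_minus, R_plus = extrema()               # L^2 = 20 M^2
print(R_minus, R_plus)                    # inner and outer circular-orbit radii
print(d2V(R_minus) < 0, d2V(R_plus) > 0)  # maximum at R_minus, minimum at R_plus
print(extrema(L2=5.0))                    # None: no extrema when L^2 < 12 M^2
```

With these inputs the inner radius falls between $3M$ and $6M$ and the outer radius lies beyond $6M$, in line with the restrictions quoted above.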




If a massive particle is displaced slightly from its stable equilibrium radius $R_+$, 
it will oscillate radially with a frequency $\omega_r$ given by¹⁴ 

$$\omega_r^2 = \frac{d^2V}{dr^2}\bigg|_{r=R_+} = \frac{M(R_+ - 6M)}{R_+^4 - 3MR_+^3}.$$

On the other hand, the orbital frequency is given by Equation (28.99), 

$$\omega_\varphi^2 \equiv \dot\varphi^2 = \frac{L^2}{R_+^4} = \frac{M}{R_+^3 - 3MR_+^2},$$

where $L^2$ has been calculated from (28.102) and inserted in this equation. In the 
Newtonian limit of $M \ll R_+$, we have $\omega_\varphi^2 \approx \omega_r^2 \approx M/R_+^3$. If $\omega_\varphi = \omega_r$, the 
particle will return to a given value of $r$ in exactly one orbital period and the orbit 
will close. The difference between $\omega_\varphi$ and $\omega_r$ in GTR means that the orbit will 
precess at a rate of 

$$\omega_p = \omega_\varphi - \omega_r = (1 - \omega_r/\omega_\varphi)\,\omega_\varphi = -\left[(1 - 6M/R_+)^{1/2} - 1\right]\omega_\varphi,$$

which in the limit of $M \ll R_+$ reduces to 

$$\omega_p \approx \frac{3M^{3/2}}{R_+^{5/2}} = \frac{3(GM)^{3/2}}{c^2 R_+^{5/2}}.$$

If we include the eccentricity $e$ and denote the semimajor axis of the elliptical orbit 
by $a$, then the formula above becomes (see [Misn 73, p. 1110]) 

$$\omega_p \approx \frac{3(GM)^{3/2}}{c^2(1 - e^2)\,a^{5/2}}. \tag{28.103}$$


Due to its proximity to the Sun, Mercury shows the largest precession frequency, 
which, after taking into account all other effects such as perturbations due 
to other planets, is 43 seconds of arc per century. This residual precession rate 
had been observed prior to the formulation of GTR and had been an unexplained 
mystery. Its explanation was one of the most dramatic early successes of GTR. 
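
As a rough numerical check of Equation (28.103), one can insert orbital data for Mercury. The sketch below is illustrative only: the solar gravitational parameter, the semimajor axis, and the eccentricity are standard rounded values supplied here as assumptions (they are not quoted in the text), and the name `precession_rate` is hypothetical.

```python
import math

# Assumed input data (rounded reference values, not taken from the text)
GM_SUN = 1.32712e20        # G * M_sun in m^3 / s^2
C = 2.99792458e8           # speed of light in m / s
A_MERCURY = 5.7909e10      # semimajor axis of Mercury's orbit in m
E_MERCURY = 0.2056         # eccentricity of Mercury's orbit
SECONDS_PER_CENTURY = 100 * 365.25 * 86400

def precession_rate(gm, a, e, c=C):
    """Perihelion precession rate of Eq. (28.103), in radians per second."""
    return 3.0 * gm**1.5 / (c**2 * (1.0 - e**2) * a**2.5)

rate = precession_rate(GM_SUN, A_MERCURY, E_MERCURY)
arcsec_per_century = math.degrees(rate * SECONDS_PER_CENTURY) * 3600.0
print(arcsec_per_century)  # close to the 43 seconds of arc per century quoted above
```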

We now consider the null geodesics. With $\kappa = 0$ in Equation (28.101), the 
effective potential becomes 

$$V(r) = \frac{L^2}{2r^3}(r - 2M), \tag{28.104}$$

which has a maximum at $r = R_{\max} = 3M$, as shown in Figure 28.5. It follows 
that in GTR, unstable circular orbits of photons exist at $r = 3M$, and that strong 
gravity has significant effect on the propagation of light. 


14 In the Taylor expansion of any potential $V(r)$ about the equilibrium position $r_0$ of a particle (of unit mass), it is the second 
derivative term that resembles Hooke’s potential, $\frac{1}{2}kx^2$ with $k = (d^2V/dr^2)_{r_0}$. 









Figure 28.4 The effective potential $V(r)$ for a massive particle with $L^2 = 20M^2$. 


The minimum energy required to overcome the potential barrier (and avoid 
falling into the infinitely deep potential well) is given by 

$$\frac{1}{2}E^2 = V(R_{\max}) = \frac{L^2}{54M^2} \quad\Rightarrow\quad \frac{L^2}{E^2} = 27M^2.$$


In flat spacetime, $L/E$ is precisely the impact parameter $b$ of a photon, i.e., the 
distance of closest approach to the origin. Thus the Schwarzschild geometry will 
capture any photon sent toward it if the impact parameter is less than $b_c = \sqrt{27}\,M$. 
Hence, the cross section for photon capture is 

$$\sigma \equiv \pi b_c^2 = 27\pi M^2.$$
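
The numbers entering the capture cross section can be verified with a few lines of code. This is a minimal sketch in units with $G = c = 1$ and $M = L = 1$; the function names are illustrative.

```python
import math

def V_null(r, M=1.0, L=1.0):
    """Effective potential for null geodesics, Eq. (28.104)."""
    return L**2 * (r - 2.0 * M) / (2.0 * r**3)

def critical_impact_parameter(M=1.0):
    """Impact parameter b_c = sqrt(27) M below which a photon is captured."""
    return math.sqrt(27.0) * M

def capture_cross_section(M=1.0):
    """Photon capture cross section pi * b_c^2."""
    return math.pi * critical_impact_parameter(M)**2

print(V_null(3.0))               # barrier height, equal to L^2 / (54 M^2)
print(capture_cross_section())   # equal to 27 * pi * M^2
```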

To analyze the bending of light, we write Equation (28.100) as 

$$\frac{1}{2}\left(\frac{dr}{d\varphi}\right)^2\dot\varphi^2 + V(r) = \frac{1}{2}E^2.$$

Substituting for $\dot\varphi$ from Equation (28.99) and writing the resulting DE in terms of 
a new variable $u = M/r$, we obtain 

$$\left(\frac{du}{d\varphi}\right)^2 + u^2(1 - 2u) = \frac{M^2}{b^2}, \qquad b \equiv \frac{L}{E},$$

where we used Equation (28.104) for the effective potential. Differentiating this 
with respect to $\varphi$, we finally get the second-order DE, 

$$\frac{d^2u}{d\varphi^2} + u = 3u^2. \tag{28.105}$$


Figure 28.5 The effective potential $V(r)$ for a massless particle. Note that the shape of 
the potential is independent of $L^2$. 


gravitational redshift 
discussed 


In the large-impact-parameter or small-$u$ approximation, we can ignore the second-order 
term on the RHS and solve for $u$. This will give the equation of a line in 
polar coordinates. Substituting this solution on the RHS of Equation (28.105) 
and solving the resulting equation yields the deviation from a straight line with a 
deflection angle of 

$$\Delta\varphi \approx \frac{4M}{b} = \frac{4GM}{bc^2}, \tag{28.106}$$


where we have restored the G’s and the c’s in the last step. 
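
To get a feeling for the size of this angle, Equation (28.106) can be evaluated for a ray grazing the Sun. The sketch below assumes the rounded values $b = 7 \times 10^8$ m and $M = 2 \times 10^{30}$ kg together with standard values of $G$ and $c$; the function name `deflection_angle` is illustrative.

```python
import math

G = 6.674e-11          # gravitational constant in m^3 kg^-1 s^-2 (assumed value)
C = 2.99792458e8       # speed of light in m / s

def deflection_angle(mass, impact_parameter):
    """Light deflection angle of Eq. (28.106), in radians."""
    return 4.0 * G * mass / (impact_parameter * C**2)

# Rounded solar mass (kg) and solar radius (m) used as the impact parameter
arcsec = math.degrees(deflection_angle(2e30, 7e8)) * 3600.0
print(arcsec)          # roughly 1.75 seconds of arc for these rounded inputs
```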

For a light ray grazing the Sun, $b = R_\odot = 7 \times 10^8$ m and $M = M_\odot = 2 \times 10^{30}$ 
kg, so that Equation (28.106) predicts a deflection of 1.747 seconds of arc. This 
bending of starlight passing near the Sun has been observed many times, beginning 
with the 1919 expedition led by Eddington. Because of the intrinsic difficulty of 
such measurements, these observations confirm GTR only to within 10% accuracy. 
However, the bending of radio waves emitted by quasars has been measured to an 
accuracy of 1%, and the result has been shown to be in agreement with Equation 
(28.106) to within this accuracy. 

The last topic we want to discuss, a beautiful 
illustration of Proposition 28.4.4, is the gravitational redshift. Let $O_1$ and $O_2$ be 
two static observers (by which we mean that they each move on an integral curve of 
the Killing vector field $\xi$). It follows that the 4-velocities $u_1$ and $u_2$ of the observers 
are proportional to $\xi$. Since $u_1$ and $u_2$ have unit lengths, we have 

$$u_i = \frac{\xi_i}{\sqrt{g(\xi_i, \xi_i)}}, \qquad i = 1, 2.$$


Figure 28.6 A spacetime diagram depicting the emission of light by observer $O_1$ and its 
reception by $O_2$. 


Suppose $O_1$ emits a light beam and $O_2$ receives it (see Figure 28.6). Since light 
travels on a geodesic, Proposition 28.4.4 gives 

$$g(u, \xi_1) = g(u, \xi_2), \tag{28.107}$$

where $u$ is tangent to the light trajectory (or the light signal’s 4-velocity). The 
frequency of light for any observer is the time component of its 4-velocity, and 
because the 4-velocity of an observer has the form $(1, 0, 0, 0)$ in the frame of that 
observer, we can write this invariantly as 

$$\omega_i = g(u, u_i), \qquad i = 1, 2.$$

In particular, using Equation (28.107), we obtain 

$$\frac{\omega_1}{\omega_2} = \frac{g(u, u_1)}{g(u, u_2)} = \frac{g(u, \xi_1)/\sqrt{g(\xi_1, \xi_1)}}{g(u, \xi_2)/\sqrt{g(\xi_2, \xi_2)}} = \frac{\sqrt{g(\xi_2, \xi_2)}}{\sqrt{g(\xi_1, \xi_1)}} = \sqrt{\frac{1 - 2M/r_2}{1 - 2M/r_1}},$$

where we used $g(\xi, \xi) = g_{00} = (1 - 2M/r)$ for the Schwarzschild spacetime, 
and $r_1$ and $r_2$ are the radial coordinates of the observers $O_1$ and $O_2$, respectively. 
In terms of wavelengths, we have 

$$\frac{\lambda_1}{\lambda_2} = \sqrt{\frac{1 - 2M/r_1}{1 - 2M/r_2}}. \tag{28.108}$$

It follows from Equation (28.108) that as light moves toward regions of weak 
gravity ($r_2 > r_1$), the wavelength increases ($\lambda_2 > \lambda_1$), i.e., it will be “red-shifted.” 




This makes sense, because an increase in distance from the center implies an increase 
in the gravitational potential energy, and, therefore, a decrease in a photon’s 
energy $\hbar\omega$. Pound and Rebka used the Mössbauer effect in 1960 to measure the 
change in the wavelength of a beam of light as it falls down a tower on the surface 
of the Earth. They found that, to within the 1% experimental accuracy, the GTR 
prediction of the gravitational redshift was in agreement with their measurement. 
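
The size of the effect in a tower experiment can be estimated from Equation (28.108). Expanding to first order in $M/r$ gives $\lambda_2/\lambda_1 - 1 \approx (GM/c^2)(r_2 - r_1)/(r_1 r_2)$, which is also the numerically stable form to use, since the two square roots differ only in about the fifteenth decimal place. The sketch below is illustrative: the Earth data are rounded reference values, and the tower height of 22.5 m is an assumed figure that does not appear in the text.

```python
GM_EARTH = 3.986e14    # G * M_earth in m^3 / s^2 (assumed rounded value)
R_EARTH = 6.371e6      # mean radius of the Earth in m (assumed rounded value)
C = 2.99792458e8       # speed of light in m / s
TOWER_HEIGHT = 22.5    # m, assumed height used only for illustration

def fractional_shift(r1, r2, gm=GM_EARTH):
    """First-order expansion of Eq. (28.108): lambda_2 / lambda_1 - 1."""
    return gm / C**2 * (r2 - r1) / (r1 * r2)

# Light climbing the tower is redshifted; light falling down is blueshifted.
shift_up = fractional_shift(R_EARTH, R_EARTH + TOWER_HEIGHT)
shift_down = fractional_shift(R_EARTH + TOWER_HEIGHT, R_EARTH)
print(shift_up, shift_down)   # a few parts in 10^15, with opposite signs
```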

28.7 Problems 


28.1. Show that dS = fl A (G@) — © 八 (Gfi). 

28.2. Let $A$ and $B$ be matrices whose elements are one-forms. Show that $(A \wedge B)^t = -B^t \wedge A^t$. 

28.3. Write Equation (28.20) in component form and derive Equation (28.24). 


28.4. FindJftif 

/ 0 -COt0€^\ 

U ： == \cot0e^ 0 )' 

where (0, <p) are coordinates on the unit sphere S 2 . 

28.5. Find the curvature of the two-dimensional space whose arc length is given 
by $ds^2 = dx^2 + x^2\,dy^2$. 

28.6. Find the curvature of the three-dimensional space whose arc length is given 
by $ds^2 = dx^2 + x^2\,dy^2 + dz^2$. 

28.7. Find the curvature tensors of the Friedmann and Schwarzschild spaces given 
in Example 28.2.3. 

28.8. (a) Show that in $\mathbb{R}^3$ the composite operator $d \circ *$ gives the curl of a vector 
when the vector is written as components of a two-form. 

(b) Similarly, show that $* \circ d$ is the divergence operator for one-forms. 

(c) Use these results and the procedure of Example 28.2.4 to find expressions for 
the curl and the divergence of a vector in curvilinear coordinates. 

28_9. Let u = e“ v = e*，and = e J m (2834) to arrive at (28.35). 

28.10. Using Equations (28.7) and (28.55) show that $R$ is a tensor of type (1, 3). In 
particular, it does not differentiate functions that multiply vectors and the 1-form 
in its argument. 


28.11. Show that 


R(u, v)w + R(w, u)v + R(v, w)u = 0. 

Hint: Use the first property of Theorem 28.3.2 to change the vector with respect 
to which covariant differentiation is being performed. Then use the Jacobi identity 
for Lie brackets of vectors. 






28.12. Use (28.55) and (28.11) to show that the components of R are precisely 
those introduced in Equation (28.22). 

28.13. Prove the statement in Box 28.3.8. 

28.14. Start with $d^2x^i/dt^2 = 0$, the geodesic equations in Cartesian coordinates. 
Transform these equations to spherical coordinates $(r, \theta, \varphi)$ using 
$x = r\sin\theta\cos\varphi$, $y = r\sin\theta\sin\varphi$, and $z = r\cos\theta$, and the chain rule. From these 
equations read off the connection coefficients in spherical coordinates [refer to 
Equation (28.65)]. Now use Equation (28.43) and Definition 28.3.6 to evaluate the 
divergence of a vector. 

28.15. Find the geodesics of a manifold whose arc element is $ds^2 = dx^2 + dy^2 + dz^2$. 

28.16. Find the geodesics of the metric $ds^2 = dx^2 + x^2\,dy^2$. 

28.17. Find the differential equation for the geodesics of the surface of a sphere 
of radius $a$ having the line element $ds^2 = a^2\,d\theta^2 + a^2\sin^2\theta\,d\varphi^2$. Verify that 
$A\cos\varphi + B\sin\varphi + \cot\theta = 0$ is the intersection of a plane passing through the 
origin and the sphere. Also, show that it is a solution of the differential equation 
of the geodesics. Hence, the geodesics of a sphere are great circles. 

28.18. The Riemann normal coordinates are given by $x^i = a^i t$. For each set of $a^i$, 
one obtains a different set of geodesics. Thus, we can think of $a^i$ as the parameters 
that distinguish among the geodesics. 

(a) By keeping all a 1 (and 0 fixed except the yth one and using the definition of 
tangent to a curve, show that ny ― tdj, where is (one of) the u(’s) appearing in 
the equation of geodesic deviation. 

(b) Substitute (a) plus u ( = x l = a 1 in Equation (28.79) to show that 

K ijk + K ^ kij + 1 kj,t) 

Substitute for one of the on the RHS using Equation (28.80). 

(c) Now use the cyclic property of the lower indices of the ^curvature tensor to show 
that 

心 =4( 仏 +/ ^)* 

28.19. Let 诊 ：M — iV be isometric at P 6 M. Show that ^ is necessarily 
injective. Hint: Look at the null space of 

28.20. (a) Show that the covariant and Lie derivatives, when applied to a 1-form, 
are related by 

$$\langle L_u\omega, v\rangle = \langle\nabla_u\omega, v\rangle + \langle\omega, \nabla_v u\rangle.$$

(b) Use this to derive the identity 




28.21. Show that Equation (28.69) leads to Equation (28.70). 

28.22. Show that a vector field that generates a conformal transformation satisfies 

xkdkgij 十 Sj X^gicj + djX^g^i = 一 fgij • 

28.23. (a) Show that $R(u, v)(f) = 0$ for all functions $f$ defined on $M$. 

(b) Using Equation (28.64) show that 

$$0 = R(u, v)(w \cdot x) = w \cdot [R(u, v)x] + [R(u, v)w] \cdot x.$$

(c) Conclude from (b) that $R_{ijkl} = -R_{jikl}$. 

28.24. Use the symmetries of $R_{ijkl}$ [Equations (28.23) and (28.24)] to show that 
$R_{ijkl} = R_{klij}$ and $R_{[ijkl]} = 0$. 

28.25. Use the symmetry properties of Riemann curvature tensor to show that 

⑻ J = 0, and 

(b) Rij = Rjh 

(c) Show that + Rjk'i — Rjl；k = 0, and conclude that V ■ G = 0, or, in 
component form ， G. k . k = 0. 

28.26. Show that in an $n$-dimensional manifold without metric the number of 
independent components of the Riemann curvature tensor is 

$$\frac{n^3(n-1)}{2} - \frac{n^2(n-1)(n-2)}{6} = \frac{n^2(n^2-1)}{3}.$$

If the manifold has a metric, the number of components reduces to 

$$\left[\frac{n(n-1)}{2}\right]^2 - \frac{n^2(n-1)(n-2)}{6} = \frac{n^2(n^2-1)}{12}.$$

28.27. (a) Take the trace of both sides of $\mathbf{R} - \frac{1}{2}R\,\mathbf{g} = \kappa\mathbf{T}$ to obtain $R = -\kappa T^\mu{}_\mu \equiv -\kappa T$. 

(b) Use ⑻ to obtain Roq = \k(Tqo + ")• 

(c) Now use the fact that in Newtonian limit $T_{ij} \ll T_{00} \approx \rho$ to conclude that 
agreement of Einstein’s and Newton’s gravity demands that $\kappa = 8\pi$ in units in 
which the universal gravitational constant is unity. 

28.28. (a) Show that the most general second-rank symmetric tensor $E_{ij}$ constructed 
from the metric and Riemann curvature tensors that is linear in the curvature 
tensor is 

$$E_{ij} = aR_{ij} + b\,g_{ij}R + \Lambda g_{ij},$$

where $a$, $b$, and $\Lambda$ are constants. 

(b) Show that $E_{ij}$ has a vanishing divergence if and only if $b = -\frac{1}{2}a$. 

(c) Show that in addition, $E_{ij}$ vanishes in flat space-time if and only if $\Lambda = 0$. 





28.29. Show that $R_+$ and $R_-$, as given by Equation (28.102) are, respectively, a 
minimum and a maximum of $V(r)$. 

28.30. Use $F = ma$ to show that in a circular orbit of radius $R$, we have $L^2 = GMR$. 

28.31. Show that $R_+ > 6M$ and $3M < R_- < 6M$, where $R_\pm$ are given by 
Equation (28.102). 

28.32. Calculate the energy of a circular orbit using Equation (28.100), and show 
that 

$$E(R) = \frac{R - 2M}{\sqrt{R^2 - 3MR}},$$

where $R = R_\pm$. 

28.33. Show that the radial frequency of oscillation of a massive particle in a stable 
orbit of radius $R_+$ is given by 

$$\omega_r^2 = \frac{M(R_+ - 6M)}{R_+^4 - 3MR_+^3}.$$


28.34. Derive Equation (28.106) from Equation (28.105). 


Additional Reading 

1. Misner, C., Thorne, K., and Wheeler, J. Gravitation, Freeman, 1973. A 
classic text on index-free differential geometry. This is the definitive text on 
Einstein’s general theory of relativity. 

2. Nakahara, M. Geometry, Topology, and Physics, Adam Hilger, 1990. A very 
readable book for graduate students of physics with a lot of emphasis on 
gauge theories and topological concepts. The book has a standard introduction 
to manifolds and a good discussion of differential geometry, especially 
Killing vector fields. 

3. Wald, R. General Relativity, University of Chicago Press, 1984. A more 
up-to-date book on differential geometry and general relativity. 



29 _ 

Lie Groups and Differential Equations 


Lie groups and Lie algebras, because of their manifold (and therefore, differentiability) 
structure, find very natural applications in areas of physics and mathematics 
in which symmetry and differentiability play important roles. Lie himself started 
the subject by analyzing the symmetry of differential equations in the hope that a 
systematic method of solving them could be discovered. Later, Emmy Noether applied 
the same idea to variational problems involving symmetries and obtained one 
of the most beautiful pieces of mathematical physics: the relation between symmetries 
and conservation laws. More recently, generalizing the gauge invariance 
of electromagnetism, Yang and Mills have considered nonabelian gauge theories 
in which gauge invariance is governed by a nonabelian Lie group. Such theories 
have been successfully built for three of the four fundamental interactions: 
electromagnetism, weak nuclear, and strong nuclear. Furthermore, it has been possible 
to cast the fourth interaction, gravity, as described by Einstein’s general theory 
of relativity, in a language very similar to the other three interactions with the 
promise of unifying all four interactions into a single force. This chapter is devoted 
to a treatment of the first topic, application of Lie groups to DEs. The second topic, 
the calculus of variations and conservation laws, will be discussed in the next chapter. 
The third topic, that of gauge theories, although of fundamental importance to 
the development of physics and our understanding of the universe, is, at this stage, 
too specialized to be covered in this book. 


29.1 Symmetries of Algebraic Equations 

The symmetry group of a system of DEs is a transformation group that acts on both 
the independent and dependent variables and transforms solutions of the system to 








G-invariance and 
symmetry group 
defined 


system of algebraic 
equations and their 
symmetry group 


invariant map 


other solutions. In order to understand this symmetry group, we shall first tackle 
the simpler question of the symmetries of a system of algebraic equations. 

29.1.1. Definition. Let $G$ be a local Lie group of transformations acting on a 
manifold $M$. A subset $S \subset M$ is called G-invariant and $G$ is called a symmetry 
group of $S$ if whenever $g \cdot P$ is defined for $P \in S$ and $g \in G$, then $g \cdot P \in S$. 

29.1.2. Example. Let $M = \mathbb{R}^2$. 

(a) Let $G = \mathbb{R}^+$ be the abelian multiplicative group of positive real numbers. Let it act on $M$ 
componentwise: $r \cdot (x, y) = (rx, ry)$. Then any line going through the origin is a G-invariant 
subset of $\mathbb{R}^2$. 

(b) If $G = SO(2)$ and it acts on $M$ as usual, then any circle centered at the origin is a 
G-invariant subset of $\mathbb{R}^2$. ■ 

A system of algebraic equations is a system of equations 

$$F_\nu(x) = 0, \qquad \nu = 1, 2, \ldots, n,$$

in which $F_\nu : M \to \mathbb{R}$ is smooth. A solution is a point $x \in M$ such that $F_\nu(x) = 0$ 
for $\nu = 1, \ldots, n$. The solution set of the system is the collection of all solutions. 
A Lie group $G$ is called a symmetry group of the system if the solution set is 
G-invariant. 

29.1.3. Definition. Let $G$ be a local Lie group of transformations acting on a 
manifold $M$. A map $F : M \to N$, where $N$ is another manifold, is called a 
G-invariant map if for all $P \in M$ and all $g \in G$ such that $g \cdot P$ is defined, 
$F(g \cdot P) = F(P)$. A real-valued G-invariant function is called simply an invariant. 

The crucial property of Lie group theory is that locally the group and its algebra 
“look alike.” This allows the complicated nonlinear conditions of invariance 
of subsets and functions to be replaced by the simpler linear conditions of invariance 
under infinitesimal actions. From Definition 27.1.25, we obtain the following 
proposition. 

29.1.4. Proposition. Let $G$ be a local group of transformations acting on a manifold 
$M$. A smooth real-valued function $f : M \to \mathbb{R}$ is G-invariant if and only if 

$$\xi_M|_P(f) = 0 \quad \text{for all } P \in M \tag{29.1}$$

and for every infinitesimal generator $\xi \in \mathfrak{g}$. 

29.1.5. Example. The infinitesimal generator for $SO(2)$ is $\xi_M = x\partial_y - y\partial_x$. Any function 
of the form $f(x^2 + y^2)$ is an $SO(2)$-invariant. To see this, we apply Proposition 29.1.4: 

$$(x\partial_y - y\partial_x)f(x^2 + y^2) = x(2y)f' - y(2x)f' = 0,$$

where $f'$ is the derivative of $f$. ■ 





The criterion for the invariance of the solution set of a system of equations is a 
little more complicated, because now we are not dealing with functions themselves, 
but with their solutions. The following theorem gives such a criterion (for a proof, 
see [Olve 86, pp. 82-83]): 

29.1.6. Theorem. Let $G$ be a local Lie group of transformations acting on an 
$m$-dimensional manifold $M$. Let $F : M \to \mathbb{R}^n$, $n \le m$, define a system of algebraic 
equations $\{F_\nu(x) = 0\}_{\nu=1}^n$, and assume that the Jacobian matrix $(\partial F_\nu/\partial x^k)$ is of 
rank $n$ at every solution $x$ of the system. Then $G$ is a symmetry group of the system 
if and only if 

$$\xi_M|_x(F_\nu) = 0 \quad \forall\,\nu \quad \text{whenever } F_\nu(x) = 0 \tag{29.2}$$

for every infinitesimal generator $\xi \in \mathfrak{g}$. 

Note that Equation (29.2) is required to hold only for solutions x of the system. 

29.1.7. Example. Let $M = \mathbb{R}^2$ and $G = SO(2)$. Consider $F : M \to \mathbb{R}$ defined by 
$F(x, y) = x^2 + y^2 - 1$. The Jacobian matrix is simply the gradient, 

$$(\partial F/\partial x, \partial F/\partial y) = (2x, 2y),$$

and is of rank 1 for all points of the solution set, because it never vanishes at the points 
where $F(x, y) = 0$, i.e., the unit circle. It follows from Theorem 29.1.6 that $G$ is a symmetry 
group of the equation $F(x, y) = 0$ if and only if $\xi_M|_{\mathbf{r}}(F) = 0$ whenever $\mathbf{r} \in S^1$. But 

$$\xi_M|_{\mathbf{r}}(F) = (x\partial_y - y\partial_x)F|_{\mathbf{r}} = 2xy - 2yx = 0.$$

This is a proof of the obvious fact that $SO(2)$ takes points of $S^1$ to other points of $S^1$. 
As a less trivial example, consider the function $F : \mathbb{R}^2 \to \mathbb{R}$ given by 

$$F(x, y) = x^2y^2 + y^4 + 2x^2 + y^2 - 2.$$

The infinitesimal action of the group yields 

$$\xi_M(F) = (x\partial_y - y\partial_x)F = 2x^3y + 2xy^3 - 2xy = 2xy(x^2 + y^2 - 1).$$

The reader may check that $\xi_M(F) = 0$ whenever $F(x, y) = 0$. The Jacobian matrix of the 
“system” of equations is the gradient 

$$\nabla F = (2xy^2 + 4x,\ 2x^2y + 4y^3 + 2y),$$

which vanishes only when $x = 0 = y$, which does not belong to the solution set. Therefore, 
the rank of the Jacobian matrix is 1. We conclude that the solution set of $F(x, y) = 0$ is a 
rotationally invariant subset of $\mathbb{R}^2$. Indeed, we have 

$$F(x, y) = x^2y^2 + y^4 + 2x^2 + y^2 - 2 = (y^2 + 2)(x^2 + y^2 - 1),$$

and the solution set is just the unit circle. Note that although the solution set of $F(x, y) = 0$ 
is G-invariant, the function itself is not. ■ 
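
The criterion of Theorem 29.1.6 for the second function of this example can also be tested numerically. The sketch below applies the generator $x\partial_y - y\partial_x$ by central differences; the helper name `generator_on` and the sample points are illustrative choices.

```python
import math

def F(x, y):
    """The function x^2 y^2 + y^4 + 2 x^2 + y^2 - 2 of the example."""
    return x**2 * y**2 + y**4 + 2 * x**2 + y**2 - 2

def generator_on(func, x, y, h=1e-6):
    """Apply x d/dy - y d/dx to func at (x, y) using central differences."""
    d_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    d_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return x * d_dy - y * d_dx

# Points on the unit circle, i.e., solutions of F = 0
on_circle = [(math.cos(t), math.sin(t)) for t in (0.3, 1.1, 2.5, 4.0)]
for x, y in on_circle:
    print(F(x, y), generator_on(F, x, y))   # both vanish up to rounding

# Off the solution set the generator does not annihilate F
print(generator_on(F, 1.0, 1.0))            # approximately 2xy(x^2 + y^2 - 1) = 2
```

The last line illustrates the closing remark of the example: the solution set is invariant even though $F$ itself is not.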







characteristic system 
of a PDE 


We now discuss how to find invariants of a given group action. Start with a 
one-parameter group and write 

$$v \equiv \xi_M = X^i(x)\frac{\partial}{\partial x^i}$$

for the infinitesimal generator of the group in some local coordinates. A local 
invariant $F(x)$ of the group is a solution of the linear, homogeneous PDE 

$$v(F) = X^1(x)\frac{\partial F}{\partial x^1} + \cdots + X^n(x)\frac{\partial F}{\partial x^n} = 0. \tag{29.3}$$


It follows that the gradient of $F$ is perpendicular to the vector $v$. Since the gradient 
of $F$ is the normal to the hypersurface of constant $F$, we may consider the solution 
of Equation (29.3) as a surface $F(x) = c$ whose normal is perpendicular to $v$. Each 
normal determines one hypersurface, and since there are $n - 1$ linearly independent 
vectors perpendicular to $v$, there must be $n - 1$ different hypersurfaces that solve 
(29.3). Let us write these hypersurfaces as 

$$F^j(x) = c^j, \qquad j = 1, 2, \ldots, n - 1, \tag{29.4}$$

and note that 

$$\Delta F^j = \sum_{i=1}^n \frac{\partial F^j}{\partial x^i}\Delta x^i = 0.$$

A solution to this equation is suggested by (29.3): 

$$\Delta x^i = \alpha X^i \quad\Rightarrow\quad \frac{\Delta x^i}{X^i} = \alpha.$$

For $\Delta x^i \to dx^i$, we obtain the following set of ODEs, called the characteristic 
system of the original PDE, 

$$\frac{dx^1}{X^1(x)} = \frac{dx^2}{X^2(x)} = \cdots = \frac{dx^n}{X^n(x)}, \tag{29.5}$$

whose solutions determine $\{F^j(x)\}_{j=1}^{n-1}$. To find these solutions: 

29.1.8. Box. Take the equalities of (29.5) one at a time, solve the first-order 
DE, write the solution in the form of (29.4), and read off the functions $F^j$. 

The reader may check that any function of the $F^j$'s is also a solution of the 
PDE. In fact, it can be shown that any solution of the PDE is a function of these 
$F^j$'s (see [Olve 86, pp. 86-90]). 





29.1.9. Example. Once again, let us consider $SO(2)$, whose infinitesimal generator is 
$-y\partial_x + x\partial_y$. The characteristic “system” of equations is 

$$\frac{dx}{-y} = \frac{dy}{x} \quad\Rightarrow\quad x\,dx + y\,dy = 0 \quad\Rightarrow\quad x^2 + y^2 = c.$$

Thus, $F(x, y) = x^2 + y^2$, or any function thereof, is an invariant of the rotation group in 
two dimensions. 

As a less trivial example, consider the vector field 

$$v = -y\partial_x + x\partial_y + \sqrt{a^2 - z^2}\,\partial_z,$$

where $a$ is a constant. The characteristic system of ODEs is 

$$\frac{dx}{-y} = \frac{dy}{x} = \frac{dz}{\sqrt{a^2 - z^2}}.$$

The first equation was solved above, giving the invariant $F_1(x, y, z) = \sqrt{x^2 + y^2} = r$. To 
find the other invariant, solve for $x$ and substitute in the second equation to obtain 

$$\frac{dy}{\sqrt{r^2 - y^2}} = \frac{dz}{\sqrt{a^2 - z^2}}.$$

The solution to this DE is 

$$\arcsin\frac{y}{r} = \arcsin\frac{z}{a} + C \quad\Rightarrow\quad \arcsin\frac{y}{r} - \arcsin\frac{z}{a} = C.$$

Hence, $F_2(x, y, z) = \arcsin(y/r) - \arcsin(z/a)$ is a second invariant. By taking the sine of 
$F_2$, we can come up with an invariant that is algebraic (rather than trigonometric) in form: 

$$s = \sin F_2 = \sin(\alpha - \beta) = \sin\alpha\cos\beta - \cos\alpha\sin\beta = \frac{y}{r}\sqrt{1 - \frac{z^2}{a^2}} - \frac{z}{a}\sqrt{1 - \frac{y^2}{r^2}}.$$

Any function of $r$ and $s$ is also an invariant. ■ 
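
The invariance of $r$ and $s$ can be checked by following a point along the flow of the vector field. The sketch below assumes the field $v = -y\partial_x + x\partial_y + \sqrt{a^2 - z^2}\,\partial_z$, the form implied by the characteristic equation $dy/\sqrt{r^2 - y^2} = dz/\sqrt{a^2 - z^2}$ of the example, and it stays in the region $x > 0$, where $\sqrt{1 - y^2/r^2} = x/r$. The closed-form flow, the value of $a$, and the starting point are illustrative choices.

```python
import math

A = 2.0  # the constant a of the example (value chosen for illustration)

def flow(point, t, a=A):
    """Exact flow of v: rotate (x, y) by angle t and advance arcsin(z/a) by t."""
    x, y, z = point
    beta = math.asin(z / a)
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t),
            a * math.sin(beta + t))

def r_inv(point):
    """First invariant r = sqrt(x^2 + y^2)."""
    return math.hypot(point[0], point[1])

def s_inv(point, a=A):
    """Second invariant s = sin(F_2), written for x > 0."""
    x, y, z = point
    return (y * math.sqrt(a * a - z * z) - x * z) / (a * math.hypot(x, y))

p0 = (1.0, 0.5, 0.3)
for t in (0.1, 0.2, 0.3, 0.4):
    p = flow(p0, t)
    print(r_inv(p) - r_inv(p0), s_inv(p) - s_inv(p0))  # both differences vanish up to rounding
```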

When the dimension of the Lie group is larger than one, the computation of 
the invariants can be very complicated. If $\{v_k\}_{k=1}^r$ form a basis for the infinitesimal 
generators, then the invariants are the joint solutions of the system of first order 
PDEs 

$$v_k(F) = \sum_{j=1}^n X_k^j(x)\frac{\partial F}{\partial x^j} = 0, \qquad k = 1, \ldots, r.$$

To find such a solution, one solves the first equation and finds all its invariants. 
Since any function of these invariants is also an invariant, it is natural to express 
$F$ as a function of the invariants of $v_1$. One then writes the remaining equations in 
terms of these new variables and proceeds inductively. 


















29.1.10, Example. Consider the vector fields 
u = -yd x -\-xdy, 

where $a$ and $b$ are constants. The invariants of $u$ are functions of $r = \sqrt{x^2 + y^2}$ and $z$. If we 
are to have a nontrivial solution, the invariant of $v$ as well as its PDE should be expressible 
in terms of $r = \sqrt{x^2 + y^2}$ and $z$. The reader may verify that 

$$v(F) = (r^2z + a^3)\frac{\partial F}{\partial r} - (rz^2 + b^3)\frac{\partial F}{\partial z} = 0$$

with the characteristic equation 

$$\frac{dr}{r^2z + a^3} = -\frac{dz}{rz^2 + b^3}.$$

This is an exact first-order DE whose solutions are given by 

$$\tfrac{1}{2}r^2z^2 + a^3z + b^3r = c$$

with $c$ an arbitrary constant. Therefore, $F = \frac{1}{2}r^2z^2 + a^3z + b^3r$, or 

$$F(x, y, z) = \tfrac{1}{2}(x^2 + y^2)z^2 + a^3z + b^3\sqrt{x^2 + y^2},$$

or a function thereof, is the single invariant of this group. ■ 
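
That this $F$ solves the reduced PDE can be confirmed numerically in the $(r, z)$ variables. The following is a small sketch; the values of $a$ and $b$, the sample points, and the helper name `v_on_F` are arbitrary illustrative choices.

```python
def F(r, z, a=1.3, b=0.7):
    """Candidate invariant (1/2) r^2 z^2 + a^3 z + b^3 r."""
    return 0.5 * r**2 * z**2 + a**3 * z + b**3 * r

def v_on_F(r, z, a=1.3, b=0.7, h=1e-6):
    """Evaluate (r^2 z + a^3) dF/dr - (r z^2 + b^3) dF/dz by central differences."""
    dF_dr = (F(r + h, z, a, b) - F(r - h, z, a, b)) / (2 * h)
    dF_dz = (F(r, z + h, a, b) - F(r, z - h, a, b)) / (2 * h)
    return (r**2 * z + a**3) * dF_dr - (r * z**2 + b**3) * dF_dz

samples = [(2.0, 1.5), (0.5, -1.0), (3.0, 0.2)]
for r, z in samples:
    print(v_on_F(r, z))   # vanishes up to rounding at every sample point
```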



29.2 Symmetry Groups of Differential Equations 

Let $S$ be a system of partial differential equations involving $p$ independent variables 
$x = (x^1, \ldots, x^p)$, and $q$ dependent variables $u = (u^1, \ldots, u^q)$. The solutions of 
the system are of the form $u = f(x)$, or, in component form, $u^\alpha = f^\alpha(x^1, \ldots, x^p)$, 
$\alpha = 1, \ldots, q$. Let $X = \mathbb{R}^p$ and $U = \mathbb{R}^q$ be the spaces of independent and dependent 
variables with coordinates $\{x^i\}$ and $\{u^\alpha\}$, respectively. Roughly speaking, a 
symmetry group of the system $S$ will be a local group of transformations that map 
solutions of $S$ into solutions of $S$. 


Marius Sophus Lie (1842-1899) was the youngest son of a Lutheran pastor in Norway. 
He studied mathematics and science at Christiania (which became Kristiania, then Oslo 
in 1925) University where he attended Sylow’s lectures on group theory. There followed a 
few years when he could not decide what career to follow. He tutored privately after his 
graduation and even dabbled a bit in astronomy and mechanics. 

A turning point came in 1868 when he read papers on geometry by Poncelet and Plücker, 
which inspired the idea of creating geometries by using elements 
other than points in space and provided the seed for the rest of Lie’s career, prompting 
him to call himself a student of Plücker, even though the two had never met. Lie’s first 












publication won him a scholarship to work in Berlin, where he met Klein, who had also been 
influenced by Plücker’s papers. The two had quite different styles (Lie always pursuing 
the broadest generalization, while Klein could become absorbed in a charming special 
case) but collaborated effectively for many years. However, in 1892 the lifelong friendship 
between Lie and Klein broke down, and the following year Lie publicly attacked Klein, 
saying, “I am no pupil of Klein, nor is the opposite the case, although this might be closer to 
the truth.” Lie and Klein spent a summer in Paris, then parted for some time before resuming 
their collaboration in Germany. While in Paris, Lie discovered the contact transformation, 
which, for instance, maps lines into spheres. During the Franco-Prussian War, Lie decided 
to hike to Italy. On the way, however, he was arrested as a German spy and his mathematics 
notes were assumed to be coded messages. Only after the intervention of Darboux was 
Lie released, and he decided to return to Christiania. In 1871 Lie became an assistant at 
Christiania and obtained his doctorate. 

After a short stay in Germany, he again returned to Christiania 
University, where a chair of mathematics was created 
for him. Several years later Lie succeeded Klein at Leipzig, 
where he was stricken with a condition, then called neurasthenia, 
resulting in fatigue and memory loss and once thought 
to result from exhaustion of the nervous system. Although 
treatment in a mental hospital nominally restored his health, 
the once robust and happy Lie became ill-tempered and suspicious, 
despite the recognition he received for his work. To 
lure him back to Norway, his friends at Christiania created 
another special chair for him, and Lie returned in the fall of 
1898. He died of anemia a few months later. Lie had started examining partial differential 
equations, hoping that he could find a theory that was analogous to Galois’s theory of 
equations. He examined his contact transformations considering how they affected a process 
due to Jacobi of generating further solutions from a given one. This led to combining 
the transformations in a way that Lie called a group, but is today called a Lie algebra. 
At this point he left his original intention of examining partial differential equations and 
examined Lie algebras. Killing was to examine Lie algebras quite independently of Lie, and 
Cartan was to publish the classification of semisimple Lie algebras in 1900. Much of the 
work on transformation groups for which Lie is best known was collected with the aid of a 
postdoctoral student sent to Christiania by Klein in 1884. The student, F. Engel, remained 
nine months with Lie and was instrumental in the production of the three volume work 
Theorie der Transformationsgruppen, which appeared between 1888 and 1893. A similar 
effort to collect Lie’s work in contact transformations and partial differential equations was 
sidetracked as Lie’s coworker, F. Hausdorff, pursued other topics. 

The transformation groups now known as Lie groups provided a very fertile area for 
research for decades to come, although perhaps not at first. When Killing tried to classify 
the simple Lie groups, Lie considered his efforts so poor that he admonished one of his 
departing students with these words: “Farewell, and if ever you meet that s.o.b., kill him.” 
Lie’s work was continued (somewhat in isolation) by Cartan, but it was the papers of Weyl 
in the early 1920s that sparked the renewal of strong interest in Lie groups. Much of the 
foundation of the quantum theory of fundamental processes is built on Lie groups. In 1939, 








transform the graph 
of a function to find 
the function’s 
transform! 


transform of a 
function by a group 
element 


Wigner showed that application of Lie algebras to the Lorentz transformation required that 
all particles have the intrinsic properties of mass and spin. 


To make precise the above statement, we have to clarify the meaning of the 
action of $G$ on a function $u = f(x)$. We start with identifying the function $f$ (i.e., 
a map) with its graph (see Chapter 0), 

$$\Gamma_f = \{(x, f(x)) \mid x \in \Omega\} \subset X \times U,$$

where $\Omega \subset X$ is the domain of definition of $f$. If the action of $g \in G$ on $\Gamma_f$ is 
defined, then the transform of $\Gamma_f$ by $g$ is 

$$g \cdot \Gamma_f = \{(\tilde x, \tilde u) = g \cdot (x, u) \mid (x, u) \in \Gamma_f\}.$$

In general, $g \cdot \Gamma_f$ may not represent the graph of a function; in fact, it may not 
be even a function at all. However, by choosing $g$ close to the identity of $G$ and 
shrinking the size of $\Omega$, we can ensure that $g \cdot \Gamma_f = \Gamma_{\tilde f}$, i.e., that $g \cdot \Gamma_f$ is indeed 
the graph of a function $\tilde u = \tilde f(\tilde x)$. We write $\tilde f = g \cdot f$ and call $\tilde f$ the transform 
of $f$ by $g$. 

29.2.1. Example. Let $X = \mathbb{R} = U$, so that we are dealing with an ODE. Let $G = SO(2)$ 
be the rotation group acting on $X \times U = \mathbb{R}^2$. The action is given by 

$$(\tilde x, \tilde u) = \theta \cdot (x, u) = (x\cos\theta - u\sin\theta,\ x\sin\theta + u\cos\theta). \tag{29.6}$$

If $u = f(x)$ is a function, the group $SO(2)$ acts on its graph $\Gamma_f$ by rotating it. This process 
can lead to a rotated graph $\theta \cdot \Gamma_f$, which may not be the graph of a single-valued function. 
However, if we restrict the interval of definition of $f$, and make $\theta$ small enough, then $\theta \cdot \Gamma_f$ 
will be the graph of a well-defined function $\tilde u = \tilde f(\tilde x)$ with $\Gamma_{\tilde f} = \theta \cdot \Gamma_f$. If we substitute 
$f(x)$ for $u$, we obtain 

$$(\tilde x, \tilde u) = \theta \cdot (x, f(x)) = (x\cos\theta - f(x)\sin\theta,\ x\sin\theta + f(x)\cos\theta),$$

or 

$$\tilde x = x\cos\theta - f(x)\sin\theta, \qquad \tilde u = x\sin\theta + f(x)\cos\theta. \tag{29.7}$$

Eliminating $x$ from these two equations yields $\tilde u$ in terms of $\tilde x$, from which the function $\tilde f$ 
can be deduced. 

As a specific example, consider $f(x) = kx^2$. Then, the first equation of (29.7) gives 

$$(k\sin\theta)x^2 - (\cos\theta)x + \tilde x = 0 \quad\Rightarrow\quad x = \frac{\cos\theta - \sqrt{\cos^2\theta - 4k\tilde x\sin\theta}}{2k\sin\theta},$$

where we kept the root of the quadratic equation that gives a finite answer in the limit 
$\theta \to 0$. Inserting this in the second equation of (29.7) and simplifying yields 

$$\tilde u = \frac{\cos\theta - \sqrt{\cos^2\theta - 4k\tilde x\sin\theta}}{2k\sin^2\theta} - \tilde x\cot\theta.$$




944 29. LIE GROUPS AND DIFFERENTIAL EQUATIONS 


projectable group 


symmetry group of a 
system of DEs 


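We write this as

$$\tilde{f}(\tilde{x}) \equiv (\theta \cdot f)(\tilde{x}) = \frac{\cos\theta - \sqrt{\cos^2\theta - 4k\tilde{x}\sin\theta}}{2k\sin^2\theta} - \tilde{x}\cot\theta.$$

This equation defines the function $\tilde{f} = \theta \cdot f$. ■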
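The closed form above can be spot-checked numerically. The following is a minimal sketch, not part of the original text: it rotates a few points of the parabola $u = kx^2$ with (29.6) and confirms that each rotated point lies on the graph of the transformed function. The function names and the sample values of $\theta$, $k$, and $x$ are illustrative assumptions, chosen so that $\cos\theta - 2kx\sin\theta > 0$ and the retained root of the quadratic is the correct one.

```python
import math

def rotate(theta, x, u):
    """Action (29.6) of SO(2) on a point (x, u) of the plane."""
    c, s = math.cos(theta), math.sin(theta)
    return x * c - u * s, x * s + u * c

def transformed_parabola(theta, k, x_new):
    """Closed form of (theta . f)(x_new) for f(x) = k x^2, valid for theta != 0."""
    c, s = math.cos(theta), math.sin(theta)
    root = math.sqrt(c * c - 4.0 * k * x_new * s)
    return (c - root) / (2.0 * k * s * s) - x_new * c / s

theta, k = 0.1, 1.0          # hypothetical sample values
for x in (-0.5, 0.0, 0.3, 1.0):
    x_new, u_new = rotate(theta, x, k * x * x)
    # the rotated graph point must lie on the graph of the transformed function
    assert abs(transformed_parabola(theta, k, x_new) - u_new) < 1e-9
```

For larger $\theta$ or $x$ the rotated parabola stops being single-valued, which is the reason the text restricts the domain and keeps $\theta$ small.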

The foregoing example illustrates the general procedure for finding the transformed
function $\tilde{f} = g \cdot f$:

29.2.2. Box. If the rule of transformation of $g \in G$ is given by

$$(\tilde{x}, \tilde{u}) = g \cdot (x, u) = (\Psi_g(x, u), \Phi_g(x, u)),$$

then the graph $\Gamma_{\tilde{f}} = g \cdot \Gamma_f$ of $g \cdot f$ is given parametrically by

$$\tilde{x} = \Psi_g(x, f(x)), \qquad \tilde{u} = \Phi_g(x, f(x)). \tag{29.8}$$

In principle, we can solve the first equation for $x$ in terms of $\tilde{x}$ and substitute
in the second equation to find $\tilde{u}$ in terms of $\tilde{x}$, and consequently $\tilde{f}$.

For some special but important cases, the transformed functions can be obtained
explicitly. If $G$ is projectable, i.e., if the action of $G$ on $x$ does not depend on
$u$, then Equation (29.8) takes the special form $\tilde{x} = \Psi_g(x)$ and $\tilde{u} = \Phi_g(x, f(x))$,
in which $\Psi_g$ is a diffeomorphism of $X$ with inverse $\Psi_{g^{-1}}$. If $\Gamma_f$ is the graph of a
function $f$, then its transform $g \cdot \Gamma_f$ is always the graph of some function. In fact,

$$\tilde{u} = \tilde{f}(\tilde{x}) = \Phi_g(x, f(x)) = \Phi_g\big(\Psi_{g^{-1}}(\tilde{x}),\, f(\Psi_{g^{-1}}(\tilde{x}))\big). \tag{29.9}$$

In particular, if $G$ transforms only the independent variables, then

$$\tilde{u} = \tilde{f}(\tilde{x}) = f(x) = f(\Psi_{g^{-1}}(\tilde{x})) \quad\Longrightarrow\quad \tilde{f} = f \circ \Psi_{g^{-1}}. \tag{29.10}$$

For example, if $G$ is the group of translations $x \mapsto x + a$, then the transform of $f$
will be defined by $\tilde{f}(\tilde{x}) = f(\tilde{x} - a)$.
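Equation (29.10) translates almost literally into code. The sketch below is an illustration only; the helper name `transform_projectable` and the sample function are assumptions, not notation from the text. It builds $\tilde{f} = f \circ \Psi_{g^{-1}}$ for a translation and checks that every translated graph point lies on the graph of $\tilde{f}$.

```python
import math

def transform_projectable(f, psi_inverse):
    """Return f~ = f o Psi_{g^-1}, the transform of f in Equation (29.10)."""
    return lambda x_new: f(psi_inverse(x_new))

a = 0.75                                    # hypothetical translation parameter
f = math.sin                                # any function of one variable will do
f_translated = transform_projectable(f, lambda x_new: x_new - a)

for x in (-1.0, 0.0, 2.5):
    x_new, u_new = x + a, f(x)              # image of the graph point (x, f(x))
    assert abs(f_translated(x_new) - u_new) < 1e-12
```

The group element acts on points by $\Psi_g$, while the function picks up the inverse $\Psi_{g^{-1}}$; this is why the translated function is $f(\tilde{x} - a)$ rather than $f(\tilde{x} + a)$.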

29.2.3. Definition. A symmetry group of a system of DEs $\mathcal{S}$ is a local group of
transformations $G$ acting on an open subset $M$ of $X \times U$ with the property that
whenever $u = f(x)$ is a solution of $\mathcal{S}$ and $\tilde{f} = g \cdot f$ is defined for $g \in G$, then
$u = \tilde{f}(x)$ is also a solution of $\mathcal{S}$.

The importance of knowing the symmetry group of a system of DEs lies in the
property that from one solution we may be able to obtain a family of other solutions
by applying the group elements to the given solution. To find such symmetry
groups, we have to be able to "prolong" the action of a group to derivatives of the
dependent variables as well. This is obvious because to test a symmetry, we have
to substitute not only the transformed function $u = \tilde{f}(x)$, but also its derivatives
in the DE to verify that it satisfies the DE.





29.2 SYMMETRY GROUPS OF DIFFERENTIAL EQUATIONS 945


$n$-equivalence of functions

$n$th jet space of $U$

Note that the "points" of $J^n(X \times U)$ are $U$-valued functions!


29.2.1 Prolongation of Functions 

Given a function $f : X \to U$, there are

$$p_k \equiv \binom{p + k - 1}{k}$$

different derivatives of order $k$ of $f$. We use the multi-index notation

$$\partial_J f(x) \equiv \frac{\partial^k f(x)}{\partial x^{j_1}\,\partial x^{j_2} \cdots \partial x^{j_k}}$$

for these derivatives, where $J = (j_1, \dots, j_k)$ is an unordered $k$-tuple of
integers with $1 \le j_i \le p$ (see also Section 21.1). The order of the multi-index
$J$, denoted by $|J|$, indicates the order of differentiation. So, in the derivative above,
$|J| = k$. For a smooth map $f : X \to U$, we have $f(x) = (f^1(x), \dots, f^q(x))$, so that
we need $q \cdot p_k$ numbers $\partial_J f^\alpha(x)$ to represent all $k$-th order derivatives of all
components of $f$.

To geometrize the treatment of DEs (and thus facilitate the study of their
invariance), we need to construct a space in which derivatives of all orders up to a
certain number $n$ participate. Since derivatives need functions to act on, we arrive
at the space of functions whose derivatives share certain common properties. To
be specific, we make the following definition.

29.2.4. Definition. Let $f$ and $h$ be functions defined on a neighborhood of a
point $a \in X$ with values in $U$. We say that $f$ and $h$ are $n$-equivalent at $a$ if
$\partial_J f^\alpha(a) = \partial_J h^\alpha(a)$ for all $\alpha$ and $J$ corresponding to all derivatives up to $n$th
order. The collection of all $U$-valued functions defined on a neighborhood of $a$
will be denoted by $\Gamma_a(X \times U)$, and all functions $n$-equivalent to $f$ at $a$ by $j^n_a f$.

A convenient representative of such equivalent functions is the Taylor polynomial
of order $n$ (the terms in the Taylor series up to $n$th order) about $a$. Now
collect all $j^n_a f$ for all $a$ and $f$, and denote the result by $J^n(X \times U)$, so that

$$J^n(X \times U) \equiv \{\, j^n_a f \mid \forall\, a \in X \text{ and } \forall\, f \in \Gamma_a(X \times U) \,\}. \tag{29.11}$$

$J^n(X \times U)$ is called the $n$th prolongation of $U$, or the $n$th jet space of $U$. It turns
out that $J^n(X \times U)$ is a manifold (see [Saun 89, pp. 98 and 199]).

29.2.5. Theorem. $J^n(X \times U)$ is a manifold with natural coordinate functions
$(x^i, u^\alpha, u^\alpha_J)$ defined by

$$x^i(j^n_a f) = a^i, \qquad u^\alpha(j^n_a f) = f^\alpha(a), \qquad u^\alpha_J(j^n_a f) = \partial_J f^\alpha(a).$$

The natural coordinate functions allow us to identify the space of the derivatives
with various powers of $\mathbb{R}$. Let $U_k \equiv \mathbb{R}^{q p_k}$ denote the set of coordinates $u^\alpha_J$ with $|J| = k$, and let





prolongation of a 
function 


$U^{(n)} \equiv U \times U_1 \times \cdots \times U_n$ be the Cartesian product space¹ whose coordinates
represent all the derivatives $u^\alpha_J$ of all orders from 0 to $n$. The dimension of $U^{(n)}$ is

$$q + q p_1 + \cdots + q p_n = q\binom{p + n}{n} \equiv q\,p^{(n)}.$$

A typical point in $U^{(n)}$ is denoted by $u^{(n)}$, which has $q\,p^{(n)}$ different components
$\{u^\alpha_J\}_{\alpha=1}^{q}$, where $J$ runs over all unordered multi-indices $J = (j_1, \dots, j_k)$ with
$1 \le j_i \le p$ and $0 \le k \le n$. By convention, $k = 0$ refers to no derivatives at
all, in which case we set $J = 0$ and $u^\alpha_0 \equiv u^\alpha$. The $n$th jet space $J^n(X \times U)$ can
now be identified with $X \times U^{(n)}$. From now on, we shall use $X \times U^{(n)}$ in place of
$J^n(X \times U)$.

29.2.6. Example. Let $p = 3$ and $q = 1$, i.e., $X = \mathbb{R}^3$ and $U = \mathbb{R}$. The coordinates of $X$
are $(x, y, z)$ and that of $U$ is $u$. The coordinates of $U_1$ are $(u_x, u_y, u_z)$, where the subscript
denotes the variable of differentiation. Similarly, the coordinates of $U_2$ are

$$(u_{xx}, u_{xy}, u_{xz}, u_{yy}, u_{yz}, u_{zz})$$

and those of $U^{(2)} = U \times U_1 \times U_2$ are

$$(u;\; u_x, u_y, u_z;\; u_{xx}, u_{xy}, u_{xz}, u_{yy}, u_{yz}, u_{zz}),$$

which shows that $U^{(2)}$ is 10-dimensional. ■
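The counting in this example can be reproduced by brute-force enumeration. The sketch below is illustrative; the function names are assumptions. It lists the unordered multi-indices of each order and compares the counts with the binomial expressions for $p_k$ and for the dimension of $U^{(n)}$.

```python
from itertools import combinations_with_replacement
from math import comb

def multi_indices(p, k):
    """All unordered k-tuples (j_1 <= ... <= j_k) with entries in 1..p."""
    return list(combinations_with_replacement(range(1, p + 1), k))

def jet_fibre_dimension(p, q, n):
    """dim U^(n) = q * (p_0 + p_1 + ... + p_n), counted by enumeration."""
    return q * sum(len(multi_indices(p, k)) for k in range(n + 1))

p, q, n = 3, 1, 2
# one undifferentiated u, three first derivatives, six second derivatives
assert [len(multi_indices(p, k)) for k in range(n + 1)] == [1, 3, 6]
# enumeration agrees with p_k = C(p + k - 1, k)
assert all(len(multi_indices(p, k)) == comb(p + k - 1, k) for k in range(6))
# total agrees with q * C(p + n, n) = 10
assert jet_fibre_dimension(p, q, n) == q * comb(p + n, n) == 10
```

With $x, y, z$ labelled 1, 2, 3, `multi_indices(3, 2)` returns the six pairs `(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)`, in the same order as the coordinates $u_{xx}, u_{xy}, \dots, u_{zz}$ above.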

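29.2.7. Definition. Given a smooth map $f : X \supset \Omega \to U$, we define a map
$\operatorname{pr}^{(n)} f : \Omega \to U^{(n)}$ whose components $(\operatorname{pr}^{(n)} f)^\alpha_J$ are given by

$$(\operatorname{pr}^{(n)} f)^\alpha_J(x) = \partial_J f^\alpha(x) \equiv u^\alpha_J.$$

This map is called the $n$th prolongation of $f$.

Thus, for each $x \in X$, $\operatorname{pr}^{(n)} f(x)$ is a vector in $\mathbb{R}^{q p^{(n)}}$ whose components are
the values of $f$ and all its derivatives up to order $n$ at the point $x$. For example, in
the case of $p = 3$, $q = 1$ discussed above, $\operatorname{pr}^{(2)} f(x, y, z)$ has components

$$\left( f;\; \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z};\; \frac{\partial^2 f}{\partial x^2}, \frac{\partial^2 f}{\partial x\,\partial y}, \frac{\partial^2 f}{\partial x\,\partial z}, \frac{\partial^2 f}{\partial y^2}, \frac{\partial^2 f}{\partial y\,\partial z}, \frac{\partial^2 f}{\partial z^2} \right).$$

When the underlying space is an open subset $M$ of $X \times U$, its corresponding jet
space is

$$M^{(n)} \equiv M \times U_1 \times \cdots \times U_n,$$

which is a subspace of $X \times U^{(n)} \cong J^n(X \times U)$. If the graph of $f : X \to U$ lies
in $M$, the graph of $\operatorname{pr}^{(n)} f$ lies in $M^{(n)}$.

¹ Note that $U$, identified with the space of zeroth derivative, is a factor in $U^{(n)}$.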









Prolongation allows us to turn a system of DEs into a system of algebraic
equations: Given a system of $l$ DEs

$$\Delta_\nu\big(\{x^i\}, \{u^\alpha\}, \{\partial_i u^\alpha\}, \{\partial_i \partial_j u^\alpha\}, \dots, \{\partial_{i_1} \cdots \partial_{i_n} u^\alpha\}\big) = 0, \qquad \nu = 1, \dots, l,$$

one can define a map $\Delta : M^{(n)} \to \mathbb{R}^l$ and identify the system of DEs with

$$\mathcal{S}_\Delta \equiv \{(x, u^{(n)}) \in M^{(n)} \mid \Delta(x, u^{(n)}) = 0\}.$$

By identifying the system of DEs with the subset $\mathcal{S}_\Delta$ of the jet space, we have
translated the abstract relations among the derivatives of $u$ into a geometrical
object $\mathcal{S}_\Delta$, which is more amenable to symmetry operations.

29.2.8. Definition. Let $\Omega$ be a subset of $X$ and $f : \Omega \to U$ a smooth map. Then
$f$ is called a solution of the system of DEs $\mathcal{S}_\Delta$ if

$$\Delta(x, \operatorname{pr}^{(n)} f(x)) = 0 \qquad \forall\, x \in \Omega.$$

Just as we identified a function with its graph, we can identify the solution of
a system of DEs with the graph of its prolongation $\operatorname{pr}^{(n)} f$. This graph, which is
denoted by $\Gamma_f^{(n)}$, will clearly be a subset of $\mathcal{S}_\Delta$:

$$\Gamma_f^{(n)} \equiv \{(x, \operatorname{pr}^{(n)} f(x))\} \subset \mathcal{S}_\Delta.$$

29.2.9. Box. An $n$th order system of differential equations is taken to be
a subset $\mathcal{S}_\Delta$ of the jet space $J^n(X \times U)$, and a solution to be a smooth
map $f : \Omega \to U$ the graph of whose $n$th prolongation $\operatorname{pr}^{(n)} f$ is
contained in $\mathcal{S}_\Delta$.


29.2.10. Example. Consider Laplace's equation

$$\nabla^2 u = u_{xx} + u_{yy} + u_{zz} = 0$$

with $p = 3$, $q = 1$, and $n = 2$. The total jet space is the 13-dimensional Euclidean space
$X \times U^{(2)}$, whose coordinates are taken to be

$$(x, y, z;\; u;\; u_x, u_y, u_z;\; u_{xx}, u_{xy}, u_{xz}, u_{yy}, u_{yz}, u_{zz}).$$

In this 13-dimensional Euclidean space, Laplace's equation defines a 12-dimensional
subspace $\mathcal{S}_\Delta$ consisting of all points in the jet space whose eighth, eleventh, and thirteenth
coordinates add up to zero. A solution $f : \mathbb{R}^3 \supset \Omega \to U \subset \mathbb{R}$ must satisfy

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = 0 \qquad \forall\, (x, y, z) \in \Omega.$$





$n$th prolongation of a group action


This is the same as requiring the graph $\Gamma_f^{(2)}$ to lie in $\mathcal{S}_\Delta$. For example, if

$$f(\mathbf{r}) = f(x, y, z) = x^3 yz - y^3 xz + y^3 - 3yz^2,$$

then, collecting the components of $\operatorname{pr}^{(2)} f$ by order of differentiation (see Definition 29.2.7), we
have, for the parts in $U$, $U_1$, and $U_2$, respectively,

$$f = x^3 yz - y^3 xz + y^3 - 3yz^2,$$

$$(f_x, f_y, f_z) = (3x^2 yz - y^3 z,\; x^3 z - 3y^2 xz + 3y^2 - 3z^2,\; x^3 y - y^3 x - 6yz),$$

$$(f_{xx}, f_{xy}, f_{xz}, f_{yy}, f_{yz}, f_{zz}) = (6xyz,\; 3x^2 z - 3y^2 z,\; 3x^2 y - y^3,\; -6xyz + 6y,\; x^3 - 3xy^2 - 6z,\; -6y),$$

which lies in $\mathcal{S}_\Delta$ because the sum of the eighth, the eleventh, and the thirteenth coordinates
of $(x, y, z;\, \operatorname{pr}^{(2)} f(x, y, z))$ is $6xyz - 6xyz + 6y - 6y = 0$. ■
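The membership of this jet point in $\mathcal{S}_\Delta$ can be confirmed numerically. The sketch below is an illustration, not part of the text: it evaluates the ten components of $\operatorname{pr}^{(2)} f$ listed above at an arbitrary sample point, checks that the eighth, eleventh, and thirteenth jet coordinates sum to zero, and compares the three pure second derivatives against central differences. The sample point and helper names are assumptions.

```python
def f(x, y, z):
    return x**3*y*z - y**3*x*z + y**3 - 3*y*z**2

def pr2_f(x, y, z):
    """pr^(2) f, ordered as (u; u_x, u_y, u_z; u_xx, u_xy, u_xz, u_yy, u_yz, u_zz)."""
    return (
        f(x, y, z),
        3*x**2*y*z - y**3*z,
        x**3*z - 3*y**2*x*z + 3*y**2 - 3*z**2,
        x**3*y - y**3*x - 6*y*z,
        6*x*y*z,
        3*x**2*z - 3*y**2*z,
        3*x**2*y - y**3,
        -6*x*y*z + 6*y,
        x**3 - 3*x*y**2 - 6*z,
        -6*y,
    )

def second_difference(g, point, axis, h=1e-3):
    """Central second difference of g along one coordinate axis."""
    lo, hi = list(point), list(point)
    lo[axis] -= h
    hi[axis] += h
    return (g(*hi) - 2*g(*point) + g(*lo)) / h**2

point = (0.7, -1.2, 0.4)                    # arbitrary sample point
jet_point = point + pr2_f(*point)           # a point of the 13-dimensional jet space

# eighth, eleventh, and thirteenth coordinates (1-based) are u_xx, u_yy, u_zz
assert abs(jet_point[7] + jet_point[10] + jet_point[12]) < 1e-12

# independent check of the three pure second derivatives
for axis, index in ((0, 7), (1, 10), (2, 12)):
    assert abs(second_difference(f, point, axis) - jet_point[index]) < 1e-6
```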


29.2.2 Prolongation of Groups 


Suppose $G$ is a group of transformations acting on $M \subset X \times U$. It is possible to
prolong this action to the $n$-jet space $M^{(n)}$. The resulting group that acts on $M^{(n)}$
is called the $n$th prolongation of $G$ and denoted by $\operatorname{pr}^{(n)} G$, with group elements
$\operatorname{pr}^{(n)} g$ for $g \in G$. This prolongation is defined naturally: The derivatives of a
function $f$ with respect to $x$ are transformed into derivatives of $\tilde{f} = g \cdot f$ with
respect to $\tilde{x}$. More precisely,

$$\operatorname{pr}^{(n)} g \cdot (j^n_x f) \equiv \operatorname{pr}^{(n)} g \cdot (x, \operatorname{pr}^{(n)} f(x)) \equiv (\tilde{x}, \operatorname{pr}^{(n)}(g \cdot f)(\tilde{x})). \tag{29.12}$$

For $n = 0$, Equation (29.12) reduces to the action of $G$ on $M$ as given by Equation
(29.8). That the outcome of the action of the prolongation of $G$ in the defining
equation (29.12) is independent of the representative function $f$ follows from the
chain rule and the fact that only derivatives up to the $n$th order are involved. The
following example illustrates this.

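29.2.11. Example. Let $X$, $U$, and $G$ be as in Example 29.2.1. In this case, we have

$$\operatorname{pr}^{(1)}\theta \cdot (j^1_x f) \equiv \operatorname{pr}^{(1)}\theta \cdot (x, u, u_1) \equiv (\tilde{x}, \tilde{u}, \tilde{u}_1).$$

We calculated $\tilde{x}$ and $\tilde{u}$ in that example. They are

$$\tilde{x} = x\cos\theta - u\sin\theta, \qquad \tilde{u} = x\sin\theta + u\cos\theta. \tag{29.13}$$

To find $\tilde{u}_1$, we need to differentiate the second equation with respect to $\tilde{x}$ and express the
result in terms of the original variables. Thus

$$\tilde{u}_1 \equiv \frac{d\tilde{u}}{d\tilde{x}} = \frac{d\tilde{u}}{dx}\frac{dx}{d\tilde{x}} = \left(\sin\theta + \frac{du}{dx}\cos\theta\right)\frac{dx}{d\tilde{x}} = (\sin\theta + u_1\cos\theta)\,\frac{dx}{d\tilde{x}}.$$

$dx/d\tilde{x}$ is obtained by differentiating the first equation of (29.13) with respect to $\tilde{x}$:

$$1 = \frac{dx}{d\tilde{x}}\cos\theta - \frac{du}{d\tilde{x}}\sin\theta = \frac{dx}{d\tilde{x}}\cos\theta - \frac{du}{dx}\frac{dx}{d\tilde{x}}\sin\theta = (\cos\theta - u_1\sin\theta)\,\frac{dx}{d\tilde{x}},$$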






or

$$\frac{dx}{d\tilde{x}} = \frac{1}{\cos\theta - u_1\sin\theta}.$$

It therefore follows that

$$\tilde{u}_1 = \frac{\sin\theta + u_1\cos\theta}{\cos\theta - u_1\sin\theta}$$

and

$$\operatorname{pr}^{(1)}\theta \cdot (x, u, u_1) = \left(x\cos\theta - u\sin\theta,\; x\sin\theta + u\cos\theta,\; \frac{\sin\theta + u_1\cos\theta}{\cos\theta - u_1\sin\theta}\right).$$

We note that the RHS involves derivatives up to order one. Therefore, the transformation
is independent of the representative function. So, if we had chosen $j^1_x h$, where $h$ is
1-equivalent to $f$, we would have obtained the same result. This holds for derivatives of all
orders. Therefore, the prolongation of the action of the group $G$ is well-defined. ■

In many cases, it is convenient to choose the $n$th-order Taylor polynomial as
the representative of the class of $n$-equivalent functions, and, if possible, write the
transformed function $\tilde{f}$ explicitly in terms of $\tilde{x}$, and differentiate it to obtain the
transformed derivatives (see Problem 29.3).

Example 29.2.11 illustrates an important property of the prolongation of $G$.
We note that the first prolongation $\operatorname{pr}^{(1)} G$ acts on the original coordinates $(x, u)$ in
exactly the same way that $G$ does. This holds in general:


29.2.12. Box. The effect of the $n$th prolongation $\operatorname{pr}^{(n)} G$ on derivatives up to
order $m \le n$ is exactly the same as the effect of $\operatorname{pr}^{(m)} G$. If we already know
the action of the $m$th-order prolonged group $\operatorname{pr}^{(m)} G$, then to compute $\operatorname{pr}^{(n)} G$
we need only find how the derivatives $u^\alpha_J$ of order higher than $m$ transform,
because the lower-order action is already determined.
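The first prolongation of the rotation group found in Example 29.2.11 lends itself to a numerical check. The sketch below is illustrative only: it rotates two neighboring points of a sample graph, estimates the slope of the rotated curve by a finite difference, and compares it with the prolonged coordinate $\tilde{u}_1$. It then repeats the computation with a different function that is 1-equivalent to the first at the base point, to show that the result does not depend on the representative. The sample functions, angle, and tolerances are assumptions.

```python
import math

def pr1_rotation(theta, x, u, u1):
    """First prolongation of SO(2): (x, u, u_1) -> (x~, u~, u~_1)."""
    c, s = math.cos(theta), math.sin(theta)
    return x*c - u*s, x*s + u*c, (s + u1*c) / (c - u1*s)

def rotated_slope_fd(theta, g, x0, h=1e-6):
    """Slope of the rotated graph of g at the image of x0, by finite differences."""
    xa, ua, _ = pr1_rotation(theta, x0 - h, g(x0 - h), 0.0)
    xb, ub, _ = pr1_rotation(theta, x0 + h, g(x0 + h), 0.0)
    return (ub - ua) / (xb - xa)

theta, x0 = 0.3, 0.4
f = math.sin
slope = pr1_rotation(theta, x0, f(x0), math.cos(x0))[2]
assert abs(slope - rotated_slope_fd(theta, f, x0)) < 1e-6

# h agrees with f in value and first derivative at x0 (1-equivalent), but not beyond
h_equiv = lambda x: math.sin(x0) + math.cos(x0)*(x - x0) + 5.0*(x - x0)**2
assert abs(slope - rotated_slope_fd(theta, h_equiv, x0)) < 1e-6
```

Only the first two returned coordinates are used inside `rotated_slope_fd`, so the placeholder slope `0.0` passed there has no effect on the estimate.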


29.2.3 Prolongation of Vector Fields 

The geometrization of a system of DEs makes it possible to use the machinery of
differentiable manifolds, Lie groups, and Lie algebras to unravel the symmetries
of the system. At the heart of this machinery are the infinitesimal transformations,
which are directly connected to vector fields. Therefore, it is necessary to find out
how a vector field defined in $M \subset X \times U$ is prolonged. The most natural way to
prolong a vector field is to prolong its integral curve (which is a one-parameter
group of transformations of $M$) to a curve in $M^{(n)}$ and then calculate the tangent
to the latter curve.






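$n$th prolongation of a vector field

29.2.13. Definition. Let $M$ be an open subset of $X \times U$ and $\mathbf{X} \in \mathcal{X}(M)$. The $n$th
prolongation of $\mathbf{X}$, denoted by $\operatorname{pr}^{(n)}\mathbf{X}$, is a vector field on the $n$th jet space $M^{(n)}$
defined by

$$\operatorname{pr}^{(n)}\mathbf{X}\Big|_{(x, u^{(n)})} = \frac{d}{dt}\, \operatorname{pr}^{(n)}[\exp(t\mathbf{X})] \cdot (x, u^{(n)}) \Big|_{t=0}$$

for any $(x, u^{(n)}) \in M^{(n)}$.

Since $(x, u^{(n)})$ form a coordinate system on $M^{(n)}$, any vector field in
$M^{(n)}$ can be written as a linear combination of $\partial/\partial x^i$ and $\partial/\partial u^\alpha_J$ with coefficients
being, in general, functions of all coordinates $x^i$ and $u^\alpha_J$. For a prolonged vector,
however, we have

$$\operatorname{pr}^{(n)}\mathbf{X} = \sum_{i=1}^{p} X^i \frac{\partial}{\partial x^i} + \sum_{\alpha=1}^{q} \sum_{J} X^\alpha_J \frac{\partial}{\partial u^\alpha_J}, \tag{29.14}$$

where $X^i$ and $X^\alpha_0$ are functions only of $x$ and $u$. This is due to the remark made in
Box 29.2.12. For the same reason, the coefficients $X^\alpha_J$ corresponding to derivatives
of order $m$ will be independent of coordinates $u^\alpha_J$ that involve derivatives of order
higher than $m$. Thus, it is possible to construct various prolongations of a given
vector field recursively.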

29.2.14. Example. Let us consider our recurrent example of $X = U = \mathbb{R}$, $G = SO(2)$.
Given the infinitesimal generator $\mathbf{v} = -u\,\partial_x + x\,\partial_u$, one can solve the DE of its integral
curve to obtain²

$$\exp(t\mathbf{v}) \cdot (x, u) = (x\cos t - u\sin t,\; x\sin t + u\cos t).$$

Example 29.2.11 calculated the first prolongation of $SO(2)$. So

$$\operatorname{pr}^{(1)}\exp(t\mathbf{v}) \cdot (x, u, u_1) = \left(x\cos t - u\sin t,\; x\sin t + u\cos t,\; \frac{\sin t + u_1\cos t}{\cos t - u_1\sin t}\right).$$

Differentiating the components with respect to $t$ at $t = 0$ gives

$$\frac{d}{dt}(x\cos t - u\sin t)\Big|_{t=0} = -u,$$

$$\frac{d}{dt}(x\sin t + u\cos t)\Big|_{t=0} = x,$$

$$\frac{d}{dt}\left(\frac{\sin t + u_1\cos t}{\cos t - u_1\sin t}\right)\Big|_{t=0} = 1 + u_1^2.$$

Therefore,

$$\operatorname{pr}^{(1)}\mathbf{v} = -u\frac{\partial}{\partial x} + x\frac{\partial}{\partial u} + (1 + u_1^2)\frac{\partial}{\partial u_1}.$$

Note that the first two terms in $\operatorname{pr}^{(1)}\mathbf{v}$ are the same as those in $\mathbf{v}$ itself, in agreement with
Box 29.2.12. ■

² One can, of course, also write the finite group element directly.
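The passage from the prolonged flow to the prolonged vector field is a derivative at $t = 0$, so it can be imitated with a central difference. The sketch below is an illustration under that reading; the function names and the sample point are assumptions.

```python
import math

def pr1_flow(t, x, u, u1):
    """pr^(1) exp(t v) acting on (x, u, u_1) for v = -u d/dx + x d/du."""
    c, s = math.cos(t), math.sin(t)
    return x*c - u*s, x*s + u*c, (s + u1*c) / (c - u1*s)

def pr1_generator(x, u, u1):
    """Coefficients of pr^(1) v along d/dx, d/du, d/du_1."""
    return -u, x, 1.0 + u1**2

def tangent_at_identity(x, u, u1, h=1e-6):
    """Central-difference estimate of d/dt of the prolonged flow at t = 0."""
    plus, minus = pr1_flow(h, x, u, u1), pr1_flow(-h, x, u, u1)
    return tuple((a - b) / (2*h) for a, b in zip(plus, minus))

sample = (0.8, -0.3, 1.7)                   # arbitrary point (x, u, u_1)
estimate = tangent_at_identity(*sample)
exact = pr1_generator(*sample)
assert all(abs(a - b) < 1e-7 for a, b in zip(estimate, exact))
```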






29.3 The Central Theorems 


We are now in a position to state the first central theorem of the application of Lie 
groups to the solution of DEs. This theorem is the exact replica of Theorem 29.1.6 
in the language of prolongations: 

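29.3.1. Theorem. Let

$$\Delta_\nu(x, u^{(n)}) = 0, \qquad \nu = 1, \dots, l,$$

be a system of DEs defined on $M \subset X \times U$ whose Jacobian matrix

$$\mathsf{J}_\Delta(x, u^{(n)}) \equiv \left(\frac{\partial \Delta_\nu}{\partial x^i},\; \frac{\partial \Delta_\nu}{\partial u^\alpha_J}\right)$$

has rank $l$ for all $(x, u^{(n)}) \in \mathcal{S}_\Delta$. If $G$ is a local Lie group of transformations acting
on $M$, and

$$\operatorname{pr}^{(n)}\mathbf{v}\,[\Delta_\nu(x, u^{(n)})] = 0, \quad \nu = 1, \dots, l, \qquad \text{whenever } \Delta(x, u^{(n)}) = 0$$

for every infinitesimal generator $\mathbf{v}$ of $G$, then $G$ is a symmetry group of the
system.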

29.3.2. Example. Consider the first order (so, $n = 1$) ordinary DE

$$\Delta(x, u, u_1) = (u - x^3 - u^2 x)\,u_1 + x + x^2 u + u^3 = 0,$$

so that $X = \mathbb{R}$ and $U = \mathbb{R}$. We first note that

$$\mathsf{J}_\Delta = \left(\frac{\partial\Delta}{\partial x}, \frac{\partial\Delta}{\partial u}, \frac{\partial\Delta}{\partial u_1}\right) = \big((-3x^2 - u^2)u_1 + 1 + 2xu,\; (1 - 2ux)u_1 + x^2 + 3u^2,\; u - x^3 - u^2 x\big),$$

which is of rank 1 everywhere. Now let us apply the first prolongation of the generator of
$SO(2)$, calculated in Example 29.2.14, to $\Delta$. We have

$$\operatorname{pr}^{(1)}\mathbf{v}(\Delta) = -u\frac{\partial\Delta}{\partial x} + x\frac{\partial\Delta}{\partial u} + (1 + u_1^2)\frac{\partial\Delta}{\partial u_1}$$

$$= -u\,[(-3x^2 - u^2)u_1 + 1 + 2xu] + x\,[(1 - 2ux)u_1 + x^2 + 3u^2] + (1 + u_1^2)(u - x^3 - u^2 x)$$

$$= u_1\,[(u - x^3 - u^2 x)u_1 + x + x^2 u + u^3] = u_1 \Delta.$$

It follows that $\operatorname{pr}^{(1)}\mathbf{v}(\Delta) = 0$ whenever $\Delta = 0$, and that $SO(2)$ is a symmetry group of the
DE. Thus, rotations will change solutions of the DE into other solutions. In fact, the reader
may verify that in polar coordinates, the DE can be written in the incredibly simple form

$$\frac{dr}{d\theta} = r^3,$$

and the symmetry of the DE conveys the fact that adding a constant to $\theta$ does not change
the polar form of the DE. ■
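The identity $\operatorname{pr}^{(1)}\mathbf{v}(\Delta) = u_1\Delta$ is a polynomial identity in $(x, u, u_1)$, so it can be tested at random points without any symbolic algebra. The sketch below is illustrative; the function names, the random seed, and the sampling range are assumptions.

```python
import random

def delta(x, u, u1):
    return (u - x**3 - u**2*x)*u1 + x + x**2*u + u**3

def grad_delta(x, u, u1):
    """Partial derivatives of Delta with respect to x, u, and u_1."""
    return ((-3*x**2 - u**2)*u1 + 1 + 2*x*u,
            (1 - 2*u*x)*u1 + x**2 + 3*u**2,
            u - x**3 - u**2*x)

def pr1_v_delta(x, u, u1):
    """pr^(1) v applied to Delta, with pr^(1) v = -u d_x + x d_u + (1 + u_1^2) d_{u_1}."""
    d_x, d_u, d_u1 = grad_delta(x, u, u1)
    return -u*d_x + x*d_u + (1 + u1**2)*d_u1

random.seed(0)
for _ in range(100):
    x, u, u1 = (random.uniform(-2.0, 2.0) for _ in range(3))
    assert abs(pr1_v_delta(x, u, u1) - u1*delta(x, u, u1)) < 1e-9
```

The check covers all sample points, not only those on $\Delta = 0$; the invariance criterion of Theorem 29.3.1 needs the weaker statement that the left-hand side vanishes on $\Delta = 0$.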





Theorem 29.3.1 reduces the invariance of a system of DEs to a criterion involving
the prolongation of the infinitesimal generators of the symmetry group.
The urgent task in front of us is therefore to construct an explicit formula for the
prolongation of a vector field. In order to gain insight into this construction, we
first look at the simpler case of $U = \mathbb{R}$ and a group $G$ that transforms only the
independent variables. Furthermore, we restrict ourselves to the first prolongation.
An infinitesimal generator of such a group will be of the form

$$\mathbf{v} = \sum_{i=1}^{p} X^i(x)\frac{\partial}{\partial x^i},$$

which is assumed to act on the space $M \subset X \times U$. The integral curve of this vector
field is $\exp(t\mathbf{v})$, which acts on points of $M$ as follows:

$$(\tilde{x}, \tilde{u}) = \exp(t\mathbf{v}) \cdot (x, u) \equiv (\Psi_t(x), u) \equiv (\Psi_t^i(x), u).$$

By the construction of the integral curves in general, we have

$$\frac{d\Psi_t^i(x)}{dt}\Big|_{t=0} = X^i(x). \tag{29.15}$$

Denote the coordinates of the first jet space by $(x^i, u, u_k)$, where
$u_k(j^1_x f) \equiv \partial f/\partial x^k$. By the definition of the action of the prolonged group,

$$\operatorname{pr}^{(1)}\exp(t\mathbf{v}) \cdot (j^1_x f) = (\tilde{x}^i, \tilde{u}, \tilde{u}_j), \tag{29.16}$$

where $\tilde{u} = \tilde{f}(\tilde{x})$ and $\tilde{u}_j = \partial\tilde{f}/\partial\tilde{x}^j$. Once we find $\tilde{u}_j$, we can differentiate Equation
(29.16) with respect to $t$ at $t = 0$ to obtain the prolonged vector field. Using
Equation (29.10) and commas to indicate differentiation,³ we obtain

$$\tilde{u}_j = \tilde{f}_{,j}(\tilde{x}) = (f \circ \Psi_{-t})_{,j}(\tilde{x}) = \sum_{k=1}^{p} f_{,k}(\Psi_{-t}(\tilde{x}))\, \Psi^k_{-t,j}(\tilde{x}).$$

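Since $\mathbf{v}$ does not have any component along $\partial/\partial u$, its prolongation will not have
such components either. The components $U_j$ along $\partial/\partial u_j$ are obtained by differentiating
$\tilde{u}_j$ with respect to $t$. With $\Psi_{-t}(\tilde{x}) = x$ and $f_{,k}(x) = u_k$, the transformed derivative is
$\tilde{u}_j = \sum_k u_k \Psi^k_{-t,j}(\tilde{x})$, in which $t$ enters both through the label $-t$ and through the
argument $\tilde{x} = \Psi_t(x)$:

$$U_j(x, u, u_k) \equiv \frac{\partial \tilde{u}_j}{\partial t}\Big|_{t=0} = \sum_{k=1}^{p} u_k \left\{ \frac{\partial}{\partial s}\Big[\Psi^k_{-s,j}(\tilde{x})\Big]_{s=0} + \sum_{i=1}^{p} \Psi^k_{-t,ji}(\tilde{x})\, \frac{\partial \Psi^i_t(x)}{\partial t} \right\}_{t=0}. \tag{29.17}$$

The derivative of the first term in the sum can be evaluated as follows:

$$\frac{\partial}{\partial s}\Big[\Psi^k_{-s,j}(x)\Big]_{s=0} = \frac{\partial}{\partial x^j}\left[\frac{\partial \Psi^k_{-s}(x)}{\partial s}\Big|_{s=0}\right] = -X^k_{,j}(x),$$

where we have emphasized the dependence of $\Psi$ on $t$ (or $s$), treated $s$ as the first
independent variable, and substituted $t = 0$ in all $\tilde{x}$'s (so that $\tilde{x} = x$) before
differentiation. This is possible because we are taking the partial derivative with
respect to the first variable holding all others constant; the last equality uses Equation (29.15).
The second term in Equation (29.17) can be calculated similarly:

$$\Psi^k_{0,ji}(x) = \frac{\partial^2 x^k}{\partial x^j\, \partial x^i} = 0,$$

because $\Psi^k_0(x) = x^k$. We therefore have

$$U_j(x, u, u_k) = -\sum_{k=1}^{p} \frac{\partial X^k}{\partial x^j}\, u_k, \tag{29.18}$$

$$\operatorname{pr}^{(1)}\mathbf{v} = \sum_{j=1}^{p}\left(X^j\frac{\partial}{\partial x^j} + U_j\frac{\partial}{\partial u_j}\right) = \mathbf{v} - \sum_{j,k=1}^{p} \frac{\partial X^k}{\partial x^j}\, u_k \frac{\partial}{\partial u_j}. \tag{29.19}$$

³ We have found it exceedingly convenient to use commas to indicate differentiation with respect to the $x$'s. The alternative,
i.e., the use of partials, makes it almost impossible to find one's way in the maze of derivatives involving $x^i$, $\tilde{x}^i$, and $\Psi^i_t$, with
$\tilde{x}$ depending on $t$. The reader will recall that the index after the comma is to be thought of as a "position holder," and the argument
as a substitution. Thus, for example,

$$f_{,j}(\tilde{x}) \equiv \frac{\partial f(s^1, \dots, s^p)}{\partial s^j}\Big|_{s = \tilde{x}} = \frac{\partial f(\tilde{x}^1, \dots, \tilde{x}^p)}{\partial \tilde{x}^j}.$$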

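It is also instructive to consider the case in which $U$ is still $\mathbb{R}$, but $G$ acts only
on the dependent variable. Then $\mathbf{v} = U(x, u)\,\partial_u$, and

$$(\tilde{x}, \tilde{u}) = (x, \Phi_t(x, u)), \qquad \text{with} \quad U(x, u) = \frac{d\Phi_t(x, u)}{dt}\Big|_{t=0}.$$

The reader may check that in this case, the prolongation of $\mathbf{v}$ is given by

$$\operatorname{pr}^{(1)}\mathbf{v} = \mathbf{v} + \sum_{j=1}^{p} U_j \frac{\partial}{\partial u_j}, \qquad U_j(x, u^{(1)}) = \frac{\partial U}{\partial x^j} + u_j \frac{\partial U}{\partial u}. \tag{29.20}$$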
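Two one-dimensional scalings give a quick sanity check of (29.18) and (29.20). They are offered here as an illustrative worked example rather than as part of the original derivation. For $\mathbf{v} = x\,\partial_x$ the flow is $\Psi_t(x) = e^t x$, so by (29.10)

$$\tilde{f}(\tilde{x}) = f(e^{-t}\tilde{x}), \qquad \tilde{u}_1 = e^{-t}u_1, \qquad U_1 = \frac{d\tilde{u}_1}{dt}\Big|_{t=0} = -u_1 = -\frac{\partial X}{\partial x}\,u_1,$$

in agreement with (29.18). For $\mathbf{v} = u\,\partial_u$ the flow is $\Phi_t(x, u) = e^t u$, so

$$\tilde{f}(x) = e^{t}f(x), \qquad \tilde{u}_1 = e^{t}u_1, \qquad U_1 = \frac{d\tilde{u}_1}{dt}\Big|_{t=0} = u_1 = \frac{\partial U}{\partial x} + u_1\frac{\partial U}{\partial u},$$

in agreement with (29.20), since $U = u$ gives $\partial U/\partial x = 0$ and $\partial U/\partial u = 1$.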





total derivative 


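The second equation in (29.20) can also be written as

$$U_j(x, \operatorname{pr}^{(1)} f(x)) = \frac{\partial U}{\partial x^j} + \frac{\partial U}{\partial u}\frac{\partial f}{\partial x^j} = \frac{\partial}{\partial x^j}\big[U(x, f(x))\big].$$

In other words, $U_j(x, u^{(1)})$ is obtained from $U(x, u)$ by differentiation with respect
to $x^j$, while treating $u$ as a function of $x$. This leads us to the definition of the total
derivative.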

29.3.3. Definition. Let $S : M^{(n)} \to \mathbb{R}$ be a smooth function of $x$, $u$, and all
derivatives of $u$ up to $n$th order. The total derivative of $S$ with respect to $x^i$,
denoted by $D_i S$, is a smooth function $D_i S : M^{(n+1)} \to \mathbb{R}$ defined by

$$D_i S(j^{n+1}_x f) = \frac{\partial}{\partial x^i}\big[S(x, \operatorname{pr}^{(n)} f(x))\big];$$

i.e., $D_i S$ is obtained from $S$ by differentiating $S$ with respect to $x^i$, treating $u$ and
all the $u^\alpha_J$'s as functions of $x$.

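The following proposition, whose proof is a straightforward application of the
chain rule, gives the explicit formula for calculating the total derivative:

29.3.4. Proposition. The $i$th total derivative of $S : M^{(n)} \to \mathbb{R}$ is of the form

$$D_i S = \frac{\partial S}{\partial x^i} + \sum_{\alpha=1}^{q} \sum_{J} u^\alpha_{J,i} \frac{\partial S}{\partial u^\alpha_J},$$

where, for $J = (j_1, \dots, j_k)$,

$$u^\alpha_{J,i} \equiv \frac{\partial u^\alpha_J}{\partial x^i} = \frac{\partial^{k+1} u^\alpha}{\partial x^i\, \partial x^{j_1} \cdots \partial x^{j_k}},$$

and the sum over $J$ includes derivatives of all orders from 0 to $n$.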

An immediate consequence of this proposition is

$$D_i u^\alpha_J = u^\alpha_{J,i} = \frac{\partial u^\alpha_J}{\partial x^i} \quad \forall\, J, \alpha, \qquad D_i(ST) = T D_i S + S D_i T. \tag{29.21}$$

Higher-order total derivatives are defined in analogy with partial derivatives: If $I$
is a multi-index of the form $I = (i_1, \dots, i_k)$, then the $I$th total derivative is

$$D_I S = D_{i_1} D_{i_2} \cdots D_{i_k} S. \tag{29.22}$$

As in the case of partial derivatives, the order of differentiation is immaterial.
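As a short illustration of Proposition 29.3.4 (a worked example added for concreteness, with $p = q = 1$ and an arbitrarily chosen $S$), take $S = x\,u\,u_x$ on $M^{(1)}$. The three contributions are $\partial S/\partial x = u u_x$, $u_x\,\partial S/\partial u = x u_x^2$, and $u_{xx}\,\partial S/\partial u_x = x u u_{xx}$, so that

$$D_x(x\,u\,u_x) = u u_x + x u_x^2 + x u u_{xx},$$

which is what one obtains by differentiating $x f(x) f'(x)$ with respect to $x$ and renaming $f, f', f''$ as $u, u_x, u_{xx}$. Note that $D_x S$ involves $u_{xx}$ and therefore lives on $M^{(2)}$.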

We are now ready to state the second central theorem of the application of Lie
groups to the solution of DEs (for a proof, see [Olve 86, pp. 113-115]).





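29.3.5. Theorem. Let

$$\mathbf{v} = \sum_{i=1}^{p} X^i(x, u)\frac{\partial}{\partial x^i} + \sum_{\alpha=1}^{q} U^\alpha(x, u)\frac{\partial}{\partial u^\alpha}$$

be a vector field on an open subset $M \subset X \times U$. The $n$th prolongation of $\mathbf{v}$, i.e.,
$\operatorname{pr}^{(n)}\mathbf{v} \in \mathcal{X}(M^{(n)})$, is

$$\operatorname{pr}^{(n)}\mathbf{v} = \mathbf{v} + \sum_{\alpha=1}^{q} \sum_{J} U^\alpha_J(x, u^{(n)})\frac{\partial}{\partial u^\alpha_J},$$

where for $J = (j_1, \dots, j_k)$, the inner sum extends over $1 \le |J| \le n$ and the
coefficients $U^\alpha_J$ are given by

$$U^\alpha_J(x, u^{(n)}) = D_J\left(U^\alpha - \sum_{i=1}^{p} X^i u^\alpha_i\right) + \sum_{i=1}^{p} X^i u^\alpha_{J,i},$$

and the higher-order derivative $D_J$ is as given in Equation (29.22).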

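29.3.6. Example. Let $p = 2$, $q = 1$, and consider the case in which $G$ acts only on the
independent variables $(x, y)$. The general vector field for this situation is

$$\mathbf{v} = \xi(x, y)\frac{\partial}{\partial x} + \eta(x, y)\frac{\partial}{\partial y}.$$

We are interested in the first prolongation of this vector field. Thus, $n = 1$, $X^1 = \xi$, $X^2 = \eta$,
$U = 0$, and $J$ has only one component, which we denote by $j$ (also written as $x$ or $y$). Theorem
29.3.5 gives

$$U_j = D_j\left(-\sum_{i=1}^{2} X^i u_i\right) + \sum_{i=1}^{2} X^i u_{ij} = -\sum_{i=1}^{2} \frac{\partial X^i}{\partial x^j}\frac{\partial u}{\partial x^i},$$

and using the notation $u_x = \partial u/\partial x$ and $u_y = \partial u/\partial y$, we obtain

$$\operatorname{pr}^{(1)}\mathbf{v} = \mathbf{v} + U_x\frac{\partial}{\partial u_x} + U_y\frac{\partial}{\partial u_y}, \tag{29.23}$$

where

$$U_x = -u_x\frac{\partial \xi}{\partial x} - u_y\frac{\partial \eta}{\partial x}, \qquad U_y = -u_x\frac{\partial \xi}{\partial y} - u_y\frac{\partial \eta}{\partial y}.$$

In particular, if $G = SO(2)$, so that $\mathbf{v} = -y\,\partial_x + x\,\partial_y$, then $U_x = -u_y$ and $U_y = u_x$. It
then follows that

$$\operatorname{pr}^{(1)}\mathbf{v} = -y\frac{\partial}{\partial x} + x\frac{\partial}{\partial y} - u_y\frac{\partial}{\partial u_x} + u_x\frac{\partial}{\partial u_y}.$$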






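29.3.7. Example. Let $p = 1$, $q = 1$, and $G = SO(2)$. The general vector field for this
situation is

$$\mathbf{v} = -u\frac{\partial}{\partial x} + x\frac{\partial}{\partial u}.$$

For the first prolongation of this vector field $n = 1$, $X^1 = -u$, $U = x$, and $J$ has only one
component, which we denote by $x$. Theorem 29.3.5 gives

$$U_x = D_x(U - X^1 u_x) + X^1 u_{xx} = D_x(x + u u_x) - u u_{xx} = 1 + u_x^2.$$

It follows that

$$\operatorname{pr}^{(1)}\mathbf{v} = -u\frac{\partial}{\partial x} + x\frac{\partial}{\partial u} + (1 + u_x^2)\frac{\partial}{\partial u_x},$$

which is the result obtained in Example 29.2.14.

The second prolongation can be obtained as well. Once again we use Theorem 29.3.5
with obvious change of notation:

$$U_{xx} = D_x D_x(U - X^1 u_x) + X^1 u_{xxx} = D_x(1 + u_x^2 + u u_{xx}) - u u_{xxx} = 3u_x u_{xx}.$$

Then

$$\operatorname{pr}^{(2)}\mathbf{v} = -u\frac{\partial}{\partial x} + x\frac{\partial}{\partial u} + (1 + u_x^2)\frac{\partial}{\partial u_x} + 3u_x u_{xx}\frac{\partial}{\partial u_{xx}}.$$

Using Theorem 29.3.1, we note that the DE $u_{xx} = 0$ has $SO(2)$ as a symmetry group,
because with $\Delta(x, u, u_x, u_{xx}) = u_{xx}$,

$$\operatorname{pr}^{(2)}\mathbf{v}(\Delta) = -u\frac{\partial\Delta}{\partial x} + x\frac{\partial\Delta}{\partial u} + (1 + u_x^2)\frac{\partial\Delta}{\partial u_x} + 3u_x u_{xx}\frac{\partial\Delta}{\partial u_{xx}} = 3u_x u_{xx},$$

which vanishes whenever $\Delta(x, u, u_x, u_{xx})$ vanishes. This is the statement that rotations
take straight lines to straight lines. ■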
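The coefficient $3u_x u_{xx}$ can be cross-checked against the finite rotation. Dividing $d\tilde{u}_1/dx$ by $d\tilde{x}/dx$ in Example 29.2.11 gives $\tilde{u}_{xx} = u_{xx}/(\cos t - u_x \sin t)^3$. This closed form is not stated in the text and is treated below as an assumption to be tested. The sketch first compares it with a divided-difference estimate on a rotated parabola and then differentiates it at $t = 0$. All names and sample values are illustrative.

```python
import math

def rotate(t, x, u):
    c, s = math.cos(t), math.sin(t)
    return x*c - u*s, x*s + u*c

def u2_transformed(t, u1, u2):
    """Assumed closed form for the rotated second derivative."""
    return u2 / (math.cos(t) - u1*math.sin(t))**3

def u2_rotated_fd(t, g, x0, h=1e-4):
    """Second derivative of the rotated graph of g at the image of x0 (divided differences)."""
    (xa, ua), (xm, um), (xb, ub) = (rotate(t, x, g(x)) for x in (x0 - h, x0, x0 + h))
    return 2.0*((ub - um)/(xb - xm) - (um - ua)/(xm - xa)) / (xb - xa)

g = lambda x: 0.5*x*x + 0.2*x               # sample curve: u_x = x + 0.2, u_xx = 1
x0, t = 0.3, 0.2
u1, u2 = x0 + 0.2, 1.0
assert abs(u2_transformed(t, u1, u2) - u2_rotated_fd(t, g, x0)) < 1e-5

# derivative at t = 0 reproduces the prolongation coefficient 3 u_x u_xx
eps = 1e-5
coefficient = (u2_transformed(eps, u1, u2) - u2_transformed(-eps, u1, u2)) / (2*eps)
assert abs(coefficient - 3*u1*u2) < 1e-6

# a straight line (u_xx = 0) stays straight under any rotation
assert u2_transformed(0.7, 0.4, 0.0) == 0.0
```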


defining equations for the symmetry group of a system

29.4 Application to Some Known PDEs 

We have all the tools at our disposal to compute (in principle) the most general symmetry
group of almost any system of PDEs. The coefficients $U^\alpha_J$ of the prolonged
vector field $\operatorname{pr}^{(n)}\mathbf{v}$ will be functions of the partial derivatives of the coefficients $X^i$
and $U^\alpha$ of $\mathbf{v}$ with respect to both $x$ and $u$. The infinitesimal criterion of invariance
as given in Theorem 29.3.1 will involve $x$, $u$, and the derivatives of $u$ with respect
to $x$, as well as $X^i$ and $U^\alpha$ and their partial derivatives with respect to $x$ and $u$.
Using the system of PDEs, we can obtain some of the derivatives of the $u$'s in
terms of the others. Substituting these relations in the equation of infinitesimal
criterion, we get an equation involving the $u$'s and powers of their derivatives that are
to be treated as independent. We then equate the coefficients of these powers of
partial derivatives of $u$ to zero. This will result in a large number of elementary
PDEs for the coefficient functions $X^i$ and $U^\alpha$ of $\mathbf{v}$, called the defining equations
for the symmetry group of the given system of PDEs. In most applications, these
defining equations can be solved, and the general solution will determine the most
general infinitesimal symmetry of the system. The symmetry group itself can then
be calculated by exponentiation of the vector fields, i.e., by finding their integral
curves. In the remaining part of this section, we construct the symmetry groups of
the heat and the wave equations.


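29.4.1 The Heat Equation

The one-dimensional heat equation $u_t = u_{xx}$ corresponds to $p = 2$, $q = 1$, and
$n = 2$. So it is determined by the vanishing of $\Delta(x, t, u^{(2)}) = u_t - u_{xx}$. The most
general infinitesimal generator of symmetry appropriate for this equation can be
written as

$$\mathbf{v} = \xi(x, t, u)\frac{\partial}{\partial x} + \tau(x, t, u)\frac{\partial}{\partial t} + \phi(x, t, u)\frac{\partial}{\partial u}, \tag{29.24}$$

which, as the reader may check (see Problem 29.11), has a second prolongation
of the form

$$\operatorname{pr}^{(2)}\mathbf{v} = \mathbf{v} + \phi^x\frac{\partial}{\partial u_x} + \phi^t\frac{\partial}{\partial u_t} + \phi^{xx}\frac{\partial}{\partial u_{xx}} + \phi^{xt}\frac{\partial}{\partial u_{xt}} + \phi^{tt}\frac{\partial}{\partial u_{tt}},$$

where, for example,

$$\phi^t = \phi_t - \xi_t u_x + (\phi_u - \tau_t)u_t - \xi_u u_x u_t - \tau_u u_t^2,$$

$$\begin{aligned}
\phi^{xx} = {} & \phi_{xx} + (2\phi_{xu} - \xi_{xx})u_x - \tau_{xx}u_t + (\phi_{uu} - 2\xi_{xu})u_x^2 \\
& - 2\tau_{xu}u_x u_t - \xi_{uu}u_x^3 - \tau_{uu}u_x^2 u_t \\
& + (\phi_u - 2\xi_x)u_{xx} - 2\tau_x u_{xt} - 3\xi_u u_x u_{xx} - \tau_u u_t u_{xx} - 2\tau_u u_x u_{xt},
\end{aligned} \tag{29.25}$$

and subscripts indicate partial derivatives. Theorem 29.3.1 now gives

$$\operatorname{pr}^{(2)}\mathbf{v}(\Delta) = \phi^t - \phi^{xx} = 0 \qquad \text{whenever } u_t = u_{xx} \tag{29.26}$$

as the infinitesimal criterion. Substituting (29.25) in (29.26), replacing $u_t$ with $u_{xx}$
in the resulting equation, and equating to zero the coefficients of the monomials
in derivatives of $u$, we obtain a number of equations involving $\xi$, $\tau$, and $\phi$. These
equations as well as the monomials of which they are coefficients are given in
Table 29.1. Complicated as the defining equations may look, they are fairly easy to
solve. From (d) and (f) we conclude that $\tau$ is a function of $t$ only. Then (c) shows
that $\xi$ is independent of $u$, and (e) gives $2\xi_x = \tau_t$, or $\xi(x, t) = \frac{1}{2}\tau_t x + \eta(t)$, for
some arbitrary function $\eta$. From (h) and the fact that $\xi$ is independent of $u$ we get
$\phi_{uu} = 0$, or

$$\phi(x, t, u) = \alpha(x, t)u + \beta(x, t)$$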




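Monomial            Coefficient equation                          Label
$u_{xx}^2$          $0 = 0$                                       (a)
$u_x^2 u_{xx}$      $\tau_{uu} = 0$                               (b)
$u_x u_{xx}$        $2\tau_{xu} + 2\xi_u = 0$                     (c)
$u_x u_{xt}$        $2\tau_u = 0$                                 (d)
$u_{xx}$            $2\xi_x + \tau_{xx} - \tau_t = 0$             (e)
$u_{xt}$            $2\tau_x = 0$                                 (f)
$u_x^3$             $\xi_{uu} = 0$                                (g)
$u_x^2$             $\phi_{uu} - 2\xi_{xu} = 0$                   (h)
$u_x$               $\xi_{xx} - 2\phi_{xu} - \xi_t = 0$           (i)
$1$                 $\phi_t - \phi_{xx} = 0$                      (j)

Table 29.1 The defining equations of the heat equation and the monomials that give rise
to them.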


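for some as yet undetermined functions $\alpha$ and $\beta$. Since $\xi$ is linear in $x$, $\xi_{xx} = 0$,
and (i) yields $\xi_t = -2\phi_{xu} = -2\alpha_x$, or

$$\alpha_x = -\tfrac{1}{2}\xi_t = -\tfrac{1}{4}\tau_{tt}x - \tfrac{1}{2}\eta_t \quad\Longrightarrow\quad \alpha(x, t) = -\tfrac{1}{8}\tau_{tt}x^2 - \tfrac{1}{2}\eta_t x + \rho(t).$$

Finally, with $\phi_t = \alpha_t u + \beta_t$ and $\phi_{xx} = \alpha_{xx}u + \beta_{xx}$ (recall that when taking partial
derivatives, $u$ is considered independent of $x$ and $t$), the last defining equation (j)
gives $\alpha_t = \alpha_{xx}$ and $\beta_t = \beta_{xx}$, i.e., that $\alpha$ and $\beta$ are to satisfy the heat equation.
Substituting $\alpha$ in the heat equation, we obtain

$$-\tfrac{1}{8}\tau_{ttt}x^2 - \tfrac{1}{2}\eta_{tt}x + \rho_t = -\tfrac{1}{4}\tau_{tt},$$

which must hold for all $x$ and $t$. Therefore,

$$\tau_{ttt} = 0, \qquad \eta_{tt} = 0, \qquad \rho_t = -\tfrac{1}{4}\tau_{tt}.$$

These equations have the solution

$$\tau = c_1 t^2 + c_2 t + c_3, \qquad \rho = -\tfrac{1}{2}c_1 t + c_4, \qquad \eta = c_5 t + c_6.$$

It follows that $\alpha(x, t) = -\tfrac{1}{4}c_1 x^2 - \tfrac{1}{2}c_5 x - \tfrac{1}{2}c_1 t + c_4$ and

$$\xi(x, t) = \tfrac{1}{2}(2c_1 t + c_2)x + c_5 t + c_6,$$
$$\tau(t) = c_1 t^2 + c_2 t + c_3,$$
$$\phi(x, t, u) = \left(-\tfrac{1}{4}c_1 x^2 - \tfrac{1}{2}c_5 x - \tfrac{1}{2}c_1 t + c_4\right)u + \beta(x, t).$$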
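The general solution can be substituted back into the nontrivial defining equations as a check. The sketch below is illustrative: it uses random constants $c_1, \dots, c_6$ and central differences (which are exact, up to rounding, for the low-degree polynomials involved) to evaluate the residuals of (e), (i), and the $u$-linear part of (j). The helper names and the tolerance are assumptions.

```python
import random

def make_coefficients(c1, c2, c3, c4, c5, c6):
    """xi, tau, alpha from the general solution of the defining equations."""
    xi = lambda x, t: 0.5*(2*c1*t + c2)*x + c5*t + c6
    tau = lambda x, t: c1*t*t + c2*t + c3
    alpha = lambda x, t: -0.25*c1*x*x - 0.5*c5*x - 0.5*c1*t + c4
    return xi, tau, alpha

def d_dx(g, x, t, h=1e-3):
    return (g(x + h, t) - g(x - h, t)) / (2*h)

def d_dt(g, x, t, h=1e-3):
    return (g(x, t + h) - g(x, t - h)) / (2*h)

def d2_dx2(g, x, t, h=1e-3):
    return (g(x + h, t) - 2*g(x, t) + g(x - h, t)) / h**2

def residuals(constants, x, t):
    xi, tau, alpha = make_coefficients(*constants)
    return (
        2*d_dx(xi, x, t) - d_dt(tau, x, t),        # (e), using tau_xx = 0
        d_dt(xi, x, t) + 2*d_dx(alpha, x, t),      # (i), using xi_xx = 0, phi_xu = alpha_x
        d_dt(alpha, x, t) - d2_dx2(alpha, x, t),   # (j), coefficient of u
    )

random.seed(1)
constants = [random.uniform(-1.0, 1.0) for _ in range(6)]
assert all(abs(r) < 1e-6 for r in residuals(constants, 0.4, 1.3))
```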






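Inserting in Equation (29.24) yields

$$\begin{aligned}
\mathbf{v} = {} & \left[\tfrac{1}{2}(2c_1 t + c_2)x + c_5 t + c_6\right]\frac{\partial}{\partial x} + (c_1 t^2 + c_2 t + c_3)\frac{\partial}{\partial t} \\
& + \left[\left(-\tfrac{1}{4}c_1 x^2 - \tfrac{1}{2}c_5 x - \tfrac{1}{2}c_1 t + c_4\right)u + \beta(x, t)\right]\frac{\partial}{\partial u} \\
= {} & c_1\left[xt\,\partial_x + t^2\,\partial_t - \tfrac{1}{4}(2t + x^2)u\,\partial_u\right] + c_2\left(\tfrac{1}{2}x\,\partial_x + t\,\partial_t\right) \\
& + c_3\,\partial_t + c_4\,u\,\partial_u + c_5\left(t\,\partial_x - \tfrac{1}{2}xu\,\partial_u\right) + c_6\,\partial_x + \beta(x, t)\,\partial_u.
\end{aligned} \tag{29.27}$$

Thus the Lie algebra of the infinitesimal symmetries of the heat equation is spanned
by the six vector fields

$$\mathbf{v}_1 = \partial_x, \qquad \mathbf{v}_2 = \partial_t, \qquad \mathbf{v}_3 = u\,\partial_u, \qquad \mathbf{v}_4 = x\,\partial_x + 2t\,\partial_t,$$

$$\mathbf{v}_5 = 2t\,\partial_x - xu\,\partial_u, \qquad \mathbf{v}_6 = 4tx\,\partial_x + 4t^2\,\partial_t - (x^2 + 2t)u\,\partial_u$$

and the infinite-dimensional subalgebra

$$\mathbf{v}_\beta = \beta(x, t)\,\partial_u,$$

where $\beta$ is an arbitrary solution of the heat equation.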

The one-parameter groups $G_i$ generated by the $\mathbf{v}_i$ can be found by solving the
appropriate DEs for the integral curves. We show a sample calculation and leave
the rest of the computation to the reader. Consider $\mathbf{v}_5$, whose integral curve is given
by the set of DEs

$$\frac{dx}{ds} = 2t, \qquad \frac{dt}{ds} = 0, \qquad \frac{du}{ds} = -xu.$$

The second equation shows that $t$ is not affected by the group. So, $t = t_0$, where
$t_0$ is the initial value of $t$. The first equation now gives

$$\frac{dx}{ds} = 2t_0 \quad\Longrightarrow\quad x = 2t_0 s + x_0,$$

and the last equation yields

$$\frac{du}{ds} = -(2t_0 s + x_0)u \quad\Longrightarrow\quad \frac{du}{u} = -(2t_0 s + x_0)\,ds \quad\Longrightarrow\quad u = u_0 e^{-t_0 s^2 - x_0 s}.$$

Changing the transformed coordinates to $(\tilde{x}, \tilde{t}, \tilde{u})$ and removing the subscript from the
initial coordinates, we can write

$$\exp(\mathbf{v}_5 s) \cdot (x, t, u) = (\tilde{x}, \tilde{t}, \tilde{u}) = \left(x + 2ts,\; t,\; u e^{-sx - s^2 t}\right).$$

Table 29.2 gives the result of the action of $\exp(\mathbf{v}_i s)$ on $(x, t, u)$.








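Group Element               Transformed Coordinates $(\tilde{x}, \tilde{t}, \tilde{u})$
$\exp(\mathbf{v}_1 s)$      $(x + s,\; t,\; u)$
$\exp(\mathbf{v}_2 s)$      $(x,\; t + s,\; u)$
$\exp(\mathbf{v}_3 s)$      $(x,\; t,\; e^s u)$
$\exp(\mathbf{v}_4 s)$      $(e^s x,\; e^{2s} t,\; u)$
$\exp(\mathbf{v}_5 s)$      $(x + 2ts,\; t,\; u e^{-sx - s^2 t})$
$\exp(\mathbf{v}_6 s)$      $\left(\dfrac{x}{1 - 4st},\; \dfrac{t}{1 - 4st},\; u\sqrt{1 - 4st}\,\exp\left[\dfrac{-sx^2}{1 - 4st}\right]\right)$
$\exp(\mathbf{v}_\beta s)$  $(x,\; t,\; u + s\beta(x, t))$

Table 29.2 The transformations caused by the symmetry group of the heat equation.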
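The entries for $\mathbf{v}_5$ and $\mathbf{v}_6$ in Table 29.2 can be compared with a direct numerical integration of the integral-curve equations. The sketch below uses a hand-written fourth-order Runge-Kutta stepper; it is an illustration only, and the step count, starting point, and group parameters are assumptions (for $\mathbf{v}_6$ the parameter is chosen so that $1 - 4st > 0$).

```python
import math

def v5(state):
    x, t, u = state
    return (2*t, 0.0, -x*u)

def v6(state):
    x, t, u = state
    return (4*t*x, 4*t*t, -(x*x + 2*t)*u)

def flow(field, state, s, steps=2000):
    """Integrate d(state)/ds = field(state) from 0 to s with classical RK4."""
    h = s / steps
    for _ in range(steps):
        k1 = field(state)
        k2 = field(tuple(y + 0.5*h*k for y, k in zip(state, k1)))
        k3 = field(tuple(y + 0.5*h*k for y, k in zip(state, k2)))
        k4 = field(tuple(y + h*k for y, k in zip(state, k3)))
        state = tuple(y + h*(a + 2*b + 2*c + d)/6
                      for y, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state

def exp_v5(s, x, t, u):
    return (x + 2*t*s, t, u*math.exp(-s*x - s*s*t))

def exp_v6(s, x, t, u):
    d = 1 - 4*s*t
    return (x/d, t/d, u*math.sqrt(d)*math.exp(-s*x*x/d))

start = (0.3, 0.5, 1.2)                      # arbitrary initial point (x, t, u)
for field, closed_form, s in ((v5, exp_v5, 0.7), (v6, exp_v6, 0.2)):
    numeric = flow(field, start, s)
    exact = closed_form(s, *start)
    assert all(abs(a - b) < 1e-8 for a, b in zip(numeric, exact))
```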


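The symmetry groups $G_1$ and $G_2$ reflect the invariance of the heat equation
under space and time translations. $G_3$ and $G_\beta$ demonstrate the linearity of the heat
equation: We can multiply solutions by constants and add solutions to get new
solutions. The scaling symmetry is contained in $G_4$, which shows that if you scale
time by the square of the scaling of $x$, you obtain a new solution. $G_5$ is a Galilean
boost to a moving frame. Finally, $G_6$ is a transformation that cannot be obtained
from any physical principle. Since each group $G_i$ is a one-parameter group of
symmetries, if $f$ is a solution of the heat equation, so are the functions $f_i \equiv G_i \cdot f$
for all $i$. These functions can be obtained from Equation (29.8). As an illustration,
we find $f_6$. First note that for $u = f(x, t)$, we have

$$\tilde{x} = \frac{x}{1 - 4st}, \qquad \tilde{t} = \frac{t}{1 - 4st}, \qquad \tilde{u} = \tilde{f}(\tilde{x}, \tilde{t}) = f(x, t)\sqrt{1 - 4st}\,\exp\left[\frac{-sx^2}{1 - 4st}\right].$$

Next solve the first two equations above for $x$ and $t$ in terms of $\tilde{x}$ and $\tilde{t}$:

$$x = \frac{\tilde{x}}{1 + 4s\tilde{t}}, \qquad t = \frac{\tilde{t}}{1 + 4s\tilde{t}}.$$

Finally, substitute in the last equation to get

$$\tilde{f}(\tilde{x}, \tilde{t}) = f\left(\frac{\tilde{x}}{1 + 4s\tilde{t}}, \frac{\tilde{t}}{1 + 4s\tilde{t}}\right)\frac{1}{\sqrt{1 + 4s\tilde{t}}}\,\exp\left[\frac{-s\tilde{x}^2}{1 + 4s\tilde{t}}\right],$$

or, changing $\tilde{x}$ to $x$ and $\tilde{t}$ to $t$,

$$f_6(x, t) = \frac{1}{\sqrt{1 + 4st}}\,\exp\left[\frac{-sx^2}{1 + 4st}\right] f\left(\frac{x}{1 + 4st}, \frac{t}{1 + 4st}\right).$$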















The other transformed functions can be obtained similarly. We simply list these
functions:

$$\begin{aligned}
f_1(x, t) &= f(x - s, t), & f_2(x, t) &= f(x, t - s), \\
f_3(x, t) &= e^{s}f(x, t), & f_4(x, t) &= f(e^{-s}x, e^{-2s}t), \\
f_5(x, t) &= e^{-sx + s^2 t}f(x - 2st, t), & f_\beta(x, t) &= f(x, t) + s\beta(x, t), \\
f_6(x, t) &= \frac{1}{\sqrt{1 + 4st}}\,\exp\left[\frac{-sx^2}{1 + 4st}\right] f\left(\frac{x}{1 + 4st}, \frac{t}{1 + 4st}\right).
\end{aligned} \tag{29.28}$$

We can find the fundamental solution to the heat equation very simply as
follows. Let $f(x, t)$ be the trivial constant solution $c$. Then

$$u = f_6(x, t) = \frac{c}{\sqrt{1 + 4st}}\,e^{-sx^2/(1 + 4st)}$$

is also a solution. Now choose $c = \sqrt{s/\pi}$ and translate $t$ to $t - 1/4s$ (an allowed
operation due to the invariance of the heat equation under time translation $G_2$).
The result is

$$u = \frac{1}{\sqrt{4\pi t}}\,e^{-x^2/4t},$$

which is the fundamental solution of the heat equation [see (22.45)].
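The claim that $f_6$ maps solutions to solutions, and the route to the fundamental solution, can both be probed numerically. The sketch below is an illustration: it applies the $f_6$ transformation of (29.28) to the known solution $e^{-t}\sin x$, evaluates the residual $u_t - u_{xx}$ by central differences, and then checks that the constant solution $c = \sqrt{s/\pi}$, transformed and shifted in time, coincides with the heat kernel. The seed solution, sample points, step size, and tolerances are assumptions.

```python
import math

def f6(f, s):
    """Transform of a solution f under the group G_6 of Table 29.2, per (29.28)."""
    def transformed(x, t):
        d = 1 + 4*s*t
        return math.exp(-s*x*x/d) / math.sqrt(d) * f(x/d, t/d)
    return transformed

def heat_residual(g, x, t, h=1e-3):
    """Central-difference estimate of g_t - g_xx at (x, t)."""
    g_t = (g(x, t + h) - g(x, t - h)) / (2*h)
    g_xx = (g(x + h, t) - 2*g(x, t) + g(x - h, t)) / h**2
    return g_t - g_xx

seed = lambda x, t: math.exp(-t)*math.sin(x)        # a known solution of u_t = u_xx
assert abs(heat_residual(seed, 0.5, 0.2)) < 1e-5
assert abs(heat_residual(f6(seed, 0.3), 0.5, 0.2)) < 1e-5

def heat_kernel(x, t):
    return math.exp(-x*x/(4*t)) / math.sqrt(4*math.pi*t)

s = 0.8
c = math.sqrt(s/math.pi)
from_constant = f6(lambda x, t: c, s)
for x, t in ((0.0, 0.5), (1.2, 0.3), (-0.7, 2.0)):
    # translate t -> t - 1/(4s) as described in the text
    assert abs(from_constant(x, t - 1/(4*s)) - heat_kernel(x, t)) < 1e-12
```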


29.4.2 The Wave Equation 

As the next example of the application of Lie groups to differential equations, we 
consider the wave equation in two dimensions. This equation is written as 

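\[ u_{tt} - u_{xx} - u_{yy} = 0, \quad \text{or} \quad \eta^{ij}u_{ij} = 0 \quad \text{and} \quad \Delta = \eta^{ij}u_{ij}, \tag{29.29} \]

where $\eta = \mathrm{diag}(1, -1, -1)$, and subscripts indicate derivatives with respect to coordinate functions $x^1 = t$, $x^2 = x$, and $x^3 = y$. With $p = 3$ and $q = 1$, a typical generator of symmetry will be of the form

\[ \mathbf{v} = \sum_{i=1}^{3} X^{(i)}\frac{\partial}{\partial x^i} + U\frac{\partial}{\partial u}, \tag{29.30} \]

where $\{X^{(i)}\}_{i=1}^{3}$ and $U$ are functions of $t$, $x$, $y$, and $u$ to be determined. The second prolongation of such a vector field is

\[ \mathrm{pr}^{(2)}\mathbf{v} = \mathbf{v} + \sum_{i=1}^{3} U^{(i)}\frac{\partial}{\partial u_i} + \sum_{i \le j} U^{(ij)}\frac{\partial}{\partial u_{ij}}, \]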




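where by Theorem 29.3.5,

\[ U^{(i)} = D_i\Bigl(U - \sum_{k=1}^{3} X^{(k)}u_k\Bigr) + \sum_{k=1}^{3} X^{(k)}u_{ik} = D_iU - \sum_{k=1}^{3}\bigl(D_iX^{(k)}\bigr)u_k, \]

\[ U^{(ij)} = D_iD_j\Bigl(U - \sum_{k=1}^{3} X^{(k)}u_k\Bigr) + \sum_{k=1}^{3} X^{(k)}u_{ijk} = D_iD_jU - \sum_{k=1}^{3}\Bigl[\bigl(D_iD_jX^{(k)}\bigr)u_k + u_{ik}\bigl(D_jX^{(k)}\bigr) + u_{jk}\bigl(D_iX^{(k)}\bigr)\Bigr], \]

and we have used Equation (29.21). Using (29.21) further, the reader may show that

\[
\begin{aligned}
U^{(ij)} = {}& U_{ij} + u_k\bigl(\delta_{jk}U_{iu} + \delta_{ik}U_{ju} - X^{(k)}_{ij}\bigr) \\
& + u_ku_l\bigl(\delta_{il}\delta_{jk}U_{uu} - X^{(k)}_{ui}\delta_{jl} - X^{(k)}_{uj}\delta_{il}\bigr) \\
& + u_{kl}\bigl(\delta_{ik}\delta_{jl}U_u - X^{(k)}_{j}\delta_{il} - X^{(k)}_{i}\delta_{jl}\bigr) \\
& - u_ku_{lm}\bigl(X^{(k)}_{u}\delta_{il}\delta_{jm} + X^{(m)}_{u}\delta_{il}\delta_{jk} + X^{(m)}_{u}\delta_{ik}\delta_{jl}\bigr) - u_iu_ju_kX^{(k)}_{uu},
\end{aligned}
\tag{29.31}
\]

where a sum over repeated indices is understood, and subscripts on $U$ and $X^{(k)}$ denote partial derivatives with respect to $x^i$ or $u$.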

Applying $\mathrm{pr}^{(2)}\mathbf{v}$ to $\Delta$, we obtain the infinitesimal criterion

\[ U^{(tt)} = U^{(xx)} + U^{(yy)} \quad \text{or} \quad \eta^{ij}U^{(ij)} = 0. \]

Multiplying Equation (29.31) by $\eta^{ij}$ and setting the result equal to zero yields

\[
\begin{aligned}
0 = \eta^{ij}U^{(ij)} = {}& \eta^{ij}U_{ij} + u_k\bigl(2\eta^{ik}U_{iu} - \eta^{ij}X^{(k)}_{ij}\bigr) + u_ku_l\bigl(\eta^{kl}U_{uu} - 2X^{(k)}_{ui}\eta^{il}\bigr) \\
& - 2u_{kl}X^{(k)}_{i}\eta^{il} - 2u_ku_{lm}X^{(m)}_{u}\eta^{kl} - u_iu_ju_kX^{(k)}_{uu}\eta^{ij},
\end{aligned}
\tag{29.32}
\]

where we have used the wave equation, $\eta^{kl}u_{kl} = 0$. Equation (29.32) must hold for all derivatives of $u$ and powers thereof (treated as independent) modulo the wave equation. Therefore, the coefficients of such "monomials" must vanish. For example, since all the terms involving $u_ku_{lm}$ are independent (even after substituting $u_{xx} + u_{yy}$ for $u_{tt}$), we have to conclude that $X^{(m)}_u = 0$ for all $m$, i.e., that the $X^{(i)}$ are independent of $u$. Setting the coefficient of $u_ku_l$ equal to zero and noting that $X^{(k)}_{ui} = \partial X^{(k)}_u/\partial x^i = 0$ yields

\[ U_{uu} = 0 \quad \Rightarrow \quad U(x, y, t, u) = \alpha(x, y, t)u + \beta(x, y, t). \tag{29.33} \]

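Let us concentrate on the functions $X^{(i)}$. These are related via the term linear in $u_{kl}$. After inserting the wave equation in this term, we get

\[
\begin{aligned}
u_{kl}X^{(k)}_{i}\eta^{il} = {}& u_{12}\bigl(X^{(2)}_1 - X^{(1)}_2\bigr) + u_{13}\bigl(X^{(3)}_1 - X^{(1)}_3\bigr) - u_{23}\bigl(X^{(3)}_2 + X^{(2)}_3\bigr) \\
& + u_{22}\bigl(X^{(1)}_1 - X^{(2)}_2\bigr) + u_{33}\bigl(X^{(1)}_1 - X^{(3)}_3\bigr).
\end{aligned}
\]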

















The $u_{ij}$ in this equation are all independent; so, we can set their coefficients equal to zero:

\[ X^{(2)}_1 = X^{(1)}_2, \qquad X^{(3)}_1 = X^{(1)}_3, \qquad X^{(3)}_2 = -X^{(2)}_3, \qquad X^{(1)}_1 = X^{(2)}_2 = X^{(3)}_3. \tag{29.34} \]

The reader may verify that these relations imply that $X^{(i)}_{jkl} = 0$ for any $i$, $j$, $k$, and $l$. For example,

\[ X^{(1)}_{122} = X^{(3)}_{322} = -X^{(2)}_{323} = -X^{(1)}_{133} = -X^{(3)}_{113} = -X^{(3)}_{311} = -X^{(2)}_{211} = -X^{(2)}_{121} = -X^{(1)}_{122}. \tag{29.35} \]


So, the first link of this chain is equal to its negative. Therefore, all the third derivatives in the chain of Equation (29.35) vanish. It follows that all $X^{(i)}$'s are mixed polynomials of at most degree two. Writing the most general such polynomials for the three functions $X^{(1)}$, $X^{(2)}$, and $X^{(3)}$ and having them satisfy Equation (29.34) yields

\[
\begin{aligned}
X^{(1)} &= a_1 + a_4x + a_7y + a_5t + a_8(x^2 + y^2 + t^2) + 2a_9xt + 2a_{10}yt, \\
X^{(2)} &= a_2 + a_5x - a_6y + a_4t + a_9(x^2 - y^2 + t^2) + 2a_{10}xy + 2a_8xt, \\
X^{(3)} &= a_3 + a_6x + a_5y + a_7t + a_{10}(-x^2 + y^2 + t^2) + 2a_9xy + 2a_8yt.
\end{aligned}
\tag{29.36}
\]
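As a sanity check, one can verify numerically that the quadratic polynomials of Equation (29.36) satisfy the relations (29.34) for arbitrary constants. The sketch below assumes the index convention $x^1 = t$, $x^2 = x$, $x^3 = y$ and uses central differences, which are exact for quadratics up to rounding. The names `make_X`, `partial`, and `residuals` are hypothetical helpers introduced here.

```python
import random

def make_X(a):
    # a[1..10] play the role of the constants a_1..a_10; a[0] is unused.
    def X1(t, x, y):
        return (a[1] + a[4]*x + a[7]*y + a[5]*t
                + a[8]*(x*x + y*y + t*t) + 2*a[9]*x*t + 2*a[10]*y*t)
    def X2(t, x, y):
        return (a[2] + a[5]*x - a[6]*y + a[4]*t
                + a[9]*(x*x - y*y + t*t) + 2*a[10]*x*y + 2*a[8]*x*t)
    def X3(t, x, y):
        return (a[3] + a[6]*x + a[5]*y + a[7]*t
                + a[10]*(-x*x + y*y + t*t) + 2*a[9]*x*y + 2*a[8]*y*t)
    return X1, X2, X3

def partial(g, point, i, h=1e-4):
    # Central difference with respect to coordinate i (0: t, 1: x, 2: y).
    plus, minus = list(point), list(point)
    plus[i] += h
    minus[i] -= h
    return (g(*plus) - g(*minus)) / (2*h)

def residuals(a, point):
    X1, X2, X3 = make_X(a)
    return [
        partial(X2, point, 0) - partial(X1, point, 1),  # X^(2)_1 - X^(1)_2
        partial(X3, point, 0) - partial(X1, point, 2),  # X^(3)_1 - X^(1)_3
        partial(X3, point, 1) + partial(X2, point, 2),  # X^(3)_2 + X^(2)_3
        partial(X1, point, 0) - partial(X2, point, 1),  # X^(1)_1 - X^(2)_2
        partial(X1, point, 0) - partial(X3, point, 2),  # X^(1)_1 - X^(3)_3
    ]

random.seed(0)
a = [0.0] + [random.uniform(-1.0, 1.0) for _ in range(10)]
res = residuals(a, (0.3, -0.7, 1.1))
print(max(abs(r) for r in res))
```

All five residuals should vanish to rounding accuracy, whatever values the ten constants take.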

Setting the coefficient of $u_k$ and $\eta^{ij}U_{ij}$ equal to zero and using Equation (29.33) gives

\[
\begin{aligned}
2\alpha_x &= X^{(2)}_{xx} + X^{(2)}_{yy} - X^{(2)}_{tt}, & 2\alpha_y &= X^{(3)}_{xx} + X^{(3)}_{yy} - X^{(3)}_{tt}, \\
2\alpha_t &= X^{(1)}_{tt} - X^{(1)}_{xx} - X^{(1)}_{yy}, & \beta_{tt} - \beta_{xx} - \beta_{yy} &= 0.
\end{aligned}
\]

It follows that $\beta$ is any solution of the wave equation, and

\[ \alpha(x, y, t) = a_{11} - a_8t - a_9x - a_{10}y. \]

By inserting the expressions found for $X^{(i)}$ and $U$ in (29.30) and writing the result in the form $\sum_i a_i\mathbf{v}_i$, we discover that the generators of the symmetry group consist of the ten vector fields given in Table 29.3 as well as the vector fields

\[ u\partial_u, \qquad \mathbf{v}_\beta = \beta(x, y, t)\partial_u \]

for $\beta$ an arbitrary solution of the wave equation. The ten vector fields of Table 29.3 comprise the generators of the conformal group in three dimensions whose generalization to $m$ dimensions was studied in Section 28.4.




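Infinitesimal generator                                                                          Transformation

$\mathbf{v}_1 = \partial_t$, $\mathbf{v}_2 = \partial_x$, $\mathbf{v}_3 = \partial_y$                                                Translation

$\mathbf{v}_4 = x\partial_t + t\partial_x$
$\mathbf{v}_6 = -y\partial_x + x\partial_y$                                                                  Rotation/Boost
$\mathbf{v}_7 = y\partial_t + t\partial_y$

$\mathbf{v}_5 = t\partial_t + x\partial_x + y\partial_y$                                                              Dilatation

$\mathbf{v}_8 = (t^2 + x^2 + y^2)\partial_t + 2xt\partial_x + 2yt\partial_y - tu\partial_u$
$\mathbf{v}_9 = 2xt\partial_t + (t^2 + x^2 - y^2)\partial_x + 2xy\partial_y - xu\partial_u$                          Inversions
$\mathbf{v}_{10} = 2yt\partial_t + 2xy\partial_x + (t^2 - x^2 + y^2)\partial_y - yu\partial_u$

Table 29.3 The generators of the conformal group for $\mathbb{R}^3$, part of the symmetry group of the wave equation in two dimensions.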


29.5 Application to ODEs 

The theory of Lie groups finds one of its most rewarding applications in the integration of ODEs. Lie's fundamental observation was that if one could come up with a sufficiently large group of symmetries of a system of ODEs, then one could integrate the system. In this section we outline the general technique of solving ODEs once we know their symmetries. The following proposition will be useful (see [Warn 83, p. 40]):

29.5.1. Proposition. Let $M$ be an $n$-dimensional manifold and $\mathbf{v} \in \mathfrak{X}(M)$. Assume that $\mathbf{v}|_P \neq 0$ for some $P \in M$. Then there exists a local chart, i.e., local set of coordinate functions, $(w^1, \ldots, w^n)$ at $P$ such that $\mathbf{v} = \partial/\partial w^1$.


29.5.1 First-Order ODEs 

The most general first-order ODE can be written as 


\[ \frac{du}{dx} = F(x, u) \quad \Rightarrow \quad \Delta(x, u, u_x) \equiv u_x - F(x, u) = 0. \tag{29.37} \]








A typical infinitesimal generator of the symmetry group of this equation is⁴ $\mathbf{v} = X\partial_x + U\partial_u$, whose prolongation is

\[ \mathrm{pr}^{(1)}\mathbf{v} = \mathbf{v} + U^{(x)}\frac{\partial}{\partial u_x}, \quad \text{where} \quad U^{(x)} \equiv U_x + (U_u - X_x)u_x - X_uu_x^2, \tag{29.38} \]

as the reader may verify. The infinitesimal criterion for the one-parameter group of transformations $G$ to be a symmetry group of Equation (29.37) is $\mathrm{pr}^{(1)}\mathbf{v}(\Delta) = 0$, or

\[ \frac{\partial U}{\partial x} + \left(\frac{\partial U}{\partial u} - \frac{\partial X}{\partial x}\right)F - \frac{\partial X}{\partial u}F^2 = X\frac{\partial F}{\partial x} + U\frac{\partial F}{\partial u}. \tag{29.39} \]

Any solution $(X, U)$ of this equation generates a 1-parameter group of transformations. The problem is that a systematic procedure for solving (29.39) is more difficult than solving the original equation. However, in most cases, one can guess a symmetry transformation (based on physical, or other, grounds), and that makes Lie's method worthwhile.

Suppose we have found a symmetry group $G$ with infinitesimal generator $\mathbf{v}$ that does not vanish at $P \in M \subset X \times U$. Based on Proposition 29.5.1, we can introduce new coordinates

\[ w = \xi(x, u), \qquad y = \eta(x, u) \tag{29.40} \]

in a neighborhood of $P$ such that $\mathbf{v} = \partial/\partial w$, whose prolongation is also $\partial/\partial w$ [see (29.38)]. This transforms the DE of (29.37) into⁵ $\tilde{\Delta}(y, w, w_y) = 0$, and the infinitesimal criterion into

\[ \mathrm{pr}^{(1)}\mathbf{v}(\tilde{\Delta}) = \frac{\partial\tilde{\Delta}}{\partial w} = 0. \]

It follows that $\tilde{\Delta}$ is independent of $w$. The transformed DE is therefore $\tilde{\Delta}(y, w_y) = 0$, whose normal form, obtained by implicitly solving for $dw/dy$, is

\[ \frac{dw}{dy} = H(y) \quad \Rightarrow \quad w = \int_a^y H(t)\,dt + w(a) \]

for some function $H$ of $y$ alone and some convenient point $y = a$. Substituting this expression of $w$ as a function of $y$ in Equation (29.40) and eliminating $y$ between the two equations yields $u$ as a function of $x$.

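Thus our task is to find the change of variables (29.40). For this, we use $\mathbf{v}(w) = 1$ and $\mathbf{v}(y) = 0$, and express them in terms of $x$ and $u$:

\[
\begin{aligned}
\mathbf{v}(w) &= \mathbf{v}(\xi) = X\frac{\partial\xi}{\partial x} + U\frac{\partial\xi}{\partial u} = 1, \\
\mathbf{v}(y) &= \mathbf{v}(\eta) = X\frac{\partial\eta}{\partial x} + U\frac{\partial\eta}{\partial u} = 0.
\end{aligned}
\tag{29.41}
\]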


⁴ The reader is warned against the unfortunate coincidence of notation: $X$ and $U$ represent both the components of the infinitesimal generator and the spaces of independent and dependent variables!

⁵ Here we are choosing $w$ to be the dependent variable. This choice is a freedom that is always available to us.




The second equation says that $\eta$ is an invariant of the group generated by $\mathbf{v}$. We therefore use the associated characteristic ODE [see (29.3) and (29.5)] to find $y$ (or $\eta$):

\[ \frac{dx}{X(x, u)} = \frac{du}{U(x, u)}. \tag{29.42} \]

To find $w$ (or $\xi$), we introduce $\chi(x, u, v) = v - \xi(x, u)$ and note that an equivalent relation containing the same information as the first equation in (29.41) is

\[ X\frac{\partial\chi}{\partial x} + U\frac{\partial\chi}{\partial u} + \frac{\partial\chi}{\partial v} = 0, \]

which has the characteristic ODE

\[ \frac{dx}{X(x, u)} = \frac{du}{U(x, u)} = dv, \tag{29.43} \]

for which we seek a solution of the form $v - \xi(x, u) = c$ to read off $\xi(x, u)$.

The reader may wonder whether it is sane to go through so much trouble only 
to replace the original single ODE with two ODEs such as (29.42) and (29.43)! 
The answer is that in practice, the latter two DEs are much easier to solve than the 
original ODE. 

29.5.2. Example. The homogeneous FODE $du/dx = F(u/x)$ is invariant under the scaling transformation $(x, u) \mapsto (sx, su)$ whose infinitesimal generator is $\mathbf{v} = x\partial_x + u\partial_u$. The first prolongation of this vector is the same as the vector itself (reader, verify!).

To find the new coordinates $w$ and $y$, first use Equation (29.42) with $X(x, u) = x$ and $U(x, u) = u$:

\[ \frac{dx}{x} = \frac{du}{u} \quad \Rightarrow \quad \frac{u}{x} = c_1 \quad \Rightarrow \quad y = \frac{u}{x} \quad \text{(see Box 29.1.8)}. \]

Next, we note that (29.43) yields

\[ \frac{dx}{x} = \frac{du}{u} = dv \quad \Rightarrow \quad \ln u = v + \ln c_2 \quad \Rightarrow \quad v = \ln(u/c_2). \]

Substituting from the previous equation, we obtain

\[ v = \ln(c_1x/c_2) = \ln x + \ln(c_1/c_2) \quad \Rightarrow \quad v - \ln x = c \quad \Rightarrow \quad w = \ln x. \]

The chain rule gives $du/dx = (1 + yw_y)/w_y$, so that the DE becomes

\[ \frac{1 + yw_y}{w_y} = F(y) \quad \Rightarrow \quad \frac{dw}{dy} = \frac{1}{F(y) - y}, \]

which can be integrated to give $w = H(y)$ or $\ln x = H(y) = H(u/x)$, which defines $u$ as an implicit function of $x$. ■
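The reduction in the example can be tested numerically. The following sketch assumes the illustrative choice $F(y) = y + y^2$ (not taken from the text), integrates the original equation $du/dx = F(u/x)$ with a hand-written fourth-order Runge-Kutta step, and then checks that the change in $w = \ln x$ equals the quadrature $\int dy/(F(y) - y)$ taken between the corresponding values of $y = u/x$. The helper names `rk4` and `simpson` are hypothetical.

```python
import math

def F(y):
    # Illustrative right-hand side; any F with F(y) != y on the path would do.
    return y + y * y

def rk4(f, x0, u0, x1, n=1000):
    # Classical fourth-order Runge-Kutta integration of du/dx = f(x, u).
    h = (x1 - x0) / n
    x, u = x0, u0
    for _ in range(n):
        k1 = f(x, u)
        k2 = f(x + h / 2, u + h * k1 / 2)
        k3 = f(x + h / 2, u + h * k2 / 2)
        k4 = f(x + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return u

def simpson(g, a, b, n=1000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

x0, u0, x1 = 1.0, 1.0, 1.5
u1 = rk4(lambda x, u: F(u / x), x0, u0, x1)

# Reduced equation dw/dy = 1/(F(y) - y), with w = ln x and y = u/x.
y0, y1 = u0 / x0, u1 / x1
w_change = simpson(lambda y: 1.0 / (F(y) - y), y0, y1)

print(w_change, math.log(x1 / x0))
```

For this particular $F$ the quadrature can also be done by hand, giving $u = x/(1 - \ln x)$ for the initial data used above, which offers an independent check of `u1`.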








29.5.2 Higher-Order ODEs 

The same argument used in the first order ODEs can be used for higher-order 
ODEs to reduce their orders. 

29.5.3. Proposition. Let

\[ \Delta(x, u^{(n)}) = \Delta(x, u, u_1, \ldots, u_n) = 0, \qquad u_k \equiv \frac{d^ku}{dx^k}, \]

be an $n$th-order ODE. If this ODE has a one-parameter symmetry group, then there exist variables $w = \xi(x, u)$ and $y = \eta(x, u)$ such that

\[ \Delta(x, u^{(n)}) = \tilde{\Delta}\left(y, \frac{dw}{dy}, \ldots, \frac{d^nw}{dy^n}\right) = 0, \]

i.e., in terms of $w$ and $y$, the ODE becomes of $(n - 1)$st order in $w_y$.

Proof. The proof is exactly the same as in the first-order case. The only difference is that one has to consider $\mathrm{pr}^{(n)}\mathbf{v}$, where $\mathbf{v} = \partial/\partial w$. But Problem 29.7 shows that $\mathrm{pr}^{(n)}\mathbf{v} = \mathbf{v}$, as in the first-order case. □

29.5.4. Example. Consider a second-order DE $\Delta(u, u_x, u_{xx}) = 0$, which does not depend on $x$ explicitly. The fact that $\partial\Delta/\partial x = 0$ suggests $w = x$. So, we switch the dependent and independent variables and write $w = x$ and $y = u$. Then, using the chain rule, we get

\[ \frac{du}{dx} = \frac{1}{w_y}, \qquad \frac{d^2u}{dx^2} = -\frac{w_{yy}}{w_y^3}. \]

Substituting in the original DE, we obtain

\[ \tilde{\Delta}(y, w_y, w_{yy}) \equiv \Delta\left(y, \frac{1}{w_y}, -\frac{w_{yy}}{w_y^3}\right) = 0, \]

which is of first order in $w_y$. ■

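29.5.5. Example. The order of the SOLDE $u_{xx} + p(x)u_x + q(x)u = 0$ can be reduced by noting that the DE is invariant under the scaling transformation $(x, u) \mapsto (x, su)$, whose infinitesimal generator is $\mathbf{v} = u\partial_u$. With this vector field, Equations (29.42) and (29.43) give

\[ \frac{dx}{0} = \frac{du}{u}, \qquad \frac{dx}{0} = \frac{du}{u} = dv. \]

For the first equation to make sense, we have to have

\[ dx = 0 \quad \Rightarrow \quad x = c_1 \quad \Rightarrow \quad y = x \quad \text{(by Box 29.1.8)}. \]

The second equation in $u$ gives

\[ v = \ln u + c \quad \Rightarrow \quad v - \ln u = c \quad \Rightarrow \quad w = \ln u \quad \Rightarrow \quad u = e^w. \]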
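The remaining step of this reduction can be sketched as follows. This is a reconstruction from the substitution $u = e^w$, $y = x$ just obtained, not a quotation of the text; the symbol $z \equiv w_y$ is introduced here only for convenience.

\[
\begin{aligned}
u_x &= w_y e^{w}, \qquad u_{xx} = \bigl(w_{yy} + w_y^2\bigr)e^{w}, \\
0 &= u_{xx} + p(x)u_x + q(x)u = e^{w}\bigl[w_{yy} + w_y^2 + p(y)w_y + q(y)\bigr], \\
&\Rightarrow \quad \frac{dz}{dy} + z^2 + p(y)z + q(y) = 0, \qquad z \equiv w_y.
\end{aligned}
\]

Under these assumptions the second-order linear equation is traded for a first-order (Riccati) equation in $w_y$, in line with Proposition 29.5.3.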






29.5.3 DEs with Multiparameter Symmetries 

We have seen that 1-parameter symmetries reduce the order of an ODE by 1. 
It is natural to suspect that an r-parameter symmetry will reduce the order by r. 
Although this suspicion is correct, it turns out that in general, one cannot reconstruct 
the solution of the original equation from those of the reduced (n — r)th-order 
equation. (See [Olve 86, pp. 148-158] for a thorough discussion of this problem.) 
However, the special, but important, case of second-order DEs is an exception. The 
deep reason behind this is the exceptional structure of 2-dimensional Lie algebras 
given in Box 27.2.5. We cannot afford to go into details of the reasoning, but simply 
quote the following important theorem. 

29.5.6. Theorem. Let $\Delta(x, u^{(n)}) = 0$ be an $n$th-order ODE invariant under a 2-parameter group. Then there is an $(n - 2)$nd-order ODE $\tilde{\Delta}(y, w^{(n-2)}) = 0$ with the property that the general solution to $\Delta$ can be found by integrating the general solution to $\tilde{\Delta}$. In particular, a second-order ODE having a 2-parameter group of symmetries can be solved by integration.

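Let us analyze the case of a second-order ODE in some detail. By Box 27.2.5, the infinitesimal generators $\mathbf{v}_1$ and $\mathbf{v}_2$ satisfy the Lie bracket relation

\[ [\mathbf{v}_1, \mathbf{v}_2] = c\mathbf{v}_1, \qquad c = 0 \text{ or } 1. \]

We shall treat the abelian case ($c = 0$) and leave the nonabelian case for the reader. To begin with, we use $s$ and $t$ for the transformed variables, and at the end replace them with $y$ and $w$.

By Proposition 29.5.1, we can let $\mathbf{v}_1 = \partial/\partial s$. Then $\mathbf{v}_2$ can be expressed as the linear combination

\[ \mathbf{v}_2 = \alpha(s, t)\frac{\partial}{\partial s} + \beta(s, t)\frac{\partial}{\partial t}. \]

The commutation relation $[\mathbf{v}_1, \mathbf{v}_2] = 0$ gives

\[ 0 = [\partial_s, \alpha\partial_s + \beta\partial_t] = \frac{\partial\alpha}{\partial s}\frac{\partial}{\partial s} + \frac{\partial\beta}{\partial s}\frac{\partial}{\partial t}, \]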







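showing that $\alpha$ and $\beta$ are independent of $s$. We want to simplify $\mathbf{v}_2$ as much as possible without changing $\mathbf{v}_1$. A transformation that accomplishes this is $S = s + h(t)$ and $T = T(t)$. Then, by Equation (26.8) we obtain

\[
\begin{aligned}
\mathbf{v}_1 &= \mathbf{v}_1(S)\frac{\partial}{\partial S} + \mathbf{v}_1(T)\frac{\partial}{\partial T} = \frac{\partial S}{\partial s}\frac{\partial}{\partial S} + \frac{\partial T}{\partial s}\frac{\partial}{\partial T} = \frac{\partial}{\partial S}, \\
\mathbf{v}_2 &= \mathbf{v}_2(S)\frac{\partial}{\partial S} + \mathbf{v}_2(T)\frac{\partial}{\partial T} = \left(\alpha\frac{\partial S}{\partial s} + \beta\frac{\partial S}{\partial t}\right)\frac{\partial}{\partial S} + \left(\alpha\frac{\partial T}{\partial s} + \beta\frac{\partial T}{\partial t}\right)\frac{\partial}{\partial T} \\
&= (\alpha + \beta h')\frac{\partial}{\partial S} + \beta T'\frac{\partial}{\partial T}.
\end{aligned}
\]

If $\beta \neq 0$, we choose $T' = 1/\beta$ and $h' = -\alpha/\beta$ to obtain

\[ \mathbf{v}_1 = \frac{\partial}{\partial s}, \qquad \mathbf{v}_2 = \frac{\partial}{\partial t}, \tag{29.44} \]

where we have substituted $s$ for $S$ and $t$ for $T$. If $\beta = 0$, we choose $\alpha = T$, and change the notation from $S$ to $s$ and $T$ to $t$ to obtain

\[ \mathbf{v}_1 = \frac{\partial}{\partial s}, \qquad \mathbf{v}_2 = t\frac{\partial}{\partial s}. \tag{29.45} \]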


The next step is to decide which coordinate is the independent variable, prolong the vector fields, and apply them to the DE to find the infinitesimal criterion. For $\beta \neq 0$, the choice is immaterial. So, let $w = s$ and $y = t$. Then the prolongation of $\mathbf{v}_1$ and $\mathbf{v}_2$ will be the same as the vectors themselves, and with $\Delta(y, w, w_y, w_{yy}) \equiv w_{yy} - F(y, w, w_y)$, the infinitesimal criteria for invariance will be

\[
\begin{aligned}
0 &= \mathrm{pr}^{(2)}\mathbf{v}_1(\Delta) = \mathbf{v}_1(\Delta) = -\frac{\partial F}{\partial w}, \\
0 &= \mathrm{pr}^{(2)}\mathbf{v}_2(\Delta) = \mathbf{v}_2(\Delta) = -\frac{\partial F}{\partial y}.
\end{aligned}
\]

It follows that in the $(y, w)$ system, $F$ will be a function of $w_y$ alone and the DE will be of the form

\[ w_{yy} = F(w_y) \quad \Rightarrow \quad \frac{dw_y}{dy} = F(w_y) \quad \Rightarrow \quad H(w_y) \equiv \int^{w_y}\frac{dz}{F(z)} = y. \]

The last equation can be solved for $w_y$ in terms of $y$ and the result integrated.

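For $\beta = 0$, choose $w = t$ and $y = s$. Then $\mathbf{v}_1$ will not prolongate, and as the reader may verify,

\[ \mathrm{pr}^{(2)}\mathbf{v}_2 = \mathbf{v}_2 - w_y^2\frac{\partial}{\partial w_y} - 3w_yw_{yy}\frac{\partial}{\partial w_{yy}}, \]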




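and the infinitesimal criteria for invariance will be

\[
\begin{aligned}
0 &= \mathrm{pr}^{(2)}\mathbf{v}_1(\Delta) = \mathbf{v}_1(\Delta) = \frac{\partial\Delta}{\partial y} = -\frac{\partial F}{\partial y}, \\
0 &= \mathrm{pr}^{(2)}\mathbf{v}_2(\Delta) = -w\frac{\partial F}{\partial y} + w_y^2\frac{\partial F}{\partial w_y} - 3w_yw_{yy} = w_y^2\frac{\partial F}{\partial w_y} - 3w_yF.
\end{aligned}
\]

It follows that in the $(y, w)$ system, $F$ will be a function of $w$ and $w_y$ and satisfy the DE

\[ w_y\frac{\partial F}{\partial w_y} = 3F, \]

whose solution is of the form $F(w, w_y) = w_y^3\tilde{F}(w)$. The original DE now becomes

\[ w_{yy} = w_y^3\tilde{F}(w), \]

for which we use the chain rule $w_{yy} = w_y\,dw_y/dw$ to obtain

\[ \frac{dw_y}{dw} = w_y^2\tilde{F}(w) \quad \Rightarrow \quad -\frac{1}{w_y} = \int^{w}\tilde{F}(z)\,dz \equiv H(w) \quad \Rightarrow \quad \frac{dw}{dy} = -\frac{1}{H(w)}, \]

which can be integrated. Had we chosen $w = s$ and $y = t$, $F$ would have been a function of $y$ and the DE would have reduced to $w_{yy} = F(y)$, which could be solved by two consecutive integrations. The nonabelian 2-dimensional Lie algebra can be analyzed similarly. The reader may verify that if $\beta = 0$, the vector fields can be chosen to be

\[ \mathbf{v}_1 = \frac{\partial}{\partial s}, \qquad \mathbf{v}_2 = s\frac{\partial}{\partial s}, \tag{29.46} \]

leading to the ODE $w_{yy} = w_yF(y)$, and if $\beta \neq 0$, the vector fields can be chosen to be

\[ \mathbf{v}_1 = \frac{\partial}{\partial s}, \qquad \mathbf{v}_2 = s\frac{\partial}{\partial s} + t\frac{\partial}{\partial t}, \tag{29.47} \]

leading to the ODE $w_{yy} = F(w_y)/y$. Both of these ODEs are integrable as in the abelian case.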
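As a small worked illustration of the abelian reduction $w_{yy} = F(w_y)$, consider the hypothetical choice $F(z) = z^2$ (an assumption made here purely for concreteness, with $C$ and $C'$ denoting integration constants):

\[
\begin{aligned}
w_{yy} = w_y^2 \quad &\Rightarrow \quad \int^{w_y}\frac{dz}{z^2} = -\frac{1}{w_y} = y + C \quad \Rightarrow \quad w_y = -\frac{1}{y + C}, \\
&\Rightarrow \quad w = -\ln|y + C| + C'.
\end{aligned}
\]

Differentiating twice gives $w_{yy} = 1/(y + C)^2 = w_y^2$, so the two quadratures do reproduce a two-parameter family of solutions, as Theorem 29.5.6 leads one to expect.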


29.6 Problems 

29.1. Suppose that $\{F_i\}_{i=1}^{n}$ are invariants of the PDE (29.3). Show that any function $f(F_1, F_2, \ldots, F_n)$ is also an invariant of the PDE.




29.2. Find the function $\tilde{f} = \theta \cdot f$ when $f(x) = ax + b$ and $\theta$ is the angle of rotation of $SO(2)$.

29.3. Use the result of Problem 29.2 to find $\tilde{u}_1$. Hint: Note that $a = u_1$.

29.4. Transform the DE of Example 29.3.2 from Cartesian to polar coordinates to obtain $dr/d\theta = r^3$.

29.5. Using the definition of total derivative, verify Equation (29.21).

29.6. Show that $SO(2)$ is a symmetry group of the first-order DE

\[ \Delta(x, u, u_1) = (u - x)u_1 + x + u = 0 \]

and write the same DE in polar coordinates.

29.7. Show that the $n$th prolongation of the generator of the $i$th translation, $\partial_i$, is the same as the original vector.

29.8. Find the first prolongation of the generator of scaling: $x\partial_x + u\partial_u$.

29.9. Show that when the group acts only on the single dependent variable $u$, the prolongation of $\mathbf{v} = U\partial_u$ is given by

\[ \mathrm{pr}^{(1)}\mathbf{v} = \mathbf{v} + \sum_j U_j\frac{\partial}{\partial u_j}, \quad \text{where} \quad U_j = \frac{\partial U}{\partial x^j} + u_j\frac{\partial U}{\partial u}. \]

29.10. Show that the $n$th prolongation of $\mathbf{v} = X(x, u)\partial_x + U(x, u)\partial_u$ for an ordinary DE of $n$th order is

\[ \mathrm{pr}^{(n)}\mathbf{v} = \mathbf{v} + \sum_{k=1}^{n} U^{[k]}\frac{\partial}{\partial u^{(k)}}, \]

where

\[ u^{(k)} \equiv \frac{d^ku}{dx^k} \quad \text{and} \quad U^{[k]} = D_x^k(U - Xu_x) + Xu^{(k+1)}. \]

29.11. Compute the second prolongation of the infinitesimal generators of the symmetry group of the heat equation.

29.12. Derive Equations (29.31) and (29.32).

29.13. Using Equation (29.34) show that $X^{(i)}_{jkl} = 0$ for any $i$, $j$, $k$, and $l$.

29.14. The Korteweg-de Vries equation is $u_t + u_{xxx} + uu_x = 0$. Using the technique employed in computing the symmetries of the heat and wave equations, show that the infinitesimal generators of symmetries of the Korteweg-de Vries equation are

\[
\begin{aligned}
\mathbf{v}_1 &= \partial_x, \quad \mathbf{v}_2 = \partial_t, && \text{translation} \\
\mathbf{v}_3 &= t\partial_x + \partial_u, && \text{Galilean boost} \\
\mathbf{v}_4 &= x\partial_x + 3t\partial_t - 2u\partial_u. && \text{scaling}
\end{aligned}
\]





29.15. Suppose $M(x, u)\,dx + N(x, u)\,du = 0$ has a 1-parameter symmetry group with generator $\mathbf{v} = X\partial_x + U\partial_u$. Show that the function $1/(XM + UN)$ is an integrating factor.

29.16. Show that the second prolongation of $\mathbf{v} = w\partial_y$ (with $y$ treated as independent variable) is

\[ \mathrm{pr}^{(2)}\mathbf{v} = \mathbf{v} - w_y^2\frac{\partial}{\partial w_y} - 3w_yw_{yy}\frac{\partial}{\partial w_{yy}}. \]

29.17. Go through the case of $\beta = 0$ in the solution of the second-order ODE and, choosing $w = s$ and $y = t$, show that $F$ will be a function of $y$ alone and the original DE will reduce to $w_{yy} = F(y)$.

29.18. Show that in the case of the nonabelian 2-dimensional Lie algebra,

(a) the vector fields can be chosen to be

\[ \mathbf{v}_1 = \frac{\partial}{\partial s}, \qquad \mathbf{v}_2 = s\frac{\partial}{\partial s} \]

if $\beta = 0$.

(b) Show that these vector fields lead to the ODE $w_{yy} = w_yF(y)$.

(c) If $\beta \neq 0$, show that the vector fields can be chosen to be

\[ \mathbf{v}_1 = \frac{\partial}{\partial s}, \qquad \mathbf{v}_2 = s\frac{\partial}{\partial s} + t\frac{\partial}{\partial t}. \]

(d) Finally, show that the latter vector fields lead to the ODE $w_{yy} = F(w_y)/y$.


Additional Reading 

1. Bluman, G. and Kumei, S. Symmetries and Differential Equations, Springer-Verlag, 1989. A readable book on the subject of this chapter with many worked-out examples.

2. Olver, P. Applications of Lie Groups to Differential Equations, Springer-Verlag, 1986. Our treatment of the subject follows closely that of Olver. This self-contained book, although formal, is very lucid in style with many historical notes. All the concepts introduced are clearly stated and many examples introduced to clarify them.

3. Stephani, H. Differential Equations: Their Solutions Using Symmetries, Cambridge University Press, 1989. Written by a physicist, this little book treats solutions of ordinary differential equations in great detail with very little formalism.





30

Calculus of Variations, Symmetries, and Conservation Laws


In this chapter we shall start with one of the oldest and most useful branches of mathematical physics, the calculus of variations. After giving the fundamentals and some examples, we shall investigate the consequences of symmetries associated with variational problems. The chapter then ends with Noether's theorem, which connects such symmetries with their associated conservation laws. All vector spaces of relevance in this chapter will be assumed to be real.


30.1 The Calculus of Variations 

One of the main themes of calculus is the extremal problem: Given a function $f: \mathbb{R} \supset D \to \mathbb{R}$, find the points in the domain $D$ of $f$ at which $f$ attains a maximum or minimum. To locate such points, we find the zeros of the derivative of $f$. For multivariable functions, $f: \mathbb{R}^p \supset D \to \mathbb{R}$, the notion of gradient generalizes that of the derivative. To find the $j$th component of the gradient $\nabla f$, we calculate the difference $\Delta f$ between the value of $f$ at $(x^1, \ldots, x^j + \epsilon, \ldots, x^p)$ and its value at $(x^1, \ldots, x^j, \ldots, x^p)$, divide this difference by $\epsilon$, and take the limit $\epsilon \to 0$. This is simply partial differentiation, and the $j$th component of the gradient is just the $j$th partial derivative of $f$.

30.1.1 Derivative for Hilbert Spaces 

To make contact with the subject of this chapter, let us reinterpret the notion of differentiation. The most useful interpretation is geometric. In fact, our first encounter with the derivative is geometrical: We are introduced to the concept through lines tangent to curves. In this language, the derivative of a function $f: \mathbb{R} \supset \Omega \to \mathbb{R}$ at $x_0$ is a line (or function) $\psi: \mathbb{R} \supset \Omega_0 \to \mathbb{R}$ passing through




differentiability of a 
function on a Hilbert 
space at a point 


$(x_0, f(x_0))$ whose slope is defined to be the derivative of $f$ at $x_0$ (see Figure 30.1):

\[ \psi(x) = f(x_0) + f'(x_0)(x - x_0). \]

The function $\psi(x)$ describes a line, but it is not a linear function (in the vector-space sense of the word). The requirement of linearity is due to our desire for generalization of differentiation to Hilbert spaces, on which linear maps are the most natural objects. Therefore, we consider the line parallel to $\psi(x)$ that passes through the origin. Call this $\phi(x)$. Then

\[ \phi(x) = f'(x_0)x, \tag{30.1} \]

which is indeed a linear function. We identify $\phi(x)$ as the derivative of $f$ at $x_0$. This identification may appear strange at first but, as we shall see shortly, is the most convenient and useful. Of course, any identification requires a one-to-one correspondence between objects identified. It is clear that indeed there is a one-to-one correspondence between derivatives at points and linear functions with appropriate slopes. Equation (30.1) can be used to geometrize the definition of derivative. First consider

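\[ f'(x_0) = \lim_{x \to x_0}\frac{f(x) - f(x_0)}{x - x_0} \quad \text{and} \quad f'(x_0) = \lim_{x \to x_0}\frac{\phi(x) - \phi(x_0)}{x - x_0}. \]

Next note that, contrary to $f$ which is usually defined only for a subset of the real line, $\phi$ is defined for all real numbers $\mathbb{R}$, and that $\phi(x - x_0) = \phi(x) - \phi(x_0)$ due to the linearity of $\phi$. Thus, we have

\[ \lim_{x \to x_0}\frac{f(x) - f(x_0)}{x - x_0} = \lim_{x \to x_0}\frac{\phi(x) - \phi(x_0)}{x - x_0} = \lim_{x \to x_0}\frac{\phi(x - x_0)}{x - x_0}, \]

or

\[ \lim_{x \to x_0}\frac{|f(x) - f(x_0) - \phi(x - x_0)|}{|x - x_0|} = 0, \tag{30.2} \]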


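where we have introduced absolute values in anticipation of its analogue, the norm. Equation (30.2) is readily generalized to any complete normed vector space (Banach space), and in particular to any Hilbert space:

30.1.1. Definition. Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be Hilbert spaces with norms $\|\cdot\|_1$ and $\|\cdot\|_2$, respectively. Let $f: \mathcal{H}_1 \supset \Omega \to \mathcal{H}_2$ be any map and $|x_0\rangle \in \Omega$. Suppose there is a linear map $\mathbf{T} \in \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$ with the property that

\[ \lim_{\|x - x_0\|_1 \to 0}\frac{\|f(|x\rangle) - f(|x_0\rangle) - \mathbf{T}(|x\rangle - |x_0\rangle)\|_2}{\|x - x_0\|_1} = 0 \quad \text{for } |x\rangle \in \Omega. \]

Then, we say that $f$ is differentiable at $|x_0\rangle$, and we define the derivative of $f$ at $|x_0\rangle$ to be $\mathbf{D}f(x_0) \equiv \mathbf{T}$. If $f$ is differentiable at each $|x\rangle \in \Omega$, the map

\[ \mathbf{D}f: \Omega \to \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2) \quad \text{given by} \quad \mathbf{D}f(|x\rangle) = \mathbf{D}f(x) \]

is called the derivative of $f$.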






Figure 30.1 The derivative at $(x_0, f(x_0))$ as a linear function passing through the origin with a slope $f'(x_0)$. The function $f$ is assumed to be defined on a subset $\Omega_0$ of the real line. $\Omega_0$ restricts the $x$'s to be close to $x_0$ to prevent the function from misbehaving (blowing up), and to make sure that the limit in the definition of derivative makes sense.


The reader may verify that if the derivative exists, it is unique. 

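30.1.2. Example. Let $\mathcal{H}_1 = \mathbb{R}^n$ and $\mathcal{H}_2 = \mathbb{R}^m$, and $f: \mathbb{R}^n \supset \Omega \to \mathbb{R}^m$. Then for $|x\rangle \in \Omega$, $\mathbf{D}f(x)$ is a linear map, which can be represented by a matrix in the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$. To find this matrix, we need to let $\mathbf{D}f(x)$ act on the $j$th standard basis of $\mathbb{R}^n$, i.e., we need to evaluate $\mathbf{D}f(x)|e_j\rangle$. This suggests taking $|y\rangle = |x\rangle + h|e_j\rangle$ (with $h \to 0$) as the vector appearing in the definition of derivative at $|x\rangle$. Then

\[ \frac{\|f(|y\rangle) - f(|x\rangle) - \mathbf{D}f(x)(|y\rangle - |x\rangle)\|_2}{\|y - x\|_1} = \frac{\|f(x^1, \ldots, x^j + h, \ldots, x^n) - f(x^1, \ldots, x^j, \ldots, x^n) - h\mathbf{D}f(x)|e_j\rangle\|_2}{|h|} \]

approaches zero as $h \to 0$, so that the $i$th component of the ratio also goes to zero. But the $i$th component of $\mathbf{D}f(x)|e_j\rangle$ is simply $a^i_j$, the $ij$th component of the matrix of $\mathbf{D}f(x)$. Therefore,

\[ \lim_{h \to 0}\frac{|f^i(x^1, \ldots, x^j + h, \ldots, x^n) - f^i(x^1, \ldots, x^j, \ldots, x^n) - ha^i_j|}{|h|} = 0, \]

which means that $a^i_j = \partial f^i/\partial x^j$. ■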


The result of the example above can be stated as follows: 










differential and 
gradientof fat |x) 


directional derivative 


30.1.3. Box. For $f: \mathbb{R}^n \supset \Omega \to \mathbb{R}^m$, the matrix of $\mathbf{D}f(x)$ in the standard basis of $\mathbb{R}^n$ and $\mathbb{R}^m$ is the Jacobian matrix of $f$.
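The column-by-column construction used in Example 30.1.2 translates directly into a numerical recipe. The sketch below is only an illustration: the map `f` from $\mathbb{R}^2$ to $\mathbb{R}^3$ is a hypothetical example chosen here, and `jacobian_fd` approximates each column $\mathbf{D}f(x)|e_j\rangle$ by a central difference in the direction $|e_j\rangle$.

```python
import math

def f(x):
    # Hypothetical map from R^2 to R^3, used only for illustration.
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def jacobian_fd(func, x, h=1e-6):
    # Column j approximates Df(x)|e_j> by a central difference along e_j.
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(x), list(x)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = func(plus), func(minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def jacobian_exact(x):
    # Matrix of partial derivatives a^i_j = d f^i / d x^j for the map above.
    x1, x2 = x
    return [[x2, x1], [math.cos(x1), 0.0], [0.0, 2 * x2]]

point = [0.4, -1.3]
J_num = jacobian_fd(f, point)
J_ref = jacobian_exact(point)
max_err = max(abs(J_num[i][j] - J_ref[i][j]) for i in range(3) for j in range(2))
print(max_err)
```

The two matrices should agree to within the accuracy of the finite-difference step.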


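The case of $\mathcal{H}_2 = \mathbb{R}$ deserves special attention. Let $\mathcal{H}$ be a Hilbert space. Then $\mathbf{D}f(x) \in \mathcal{L}(\mathcal{H}, \mathbb{R}) = \mathcal{H}^*$ is denoted by $df(x)$ and renamed the differential of $f$ at $|x\rangle$. Furthermore, through the inner product, one can identify $df: \Omega \to \mathcal{H}^*$ with another map defined as follows:

30.1.4. Definition. Let $\mathcal{H}$ be a Hilbert space and $f: \mathcal{H} \supset \Omega \to \mathbb{R}$. The gradient $\nabla f$ of $f$ is the map $\nabla f: \Omega \to \mathcal{H}$ defined by

\[ \langle\nabla f(x)|a\rangle \equiv \langle df(x), a\rangle \quad \forall\, |x\rangle \in \Omega,\ |a\rangle \in \mathcal{H}, \]

where $\langle\,,\rangle$ is the pairing $\langle\,,\rangle: \mathcal{H}^* \times \mathcal{H} \to \mathbb{R}$ of $\mathcal{H}$ and its dual.

Note that although $f$ is not an element of $\mathcal{H}^*$, $df(x)$ is, for all points $|x\rangle \in \Omega$ at which the differential is defined.

30.1.5. Example. Consider the function $f: \mathcal{H} \to \mathbb{R}$ given by $f(|x\rangle) = \|x\|^2$. Since

\[ \|y - x\|^2 = \|y\|^2 - \|x\|^2 - 2\langle x|y - x\rangle \]

and since the derivative is unique, the reader may check that $df(x)|a\rangle = 2\langle x|a\rangle$, or $\nabla f(|x\rangle) = 2|x\rangle$. ■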

Derivatives could be defined in terms of directions as well: 

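30.1.6. Definition. Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be Hilbert spaces. Let $f: \mathcal{H}_1 \supset \Omega \to \mathcal{H}_2$ be any map and $|x\rangle \in \Omega$. We say that $f$ has a derivative in the direction $|a\rangle \in \mathcal{H}_1$ at $|x\rangle$ if

\[ \frac{d}{dt}f(|x\rangle + t|a\rangle)\Big|_{t=0} \]

exists. We call this element of $\mathcal{H}_2$ the directional derivative of $f$ in the direction $|a\rangle \in \mathcal{H}_1$ at $|x\rangle$.

The reader may verify that if $f$ is differentiable at $|x\rangle$ (in the context of Definition 30.1.1), then the directional derivative of $f$ in any direction $|a\rangle$ exists at $|x\rangle$ and is given by

\[ \frac{d}{dt}f(|x\rangle + t|a\rangle)\Big|_{t=0} = \mathbf{D}f(x)|a\rangle. \tag{30.3} \]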















functional derivative 
or variational 
derivative 


evaluation function 


30.1.2 Functional Derivative 


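We now specialize to the Hilbert space of square-integrable functions $\mathcal{L}^2(\Omega)$ for some open subset $\Omega$ of some $\mathbb{R}^m$. We need to change our notation somewhat. Let us agree to denote the elements of $\mathcal{L}^2(\Omega)$ by $f$, $u$, etc. Real-valued functions on $\mathcal{L}^2(\Omega)$ will be denoted by $\mathbf{L}$, $\mathbf{H}$, etc. The $m$-tuples will be denoted by boldface lowercase letters. To summarize,

\[
\begin{aligned}
&\mathbf{x}, \mathbf{y} \in \mathbb{R}^m, \qquad f, u \in \mathcal{L}^2(\Omega) \ \Rightarrow\ f, u: \mathbb{R}^m \supset \Omega \to \mathbb{R}, \\
&\langle f|u\rangle = \int_\Omega f(\mathbf{x})u(\mathbf{x})\,d^mx, \qquad \mathbf{L}, \mathbf{H}: \mathcal{L}^2(\Omega) \to \mathbb{R}.
\end{aligned}
\]

Furthermore, the evaluation of $\mathbf{L}$ at $u$ is denoted by $\mathbf{L}[u]$.

When dealing with the space of functions, the gradient of Definition 30.1.4 is called a functional derivative or variational derivative and denoted by $\delta\mathbf{L}/\delta u$. So

\[ \left\langle\frac{\delta\mathbf{L}}{\delta u}\Big| f\right\rangle \equiv \int_\Omega\frac{\delta\mathbf{L}}{\delta u}(\mathbf{x})f(\mathbf{x})\,d^mx = \frac{d}{dt}\mathbf{L}[u + tf]\Big|_{t=0}, \tag{30.4} \]

where we have used Equation (30.3). Note that by definition, $\delta\mathbf{L}/\delta u$ is an element of the Hilbert space $\mathcal{L}^2(\Omega)$; so, the integral of (30.4) makes sense. Equation (30.4) is frequently used to compute functional derivatives. An immediate consequence of Equation (30.4) is the following important result.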

30.1.7. Proposition. Let $\mathbf{L}: \mathcal{L}^2(\Omega) \to \mathbb{R}$ for some $\Omega \subset \mathbb{R}^m$. If $\mathbf{L}$ has an extremum at $u$, then

\[ \frac{\delta\mathbf{L}}{\delta u} = 0. \]

Proof. If $\mathbf{L}$ has an extremum at $u$, then the RHS of (30.4) vanishes for any function $f$, in particular, for any orthonormal basis vector $|e_i\rangle$. Completeness of a basis now implies that the directional derivative must vanish (see Proposition 5.1.9). □


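Just as in the case of partial derivatives, where some simple relations such as derivative of powers and products can be used to differentiate more complicated expressions, there are some primitive formulas involving functional derivatives that are useful in computing other more complicated expressions. Consider the evaluation function

\[ \mathbf{E}_{\mathbf{y}}: \mathcal{L}^2(\Omega) \to \mathbb{R} \quad \text{given by} \quad \mathbf{E}_{\mathbf{y}}[f] = f(\mathbf{y}). \]

Using Equation (30.4), we can easily compute the functional derivative of $\mathbf{E}_{\mathbf{y}}$:

\[
\begin{aligned}
\int_\Omega\frac{\delta\mathbf{E}_{\mathbf{y}}}{\delta u}(\mathbf{x})f(\mathbf{x})\,d^mx &= \frac{d}{dt}\mathbf{E}_{\mathbf{y}}[u + tf]\Big|_{t=0} = \frac{d}{dt}\{u(\mathbf{y}) + tf(\mathbf{y})\}\Big|_{t=0} = f(\mathbf{y}) \\
&\Rightarrow \quad \frac{\delta\mathbf{E}_{\mathbf{y}}[u]}{\delta u}(\mathbf{x}) = \delta(\mathbf{x} - \mathbf{y}).
\end{aligned}
\tag{30.5}
\]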






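It is instructive to compare (30.5) with the similar formula in multivariable calculus, where real-valued functions $f$ take a vector $\mathbf{x}$ and give a real number. The analogue of the evaluation function is $E_i$, which takes a vector $\mathbf{x}$ and gives the real number $x^i$, the $i$th component of $\mathbf{x}$. Using the definition of partial derivative, one readily shows that $\partial E_i/\partial x^j = \delta_{ij}$, which is (somewhat less precisely) written as $\partial x^i/\partial x^j = \delta_{ij}$. The same sort of imprecision is used to rewrite Equation (30.5) as

\[ \frac{\delta u(\mathbf{y})}{\delta u(\mathbf{x})} \equiv \frac{\delta u_{\mathbf{y}}}{\delta u_{\mathbf{x}}} = \delta(\mathbf{x} - \mathbf{y}), \tag{30.6} \]

where we have turned the arguments into indices to make the analogy with the discrete case even stronger.

Another useful formula concerns derivatives of square-integrable functions. Let $\mathbf{E}_{\mathbf{y},i}$ denote the evaluation of the derivative of functions with respect to the $i$th coordinate:

\[ \mathbf{E}_{\mathbf{y},i}: \mathcal{L}^2(\Omega) \to \mathbb{R} \quad \text{given by} \quad \mathbf{E}_{\mathbf{y},i}(f) = \partial_if(\mathbf{y}). \]

Then a similar argument as above will show that

\[ \frac{\delta\mathbf{E}_{\mathbf{y},i}}{\delta u}(\mathbf{x}) = -\partial_i\delta(\mathbf{x} - \mathbf{y}), \quad \text{or} \quad \frac{\delta\,\partial_iu(\mathbf{y})}{\delta u(\mathbf{x})} = -\partial_i\delta(\mathbf{x} - \mathbf{y}), \]

and in general,

\[ \frac{\delta\,\partial_{i_1i_2\ldots i_k}u(\mathbf{y})}{\delta u(\mathbf{x})} = (-1)^k\partial_{i_1i_2\ldots i_k}\delta(\mathbf{x} - \mathbf{y}). \tag{30.7} \]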

Equation (30.7) holds only if the function $f$, the so-called test function, vanishes on $\partial\Omega$, the boundary of the region of integration. If it does not, then there will be a "surface term" that will complicate matters considerably. Fortunately, in most applications this surface term is required to vanish. So, let us adhere to the convention that

30.1.8. Box. All test functions $f(\mathbf{x})$ appearing in the integral of Equation (30.4) are assumed to vanish at the boundary of $\Omega$.


For applications, we need to generalize the concept of functions on Hilbert spaces. First, it is necessary to consider maps from a Hilbert space to $\mathbb{R}^n$. For simplicity, we confine ourselves to the Hilbert space $\mathcal{L}^2(\Omega)$. Such a map $\mathbf{H}: \mathcal{L}^2(\Omega) \to D \subset \mathbb{R}^n$, for some subset $D$ of $\mathbb{R}^n$, can be written in components

\[ \mathbf{H} = (\mathbf{H}_1, \mathbf{H}_2, \ldots, \mathbf{H}_n), \quad \text{where} \quad \mathbf{H}_i: \mathcal{L}^2(\Omega) \to \mathbb{R},\ i = 1, \ldots, n. \]

















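Next, we consider an ordinary multivariable function $L: \mathbb{R}^n \supset D \to \mathbb{R}$ and use it to construct a new function on $\mathcal{L}^2(\Omega)$, the composite of $L$ and $\mathbf{H}$:

\[ L \circ \mathbf{H}: \mathcal{L}^2(\Omega) \to \mathbb{R}, \qquad L \circ \mathbf{H}[u] = L(\mathbf{H}_1[u], \ldots, \mathbf{H}_n[u]). \]

Then the functional derivative of $L \circ \mathbf{H}$ can be obtained using the chain rule and noting that the derivative of $L$ is the common partial derivative. It follows that

\[ \frac{\delta L \circ \mathbf{H}[u]}{\delta u}(\mathbf{x}) = \sum_{i=1}^{n}\partial_iL(\mathbf{H}_1[u], \ldots, \mathbf{H}_n[u])\,\frac{\delta\mathbf{H}_i}{\delta u}(\mathbf{x}), \tag{30.8} \]

where $\partial_iL$ is the partial derivative of $L$ with respect to its $i$th argument.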

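30.1.9. Example. Let $L: (a, b) \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be a function of three variables the first one of which is defined for the real interval $(a, b)$. Let $\mathbf{H}_i: \mathcal{L}^2(a, b) \to \mathbb{R}$, $i = 1, 2, 3$, be defined by

\[ \mathbf{H}_1[u] \equiv x, \qquad \mathbf{H}_2[u] \equiv \mathbf{E}_x[u] = u(x), \qquad \mathbf{H}_3[u] \equiv \mathbf{E}'_x[u] = u'(x), \]

where $\mathbf{E}_x$ is the evaluation function and $\mathbf{E}'_x$ evaluates the derivative. It follows that $L \circ \mathbf{H}[u] = L(x, u(x), u'(x))$. Then, noting that $\mathbf{H}_1[u]$ is independent of $u$, we have

\[
\begin{aligned}
\frac{\delta L \circ \mathbf{H}[u]}{\delta u}(y) &= \partial_1L\frac{\delta\mathbf{H}_1}{\delta u}(y) + \partial_2L\frac{\delta\mathbf{E}_x}{\delta u}(y) + \partial_3L\frac{\delta\mathbf{E}'_x}{\delta u}(y) \\
&= 0 + \partial_2L\,\delta(y - x) - \partial_3L\,\delta'(y - x) = \partial_2L\,\delta(x - y) + \partial_3L\,\delta'(x - y).
\end{aligned}
\]

This is normally written as

\[ \frac{\delta L(x, u(x), u'(x))}{\delta u(y)} = \frac{\partial L}{\partial u}\delta(x - y) + \frac{\partial L}{\partial u'}\delta'(x - y), \tag{30.9} \]

which is the unintegrated version of the classical Euler-Lagrange equation for a single particle, to which we shall return shortly. ■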

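A generalization of the example above turns $L$ into a function on $\Omega \times \mathbb{R} \times \mathbb{R}^m$ with $\Omega \subset \mathbb{R}^m$, so that

\[ L(x^1, \ldots, x^m, u(\mathbf{x}), \partial_1u(\mathbf{x}), \ldots, \partial_mu(\mathbf{x})) \in \mathbb{R}, \quad \text{with } \mathbf{x} \in \mathbb{R}^m. \]

The functions $\{\mathbf{H}_i\}_{i=1}^{2m+1}$ are defined as

\[
\begin{aligned}
\mathbf{H}_i[u] &\equiv x^i && \text{for } i = 1, 2, \ldots, m, \\
\mathbf{H}_i[u] &\equiv \mathbf{E}_{\mathbf{x}}[u] = u(\mathbf{x}) && \text{for } i = m + 1, \\
\mathbf{H}_i[u] &\equiv \mathbf{E}_{\mathbf{x},i-m-1}[u] = \partial_{i-m-1}u(\mathbf{x}) && \text{for } i = m + 2, \ldots, 2m + 1,
\end{aligned}
\]

and lead to the equation

\[ \frac{\delta L \circ \mathbf{H}[u]}{\delta u}(\mathbf{y}) = \partial_{m+1}L\,\delta(\mathbf{x} - \mathbf{y}) + \sum_{i=m+2}^{2m+1}\partial_iL\,\partial_{i-m-1}\delta(\mathbf{x} - \mathbf{y}), \tag{30.10} \]

which is the unintegrated version of the classical Euler-Lagrange equation for a field in $m$ dimensions.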





nth-order variational
problem; Lagrangian;
functional


Euler operator 


30.1.3 Variational Problems 

The fundamental theme of the calculus of variations is to find functions that extremize
an integral and are fixed on the boundary of the integration region. A prime
example is the determination of the equation of the curve of minimum length in the
plane passing through two points $(x_1, y_1)$ and $(x_2, y_2)$. Such a curve, written
as $y = u(x)$, would minimize the integral

$$\mathbf{int}[u] = \int_{x_1}^{x_2} \sqrt{1 + [u'(x)]^2}\, dx, \qquad u(x_1) = y_1, \quad u(x_2) = y_2. \tag{30.11}$$

Note that int takes a function and gives a real number; i.e., if we restrict our
functions to square-integrable ones, int is a real-valued function on $\mathcal{L}^2(x_1, x_2)$. This is how contact
is established between the calculus of variations and what we have studied so far
in this chapter.

To be as general as possible, we allow the integral to contain derivatives up
to the $n$th order. Then, using the notation of the previous chapter, we consider
functions $L$ on $M^{(n)} \subset \Omega \times U^{(n)}$, where we have replaced $X$ with $\Omega \subset \mathbb{R}^p$, so that
$M \subset \Omega \times U$.


30.1.10. Definition. By an nth-order variational problem we mean finding the
extremum of the real-valued function $\mathbf{L} : \mathcal{L}^2(\Omega) \to \mathbb{R}$ given by

$$\mathbf{L}[u] = \int_\Omega L(x, u^{(n)})\, d^p x, \tag{30.12}$$

where $M^{(n)}$ is a subset of $\mathbb{R}^p \times \mathbb{R}^{q p^{(n)}}$, $L$ is a real-valued function on $M^{(n)}$, and $p^{(n)} =
(p + n)!/(n!\, p!)$. In this context the function $L$ is called the Lagrangian of the
problem, and $\mathbf{L}$ is called a functional.¹

The solution to the variational problem is given by Proposition 30.1.7, moving 
the functional derivative inside the integral, and a straightforward (but tedious!) 
generalization of Equation (30.10) to include derivatives of order higher than one. 
Due to the presence of the integral, the Dirac delta function and all its derivatives 
will be integrated out. Before stating the solution of the variational problem, let us 
introduce a convenient operator. 

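30.1.11. Definition. For $1 \le \alpha \le q$, the $\alpha$th Euler operator is

$$\mathbb{E}_\alpha = \sum_J (-D)_J \frac{\partial}{\partial u^\alpha_J}, \tag{30.13}$$

where for $J = (j_1, \dots, j_k)$,

$$(-D)_J \equiv (-D_{j_1})(-D_{j_2}) \cdots (-D_{j_k}),$$

and the sum extends over all multi-indices $J = (j_1, \dots, j_k)$, including $J = 0$.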


¹ Do not confuse this functional with the linear functional of Chapter 1.









Euler-Lagrange 

equations 


The negative signs are introduced because of the integration by parts involved 
in the evaluation of the derivatives of the delta function. Although the sum in 
Equation (30.13) extends over all multi-indices, only a finite number of terms in 
the sum will be nonzero, because any function on which the Euler operator acts 
depends on a finite number of derivatives. 

30.1.12. Theorem. If $u$ is an extremal of the variational problem (30.12), then it
must be a solution of the Euler-Lagrange equations

$$\mathbb{E}_\alpha(L) \equiv \sum_J (-D)_J \frac{\partial L}{\partial u^\alpha_J} = 0, \qquad \alpha = 1, \dots, q.$$


Leonhard Euler (1707-1783) was Switzerland’s foremost 
scientist and one of the three greatest mathematicians of 
modern times (Gauss and Riemann being the other two). He
was perhaps the most prolific author of all time in any field. 

From 1727 to 1783 his writings poured out in a seemingly 
endless flood, constantly adding knowledge to every known 
branch of pure and applied mathematics, and also to many 
that were not known until he created them. He averaged 
about 800 printed pages a year throughout his long life, 
and yet he almost always had something worthwhile to say. 

The publication of his complete works was started in 1911, 
and the end is not in sight. This edition was planned to include 887 titles in 72 volumes, 
but since that time extensive new deposits of previously unknown manuscripts have been 
unearthed, and it is now estimated that more than 100 large volumes will be required for 
completion of the project. Euler evidently wrote mathematics with the ease and fluency of a 
skilled speaker discoursing on subjects with which he is intimately familiar. His writings are 
models of relaxed clarity. He never condensed, and he reveled in the rich abundance of his 
ideas and the vast scope of his interests. The French physicist Arago, in speaking of Euler's 
incomparable mathematical facility, remarked that “He calculated without apparent effort, 
as men breathe, or as eagles sustain themselves in the wind.” He suffered total blindness 
during the last 17 years of his life, but with the aid of his powerful memory and fertile 
imagination, and with assistants to write his books and scientific papers from dictation, he 
actually increased his already prodigious output of work. 

Euler was a native of Basel and a student of Johann Bernoulli at the University, but 
he soon outstripped his teacher. He was also a man of broad culture, well versed in the 
classical languages and literatures (he knew the Aeneid by heart), many modern languages,
physiology, medicine, botany, geography, and the entire body of physical science as it was 
known in his time. His personal life was as placid and uneventful as is possible for a man 
with 13 children. 

Though he was not himself a teacher, Euler has had a deeper influence on the teaching of 
mathematics than any other person. This came about chiefly through his three great treatises: 
Introductio in Analysin Infinitorum (1748); Institutiones Calculi Differentialis (1755); and
Institutiones Calculi Integralis (1768-1794). There is considerable truth in the old saying 







that all elementary and advanced calculus textbooks since 1748 are essentially copies of 
Euler or copies of copies of Euler. These works summed up and codified the discoveries 
of his predecessors, and are full of Euler’s own ideas. He extended and perfected plane 
and solid analytic geometry, introduced the analytic approach to trigonometry, and was 
responsible for the modern treatment of the functions $\ln x$ and $e^x$. He created a consistent
theory of logarithms of negative and imaginary numbers, and discovered that $\ln x$ has an
infinite number of values. It was through his work that the symbols $e$, $\pi$, and $i = \sqrt{-1}$
became common currency for all mathematicians, and it was he who linked them together in
the astonishing relation $e^{i\pi} = -1$. Among his other contributions to standard mathematical
notation were $\sin x$, $\cos x$, the use of $f(x)$ for an unspecified function, and the use of $\Sigma$
for summation.

His work in all departments of analysis strongly influenced the further development of 
this subject through the next two centuries. He contributed many important ideas to differential
equations, including substantial parts of the theory of second-order linear equations
and the method of solution by power series. He gave the first systematic discussion of the
calculus of variations, which he founded on his basic differential equation for a minimizing
curve. He discovered the integral defining the gamma function and developed many of its 
applications and special properties. He also worked with Fourier series, encountered the 
Bessel functions in his study of the vibrations of a stretched circular membrane, and applied 
Laplace transforms to solve differential equations — all before Fourier, Bessel, and Laplace 
were born.

E. T. Bell, the well-known historian of mathematics, observed that “One of the most 
remarkable features of Euler’s universal genius was its equal strength in both of the main 
currents of mathematics, the continuous and the discrete.” In the realm of the discrete, he 
was one of the originators of number theory and made many far-reaching contributions to 
this subject throughout his life. In addition, the origins of topology, one of the dominant
forces in modern mathematics, lie in his solution of the Königsberg bridge problem and
his formula $V - E + F = 2$ connecting the numbers of vertices, edges, and faces of a
simple polyhedron. 

The distinction between pure and applied mathematics did not exist in Euler’s day, and 
for him the entire physical universe was a convenient object whose diverse phenomena 
offered scope for his methods of analysis. The foundations of classical mechanics had 
been laid down by Newton, but Euler was the principal architect. In his treatise of 1736 he 
was the first to explicitly introduce the concept of a mass-point, or particle, and he was 
also the first to study the acceleration of a particle moving along any curve and to use the 
notion of a vector in connection with velocity and acceleration. His continued successes 
in mathematical physics were so numerous, and his influence was so pervasive, that most 
of his discoveries are not credited to him at all and are taken for granted in the physics 
community as part of the natural order of things. However, we do have Euler’s angles for 
the rotation of a rigid body, and the all-important Euler-Lagrange equation of variational 
dynamics. 

Euler was the Shakespeare of mathematics: universal, richly detailed, and inexhaustible.







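For the special case of $p = q = 1$, the Euler operator becomes

$$\mathbb{E} = \frac{\partial}{\partial u} + \sum_{j=1}^{\infty} (-D_x)^j \frac{\partial}{\partial u_j} = \frac{\partial}{\partial u} - D_x \frac{\partial}{\partial u_x} + D_x^2 \frac{\partial}{\partial u_{xx}} - \cdots,$$

where $D_x$ is the total derivative with respect to $x$, and $u_j$ is the $j$th derivative of
$u$ with respect to $x$; and the Euler-Lagrange equation for the variational problem

$$\mathbf{L}[u] = \int_a^b L(x, u^{(n)})\, dx$$

becomes

$$\mathbb{E}(L) = \frac{\partial L}{\partial u} + \sum_{j=1}^{n} (-1)^j D_x^j \frac{\partial L}{\partial u_j} = 0. \tag{30.14}$$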


The first variation is 
not sufficient for a
full knowledge of the 
nature of the 
extremum! 


Since L carries derivatives up to the n-th order and each D x carries one derivative, 
we conclude that Equation (30.14) is a 2n-th order ODE. 
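
As an illustration of this order count (a worked instance chosen here for concreteness, not taken from the text), consider the second-order Lagrangian $L = \tfrac12 u_{xx}^2$, so that $n = 2$. Only the $j = 2$ term of Equation (30.14) survives, and the resulting equation is of order $2n = 4$:

```latex
\begin{align*}
L &= \tfrac12 u_{xx}^2, \qquad
\frac{\partial L}{\partial u} = 0, \quad
\frac{\partial L}{\partial u_x} = 0, \quad
\frac{\partial L}{\partial u_{xx}} = u_{xx}, \\
\mathbb{E}(L) &= \frac{\partial L}{\partial u}
  - D_x \frac{\partial L}{\partial u_x}
  + D_x^2 \frac{\partial L}{\partial u_{xx}}
  = D_x^2 (u_{xx}) = u_{xxxx} = 0 .
\end{align*}
```

The extremals are therefore cubic polynomials in $x$, with four constants of integration, as expected of a fourth-order ODE.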

30.1.13. Example. The variational problem of Equation (30.11) has a Lagrangian

$$L(u, u^{(n)}) = L(u, u^{(1)}) = \sqrt{1 + u_x^2},$$

which is a function of the first derivative only. So, the Euler-Lagrange equation takes the
form

$$0 = -D_x\left(\frac{\partial L}{\partial u_x}\right) = -D_x\left(\frac{u_x}{\sqrt{1 + u_x^2}}\right) = -\frac{u_{xx}}{(1 + u_x^2)^{3/2}},$$

or $u_{xx} = 0$, so that $u = f(x) = c_1 x + c_2$. The solution to the variational problem is a
straight line passing through the two points $(x_1, y_1)$ and $(x_2, y_2)$. ■
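
The conclusion of Example 30.1.13 can be checked numerically. The following is only a sketch under stated assumptions: the endpoints, the sine-shaped perturbation, and the helper names `arc_length`, `line`, and `perturbed` are all illustrative choices, not anything taken from the text. It approximates the integral (30.11) by a sum of chord lengths and compares the straight line with a few curves sharing the same endpoints.

```python
import math

def arc_length(u, x1, x2, n=2000):
    """Approximate the integral of sqrt(1 + u'(x)^2) by summing chord lengths."""
    h = (x2 - x1) / n
    total = 0.0
    for k in range(n):
        xa = x1 + k * h
        xb = xa + h
        total += math.hypot(xb - xa, u(xb) - u(xa))
    return total

# Hypothetical endpoints (x1, y1) and (x2, y2), chosen only for illustration.
x1, y1, x2, y2 = 0.0, 1.0, 2.0, 3.0

def line(x):
    """The extremal u = c1*x + c2 through the two endpoints."""
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def perturbed(eps):
    """Line plus a bump that vanishes at both endpoints, so boundary values stay fixed."""
    return lambda x: line(x) + eps * math.sin(math.pi * (x - x1) / (x2 - x1))

straight = arc_length(line, x1, x2)
excess = [arc_length(perturbed(eps), x1, x2) - straight
          for eps in (0.5, 0.1, -0.1, -0.5)]

print("length of straight line:", straight)
print("excess length of perturbed curves:", excess)
```

Every perturbed curve in this family comes out longer than the straight segment. This is a spot check on one family of variations, not a proof; the general argument requires the second variation.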

The variational problem is a problem involving only the first functional derivative,
or the first variation. We know from calculus that the first derivative by itself
cannot determine the nature of the extremum. To test whether the point in question
is maximum or minimum, we need all the second derivatives (see Example
4.7.4). One uses these derivatives to expand the functional in a Taylor series up to
the second order. The sign of the second order contribution determines whether
the functional is maximum or minimum at the extremal point. In analogy with
Example 4.7.4, we expand $\mathbf{L}[u]$ about $f$ up to the second-order derivative:

$$\mathbf{L}[u] = \mathbf{L}[f] + \int_\Omega d^p y \left.\frac{\delta \mathbf{L}}{\delta u(y)}\right|_{u=f} (u(y) - f(y)) + \frac{1}{2!} \int_\Omega d^p y \int_\Omega d^p y' \left.\frac{\delta^2 \mathbf{L}}{\delta u(y)\, \delta u(y')}\right|_{u=f} (u(y) - f(y))(u(y') - f(y')).$$



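The integrals have replaced the sums of the discrete case of Taylor expansion of
the multivariable functions. Since we are interested in comparing $u$ with the $f$ that
extremizes the functional, the second term vanishes and we get

$$\mathbf{L}[u] = \mathbf{L}[f] + \frac{1}{2} \int_\Omega d^p y \int_\Omega d^p y' \left[\left.\frac{\delta^2 \mathbf{L}}{\delta u(y)\, \delta u(y')}\right|_{u=f} (u(y) - f(y))(u(y') - f(y'))\right]. \tag{30.15}$$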


Joseph Louis Lagrange (1736-1813) was born Giuseppe
Luigi Lagrangia but adopted the French version of his name.
He was the eldest of eleven children, most of whom did not
reach adulthood. His father destined him for the law, a profession
that one of his brothers later pursued, and Lagrange
offered no objections. But having begun the study of physics
and geometry, he quickly became aware of his talents and
henceforth devoted himself to the exact sciences. Attracted
first by geometry, at the age of seventeen he turned to analysis,
then a rapidly developing field.

In 1755, in a letter to the geometer Giulio da Fagnano, Lagrange speaks of one of Euler’s 
papers published at Lausanne and Geneva in 1744. The same letter shows that as early as 
the end of 1754 Lagrange had found interesting results in this area, which was to become 
the calculus of variations (a term coined by Euler in 1766). In the same year, Lagrange 
sent Euler a summary, written in Latin, of the purely analytical method that he used for this 
type of problem. Euler replied to Lagrange that he was very interested in the technique. 
Lagrange’s merit was likewise recognized in Turin; and he was named, by a royal decree, 
professor at the Royal Artillery School with an annual salary of 250 crowns — a sum never 
increased in all the years he remained in his native country. Many years later, in a letter to 
d'Alembert, Lagrange confirmed that this method of maxima and minima was the first fruit 
of his studies (he was only nineteen when he devised it) and that he regarded it as his best
work in mathematics. In 1756, in a letter to Euler that has been lost, Lagrange, applying 
the calculus of variations to mechanics, generalized Euler’s earlier work on the trajectory 
described by a material point subject to the influence of central forces to an arbitrary system 
of bodies, and derived from it a procedure for solving all the problems of dynamics. 

In 1757 some young Turin scientists, among them Lagrange, founded a scientific society 
that was the origin of the Royal Academy of Sciences of Turin. One of the main goals of this
society was the publication of a miscellany in French and Latin, Miscellanea Taurinensia 
ou Melanges de Turin, to which Lagrange contributed fundamentally. These contributions 
included works on the calculus of variations, probability, vibrating strings, and the principle 
of least action. 

To enter a competition for a prize, in 1763 Lagrange sent to the Paris Academy of 
Sciences a memoir in which he provided a satisfactory explanation of the translational 
motion of the moon. In the meantime, the Marquis Caraccioli, ambassador from the kingdom 
of Naples to the court of Turin, was transferred by his government to London. He took along
the young Lagrange, who until then seems never to have left the immediate vicinity of Turin.
Lagrange was warmly received in Paris, where he had been preceded by his memoir on lunar 









A straight line 
segment is indeed 
the shortest distance 
between two points. 


libration. He may perhaps have been treated too well in the Paris scientific community, where 
austerity was not a leading virtue. Being of a delicate constitution, Lagrange fell ill and had 
to interrupt his trip. In the spring of 1765 Lagrange returned to Turin by way of Geneva. 

In the autumn of 1765 d’Alembert, who was on excellent terms with Frederick II
of Prussia, and familiar with Lagrange’s work through Mélanges de Turin, suggested to
Lagrange that he accept the vacant position in Berlin created by Euler’s departure for St. 
Petersburg. It seems quite likely that Lagrange would gladly have remained in Turin had 
the court of Turin been willing to improve his material and scientific situation. On 26 April, 
d’Alembert transmitted to Lagrange the very precise and advantageous propositions of the
king of Prussia. Lagrange accepted the proposals of the Prussian king and, not without 
difficulties, obtained his leave through the intercession of Frederick II with the king of 
Sardinia. Eleven months after his arrival in Berlin, Lagrange married his cousin Vittoria
Conti who died in 1783 after a long illness. With the death of Frederick II in August 1786
he also lost his strongest support in Berlin. Advised of the situation, the princes of Italy
zealously competed in attracting him to their courts. In the meantime the French government 
decided to bring Lagrange to Paris through an advantageous offer. Of all the candidates, 
Paris was victorious. 

Lagrange left Berlin on 18 May 1787 to become pensionnaire vétéran of the Paris
Academy of Sciences, of which he had been a foreign associate member since 1772. Warmly 
welcomed in Paris, he experienced a certain lassitude and did not immediately resume his 
research. Yet he astonished those around him by his extensive knowledge of metaphysics,
history, religion, linguistics, medicine, and botany. 

In 1792 Lagrange married the daughter of his colleague at the Academy, the astronomer 
Pierre Charles Le Monnier. This was a troubled period, about a year after the flight of the 
king and his arrest at Varennes. Nevertheless, on 3 June the royal family signed the marriage 
contract “as a sign of its agreement to the union.” Lagrange had no children from this second 
marriage, which, like the first, was a happy one. 

When the academy was suppressed in 1793, many noted scientists, including Lavoisier, 
Laplace, and Coulomb were purged from its membership; but Lagrange remained as its
chairman. For the next ten years, Lagrange survived the turmoil of the aftermath of the 
French Revolution, but by March of 1813, he became seriously ill. He died on the morning 
of 11 April 1813, and three days later his body was carried to the Panthéon. The funeral
oration was given by Laplace in the name of the Senate. 


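30.1.14. Example. Let us apply Equation (30.15) to the extremal function of Example
30.1.13 to see if the line is truly the shortest distance between two points. The first functional
derivative, obtained using Equation (30.9), is simply $\mathbb{E}(L)$:

$$\frac{\delta \mathbf{L}}{\delta u(y)} = \mathbb{E}(L) = -\frac{u_{yy}}{(1 + u_y^2)^{3/2}}.$$

To find the second variational derivative, we use the basic relations (30.6), (30.7), and the
chain rule (30.10):

$$\left.\frac{\delta^2 \mathbf{L}}{\delta u(y')\, \delta u(y)}\right|_{u=f} = -\left.\frac{\delta}{\delta u(y')} \left[\frac{u_{yy}}{(1 + u_y^2)^{3/2}}\right]\right|_{u=f}$$

$$= -\left[(1 + u_y^2)^{-3/2}\, \frac{\delta u_{yy}}{\delta u(y')} + u_{yy}\, \frac{\delta}{\delta u(y')} (1 + u_y^2)^{-3/2}\right]_{u=f}$$

$$= -\left.\frac{\delta''(y - y')}{(1 + u_y^2)^{3/2}}\right|_{u=f} = -\frac{\delta''(y - y')}{(1 + c_1^2)^{3/2}},$$

because $u_{yy} = 0$ and $u_y = c_1$ when $u = f$. Inserting this in Equation (30.15), we obtain

$$\mathbf{L}[u] = \mathbf{L}[f] - \frac{1}{2(1 + c_1^2)^{3/2}} \int_{x_1}^{x_2} dy \int_{x_1}^{x_2} dy'\, \delta''(y - y')\, (u(y) - f(y))(u(y') - f(y'))$$

$$= \mathbf{L}[f] - \frac{1}{2(1 + c_1^2)^{3/2}} \int_{x_1}^{x_2} dy\, (u(y) - f(y))\, \frac{d^2}{dy^2} (u(y) - f(y)).$$

The last integral can be integrated by parts, with the result

$$\left.(u(y) - f(y))\, \frac{d}{dy}(u(y) - f(y))\right|_{x_1}^{x_2} - \int_{x_1}^{x_2} dy \left[\frac{d}{dy}(u(y) - f(y))\right]^2,$$

where the boundary term is zero because $u(x_i) = f(x_i)$, $i = 1, 2$. Therefore,

$$\mathbf{L}[u] = \mathbf{L}[f] + \frac{1}{2(1 + c_1^2)^{3/2}} \int_{x_1}^{x_2} dy \left[\frac{d}{dy}(u(y) - f(y))\right]^2,$$

in which the integral is always positive. It follows that $\mathbf{L}[f] < \mathbf{L}[u]$, i.e., that $f$ indeed gives the shortest distance. ■

connection between
variational problem
and the twin paradox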

30.1.15. Example. In the special theory of relativity, the element of the invariant “length,”
or proper time, is given by $\sqrt{dt^2 - dx^2}$. Thus, the total proper time between two events
$(t_1, x_1)$ and $(t_2, x_2)$ is given by

$$\mathbf{L}[x] = \int_{t_1}^{t_2} \sqrt{1 - x_t^2}\, dt, \qquad x_t \equiv \frac{dx}{dt}.$$

The extremum of this variational problem is exactly the same as in the previous example,
the only difference being a sign. In fact, the reader may verify that

$$\frac{\delta \mathbf{L}[x]}{\delta x(s)} = \mathbb{E}(L) = \frac{x_{ss}}{(1 - x_s^2)^{3/2}},$$

and therefore, $x = f(t) = c_1 t + c_2$ extremizes the functional. The second variational
derivative can be obtained as before. It is left for the reader to show that in the case at hand,
$\mathbf{L}[f] > \mathbf{L}[x]$, i.e., that $f$ gives the longest proper time. Since the function $f(t) = c_1 t + c_2$
corresponds to an inertial (unaccelerated) observer, we conclude that


30.1.16. Box. Accelerated observers measure a shorter proper time between any 
two events than inertial observers. 


This is the content of the famous twin paradox, in which the twin who goes to a
distant galaxy and comes back (therefore being accelerated) will return younger than her
(unaccelerated) twin. ■
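
The statement of Box 30.1.16 can be spot-checked numerically. The sketch below makes several assumptions that are not in the text: units with $c = 1$, a particular inertial velocity of 0.3, a sine-shaped detour for the accelerated observer, and the helper names `proper_time`, `inertial`, and `traveller`. It sums $\sqrt{dt^2 - dx^2}$ over small steps of each worldline.

```python
import math

def proper_time(x, t1, t2, n=4000):
    """Sum sqrt(dt^2 - dx^2) over small steps of the worldline x(t), with c = 1."""
    h = (t2 - t1) / n
    total = 0.0
    for k in range(n):
        ta = t1 + k * h
        dx = x(ta + h) - x(ta)
        total += math.sqrt(h * h - dx * dx)
    return total

# Hypothetical pair of events: (t1, 0) and (t2, 0.3 * t2).
t1, t2 = 0.0, 10.0

def inertial(t):
    """Unaccelerated worldline x = c1*t + c2 with c1 = 0.3, c2 = 0."""
    return 0.3 * t

def traveller(amp):
    """Accelerated worldline through the same two events.

    For the amplitudes used below the speed stays below 1,
    since |dx/dt| <= 0.3 + amp * pi / 10.
    """
    return lambda t: 0.3 * t + amp * math.sin(math.pi * (t - t1) / (t2 - t1))

tau_inertial = proper_time(inertial, t1, t2)
deficits = [tau_inertial - proper_time(traveller(amp), t1, t2)
            for amp in (0.5, 1.0, 2.0)]

print("inertial proper time:", tau_inertial)
print("proper time lost by accelerated observers:", deficits)
```

Each accelerated worldline in this family accumulates less proper time than the inertial one, in line with the sign reversal relative to Example 30.1.14.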





30.1.4 Divergence and Null Lagrangians 

The variational problem integrates a Lagrangian over a region $\Omega$ of $\mathbb{R}^p$. If the
Lagrangian happens to be the divergence of a function that vanishes at the boundary
of $\Omega$, the variational problem becomes trivial, because all functions will extremize
the functional. We now study such Lagrangians in more detail.

30.1.17. Definition. Let $\{F_i : M^{(n)} \to \mathbb{R}\}_{i=1}^{p}$ be functions on $M^{(n)}$, and $\mathbf{F} =
(F_1, \dots, F_p)$. The total divergence of $\mathbf{F}$ is defined to be²

$$\mathbf{D} \cdot \mathbf{F} \equiv \sum_{j=1}^{p} D_j F_j,$$

where $D_j$ is the total derivative with respect to $x^j$.

Now suppose that the Lagrangian $L(x, u^{(n)})$ can be written as the divergence
of some $p$-tuple $\mathbf{F}$. Then by the divergence theorem,

$$\mathbf{L}[u] = \int_\Omega L(x, u^{(n)})\, d^p x = \int_\Omega \mathbf{D} \cdot \mathbf{F}\, d^p x = \int_{\partial\Omega} \mathbf{F} \cdot d\mathbf{a}$$

for any $u = f(x)$ and any domain $\Omega$. It follows that $\mathbf{L}[f]$ depends on the behavior
of $f$ only at the boundary. Since in a typical problem no variation takes place
at the boundary, all functions that satisfy the boundary conditions will be solutions
of the variational problem, i.e., they satisfy the Euler-Lagrange equation.
Lagrangians that satisfy the Euler-Lagrange equation for all $u$ and $x$ are called
null Lagrangians. It turns out that total divergences are the only null Lagrangians
(for a proof, see [Olve 86, pp. 252-253]).

30.1.18. Theorem. A function $L(x, u^{(n)})$ satisfies $\mathbb{E}(L) \equiv 0$ for all $x$ and $u$ if and
only if $L = \mathbf{D} \cdot \mathbf{F}$ for some $p$-tuple of functions $\mathbf{F} = (F_1, \dots, F_p)$ of $x$, $u$, and the
derivatives of $u$.
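
A one-dimensional instance may make the theorem concrete. The Lagrangian below is chosen here purely for illustration (it is not from the text): with $p = q = 1$, take $F = \tfrac12 u^2$, so that $L = D_x F = u u_x$, and apply the Euler operator of Equation (30.14).

```latex
\begin{align*}
L &= u\,u_x = D_x\!\left(\tfrac12 u^2\right), \\
\mathbb{E}(L) &= \frac{\partial L}{\partial u} - D_x \frac{\partial L}{\partial u_x}
  = u_x - D_x(u) = u_x - u_x = 0
  \quad \text{for every } u .
\end{align*}
```

Correspondingly, $\int_a^b u u_x\, dx = \tfrac12 [u(b)^2 - u(a)^2]$ depends only on the boundary values of $u$, so every function with the prescribed endpoints gives the same value of the functional.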

In preparation for the investigation of symmetries of the variational problems,
we look into the effect of a change of variables on the variational problem and
the Euler operator. This is important, because the variational problem should be
independent of the variables chosen. Let

$$\tilde{x} = \Psi(x, u), \qquad \tilde{u} = \Phi(x, u) \tag{30.16}$$

be any change of variables. Then by prolongation, we also have $\tilde{u}^{(n)} =
\Phi^{(n)}(x, u^{(n)})$ for the derivatives. Substituting $u = f(x)$ and all its prolongations
in terms of the new variables, the functional

$$\mathbf{L}[f] = \int_\Omega L(x, \mathrm{pr}^{(n)} f(x))\, d^p x$$


² The reader need not be concerned about lack of consistency in the location of indices (upper vs. lower), because we are
dealing with indexed objects, such as $F_i$, which are not tensors!




variational symmetry
group


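will be transformed into

$$\tilde{\mathbf{L}}[\tilde{f}] = \int_{\tilde\Omega} \tilde{L}(\tilde{x}, \mathrm{pr}^{(n)} \tilde{f}(\tilde{x}))\, d^p \tilde{x},$$

where the transformed domain, defined by

$$\tilde\Omega = \{\tilde{x} = \Psi(x, f(x)) \mid x \in \Omega\},$$

will depend not only on the original domain $\Omega$, but also on the function $f$. The
new Lagrangian is then related to the old one by the change of variables formula
for multiple integrals:

$$L(x, \mathrm{pr}^{(n)} f(x)) = \tilde{L}(\tilde{x}, \mathrm{pr}^{(n)} \tilde{f}(\tilde{x}))\, \det \mathsf{J}(x, \mathrm{pr}^{(1)} f(x)), \tag{30.17}$$

where $\mathsf{J}$ is the Jacobian matrix of the change of variables induced by the function $f$.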

Starting with Equations (30.16) and (30.17), one can obtain the transformation
formula for the Euler operator stated below. The details can be found in [Olve 86, 
pp. 254-255]. 

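30.1.19. Theorem. Let $L(x, u^{(n)})$ and $\tilde{L}(\tilde{x}, \tilde{u}^{(n)})$ be two Lagrangians related by
the change of variable formulas (30.16) and (30.17). Then

$$\mathbb{E}_\alpha(L) = \sum_{\beta=1}^{q} F_{\alpha\beta}(x, u^{(1)})\, \tilde{\mathbb{E}}_\beta(\tilde{L}), \qquad \alpha = 1, \dots, q,$$

where $\tilde{\mathbb{E}}_\beta$ is the Euler operator associated with the new variables, and

$$F_{\alpha\beta} \equiv \det \begin{pmatrix} D_1 \Psi^1 & \cdots & D_p \Psi^1 & \partial\Psi^1/\partial u^\alpha \\ \vdots & & \vdots & \vdots \\ D_1 \Psi^p & \cdots & D_p \Psi^p & \partial\Psi^p/\partial u^\alpha \\ D_1 \Phi^\beta & \cdots & D_p \Phi^\beta & \partial\Phi^\beta/\partial u^\alpha \end{pmatrix}.$$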


30.2 Symmetry Groups of Variational Problems 

In the theory of fields, as well as in mechanics, condensed matter theory, and statistical
mechanics, the starting point is usually a Lagrangian. The variational problem
of this Lagrangian gives the classical equations of motion, and its symmetries lead 
to the important conservation laws. 

30.2.1. Definition. A local group of transformations $G$ acting on $M \subset \Omega_0 \times U$
is a variational symmetry group of the functional

$$\mathbf{L}[u] = \int_{\Omega_0} L(x, u^{(n)})\, d^p x \tag{30.18}$$

if whenever (the closure of) $\Omega$ lies in $\Omega_0$, $f$ is a function over $\Omega$ whose graph is in
$M$, and $g \in G$ is such that $\tilde{f} = g \cdot f$ is a single-valued function defined over $\tilde\Omega$,
then

$$\int_{\tilde\Omega} L(\tilde{x}, \mathrm{pr}^{(n)} \tilde{f}(\tilde{x}))\, d^p \tilde{x} = \int_\Omega L(x, \mathrm{pr}^{(n)} f(x))\, d^p x. \tag{30.19}$$


“Symmetry of the 
Lagrangian” is really 
the symmetry group 
of the variational 
problem! 


In the physics community, the symmetry group of the variational problem is
(somewhat erroneously) called the symmetry of the Lagrangian. Note that if we
had used $\tilde{L}$ in the LHS of Equation (30.19), we would have obtained an identity
valid for all Lagrangians because of Equation (30.17) and the formula for the
change in the volume element of integration. Only symmetric Lagrangians will
satisfy Equation (30.19).

As we have experienced so far, the action of a group can be very complicated
and very nonlinear. On the other hand, the infinitesimal action simplifies
the problem considerably. Fortunately, we have the following criterion (see
[Olve 86, pp. 257-258] for a proof).

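30.2.2. Theorem. A local group of transformations $G$ acting on $M \subset \Omega_0 \times U$ is
a variational symmetry group of the functional (30.18) if and only if

$$\mathrm{pr}^{(n)} \mathbf{v}(L) + L\, \mathbf{D} \cdot \mathbf{X} = 0 \tag{30.20}$$

for all $(x, u^{(n)}) \in M^{(n)}$ and every infinitesimal generator

$$\mathbf{v} = \sum_{i=1}^{p} X^i(x, u) \frac{\partial}{\partial x^i} + \sum_{\alpha=1}^{q} U^\alpha(x, u) \frac{\partial}{\partial u^\alpha}$$

of $G$, where $\mathbf{X} = (X^1, \dots, X^p)$.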

30.2.3. Example. Consider the case of $p = 1 = q$, and assume that the Lagrangian is
independent of $x$ but depends on $u \in \mathcal{L}^2(a, b)$ and its first derivative. Then the variational
problem takes the form

$$\mathbf{L}[u] = \int_a^b L(u^{(1)})\, dx \equiv \int_a^b L(u, u_x)\, dx.$$

Since derivatives are independent of translations, we expect translations to be part of the
symmetry group of this variational problem. Let us verify this. The infinitesimal generator
of translation is $\partial_x$, which is its own prolongation. Therefore, with $X = 1$ and $U = 0$, it
follows that

$$\mathrm{pr}^{(1)} \mathbf{v}(L) + L\, \mathbf{D} \cdot \mathbf{X} = \frac{\partial L}{\partial x} + L\, D_x X = 0 + 0 = 0. \qquad ■$$

30.2.4. Example. As a less trivial case, consider the proper time of Example 30.1.15.
Lorentz transformations generated by³ $\mathbf{v} = u\,\partial_x + x\,\partial_u$ are symmetries of that variational

³ In order to avoid confusion in applying formula (30.20), we use $x$ (instead of $t$) as the independent variable and $u$ (instead
of $x$) as the dependent variable.




Symmetries of the 
Euler-Lagrange 
equations are not 
necessarily the 
symmetries of the 
corresponding 
variational problem! 


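problem. We can verify this by noting that the first prolongation of $\mathbf{v}$ is, as the reader is
urged to verify,

$$\mathrm{pr}^{(1)} \mathbf{v} = \mathbf{v} + (1 - u_x^2) \frac{\partial}{\partial u_x}.$$

Therefore,

$$\mathrm{pr}^{(1)} \mathbf{v}(L) = 0 + 0 + (1 - u_x^2)\, \frac{1}{2\sqrt{1 - u_x^2}}\, (-2 u_x) = -u_x \sqrt{1 - u_x^2}.$$

On the other hand, since $X = u$ and $U = x$,

$$L\, D_x(X) = \sqrt{1 - u_x^2}\; D_x(u) = \sqrt{1 - u_x^2}\; u_x,$$

so that Equation (30.20) is satisfied. ■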
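
The cancellation in Example 30.2.4 can also be checked numerically. The following sketch assumes nothing beyond the formulas of the example; the function names `lagrangian`, `dL_dp`, and `criterion`, the finite-difference step, and the sample slopes are illustrative choices. It evaluates the left-hand side of Equation (30.20) at several values of $u_x$.

```python
import math

def lagrangian(p):
    """Proper-time Lagrangian L = sqrt(1 - u_x^2), with p standing for u_x."""
    return math.sqrt(1.0 - p * p)

def dL_dp(p, h=1e-6):
    """Central finite-difference approximation to dL/du_x."""
    return (lagrangian(p + h) - lagrangian(p - h)) / (2.0 * h)

def criterion(p):
    """Left-hand side pr^(1)v(L) + L * D_x X for v = u d/dx + x d/du.

    L has no explicit x or u dependence, pr^(1)v contributes
    (1 - u_x^2) dL/du_x, and D_x X = D_x u = u_x.
    """
    return (1.0 - p * p) * dL_dp(p) + lagrangian(p) * p

residuals = [criterion(p) for p in (-0.9, -0.5, 0.0, 0.3, 0.8)]
print(residuals)
```

The residuals vanish up to finite-difference error, which is what Theorem 30.2.2 requires of a variational symmetry.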

In the last chapter, we studied the symmetries of the DEs in some detail. This
chapter introduces us to a particular DE that arises from a variational problem,
namely, the Euler-Lagrange equation. The natural question to ask now is, How
does the variational symmetry manifest itself in the Euler-Lagrange equation?
Barring some technical difficulties, we note that for any change of variables, if
$u = f(x)$ is an extremal of the variational problem $\mathbf{L}[u]$, then $\tilde{u} = \tilde{f}(\tilde{x})$ is an
extremal of the variational problem $\tilde{\mathbf{L}}[\tilde{u}]$. In particular, if the change is achieved by
the action of the variational symmetry group, $(\tilde{x}, \tilde{u}) = g \cdot (x, u)$ for some $g \in G$,
then $\tilde{\mathbf{L}}[\tilde{u}] = \mathbf{L}[\tilde{u}]$, and $g \cdot f$ is also an extremal of $\mathbf{L}$. We thus have

30.2.5. Theorem. If G is the variational symmetry group of a functional, then G
is also the symmetry group of the associated Euler-Lagrange equations. 

The converse is not true! There are symmetry groups of the Euler-Lagrange
equations that are not the symmetry group of the variational problem. Problem
30.8 illustrates this for $p = 3$, $q = 1$, and the functional

$$\mathbf{L}[u] = \frac{1}{2} \iiint (u_t^2 - u_x^2 - u_y^2)\, dx\, dy\, dt, \tag{30.21}$$

whose Euler-Lagrange equation is the wave equation. The reader is asked to show
that while the rotations and Lorentz boosts of Table 29.3 are variational symmetries,
the dilatations and inversions (special conformal transformations) are not.

We now treat the case of $p = 1 = q$, whose Euler-Lagrange equation is
an ODE. Recall that the knowledge of a symmetry group of an ODE led to a
reduction in the order of that ODE. Let us see what happens in the present case.
Suppose $\mathbf{v} = X \partial_x + U \partial_u$ is the infinitesimal generator of a 1-parameter group
of variational symmetries of $\mathbf{L}$. By an appropriate coordinate transformation from
$(x, u)$ to $(y, w)$, as in Section 29.5, $\mathbf{v}$ will reduce to $\partial/\partial w$, whose prolongation
is also $\partial/\partial w$. In terms of the new coordinates, Equation (30.20) will reduce to





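$\partial \tilde{L}/\partial w = 0$; i.e., the new Lagrangian is independent of $w$, and the Euler-Lagrange
equation (30.14) becomes

$$0 = \mathbb{E}(\tilde{L}) = \sum_{j=1}^{n} (-1)^j D_y^j \frac{\partial \tilde{L}}{\partial w_j} = (-D_y) \left[\sum_{j=0}^{n-1} (-D_y)^j \frac{\partial \tilde{L}}{\partial w_{j+1}}\right]. \tag{30.22}$$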


Therefore, the expression in the brackets is some constant $\lambda$ (because $D_y$ is a total
derivative). Furthermore, if we introduce $v = w_y$, the expression in the brackets
becomes the Euler-Lagrange expression $\mathbb{E}(\hat{L})$ of the variational problem

$$\hat{\mathbf{L}}[v] = \int \hat{L}(y, v^{(n-1)})\, dy, \qquad \text{where } \hat{L}(y, v^{(n-1)}) = \tilde{L}(y, w_y, \dots, w_n),$$

and every solution $w = f(y)$ of the original $(2n)$th-order Euler-Lagrange equation
corresponds to a solution of the $(2n - 2)$nd-order equation

$$\mathbb{E}(\hat{L}) = \frac{\partial \hat{L}}{\partial v} + \sum_{j=1}^{n-1} (-D_y)^j \frac{\partial \hat{L}}{\partial v_j} = \lambda. \tag{30.23}$$

Moreover, this equation can be written as the Euler-Lagrange equation for

$$\hat{\mathbf{L}}_\lambda[v] = \int \left[\hat{L}(y, v^{(n-1)}) - \lambda v\right] dy,$$

and $\lambda$ can be thought of as a Lagrange multiplier, so that in analogy with the
multivariable extremal problem,⁴ the extremization of $\hat{\mathbf{L}}_\lambda[v]$ becomes equivalent
to that of $\hat{\mathbf{L}}[v]$ subject to the constraint $\int v\, dy = 0$. We summarize the foregoing
discussion in the following theorem.

30.2.6. Theorem. Let $p = 1 = q$, and $\mathbf{L}[u]$ an nth-order variational problem
with a 1-parameter group of variational symmetries $G$. Then there exists a one-parameter
family of variational problems $\hat{\mathbf{L}}_\lambda[v]$ of order $n - 1$ such that every
solution of the Euler-Lagrange equation for $\mathbf{L}[u]$ can be found by integrating the
solutions to the Euler-Lagrange equation for $\hat{\mathbf{L}}_\lambda[v]$.

Thus, we have the following important result: 


30.2.7. Box. A 1-parameter variational symmetry of a functional reduces
the order of the corresponding Euler-Lagrange equation by two.


This conclusion is to be contrasted with the symmetry of ODEs, where each 1- 
parameter group of symmetries reduces the order of the ODE by 1. It follows from 


⁴ See [Math 70, pp. 331-341] for a discussion of Lagrange multipliers and their use in variational techniques, especially those
used in approximating solutions of the Schrödinger equation.





current density and 
conservation law 


constant of the 
motion, or first 
integral of a system 
of ODEs 


trivial conservation 
law of the first kind 


Box 30.2.7 that the ODEs of order 2n derived from a variational problem — the 
Euler-Lagrange equation — are special. 


30.2.8. Example. A first-order variational problem with a 1-parameter group of symmetries
can be integrated out. By transforming to a new coordinate system, we can always
assume that the Lagrangian is independent of the dependent variable (see Proposition
29.5.1). The Euler-Lagrange equation in this case becomes

$$0 = \mathbb{E}(L) = \underbrace{\frac{\partial L}{\partial u}}_{=0} - D_x \frac{\partial L}{\partial u_x} \quad \Longrightarrow \quad \frac{\partial L}{\partial u_x}(x, u_x) = \lambda.$$

Solving this implicit relation, we get $u_x = F(x, \lambda)$, which can be integrated to give $u$ as a
function of $x$ (and $\lambda$). ■
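
To see the procedure of Example 30.2.8 at work, consider the Lagrangian $L = \tfrac12 x u_x^2$ on $x > 0$. This particular $L$ is an assumption made here for illustration only; it is independent of $u$, so the translation $u \to u + \epsilon$ is a variational symmetry and the first integral can be written down at once.

```latex
\begin{align*}
\frac{\partial L}{\partial u_x} = x\,u_x = \lambda
\quad &\Longrightarrow \quad
u_x = F(x, \lambda) = \frac{\lambda}{x}
\quad \Longrightarrow \quad
u(x) = \lambda \ln x + c, \\
\text{check:}\qquad
\mathbb{E}(L) &= -D_x (x\,u_x) = -(u_x + x\,u_{xx})
  = -\left(\frac{\lambda}{x} - \frac{\lambda}{x}\right) = 0 .
\end{align*}
```

The second-order Euler-Lagrange equation is thus solved by two quadratures, with $\lambda$ and $c$ as the two constants of integration.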


The procedure can be generalized to $r$-parameter symmetry groups, but the
order cannot be expected to be reduced by $2r$ unless the group is abelian. We shall
not pursue this matter here, but ask the reader to refer to Problem 30.9.


30.3 Conservation Laws and Noether’s Theorem 

A conserved physical quantity is generally defined as a quantity whose flux through 
any arbitrary closed surface is equal to (the negative of) the rate of depletion
of the quantity in the volume enclosed. This statement, through the use of the 
divergence theorem, translates into a relation connecting the time rate of change of 
the density and the divergence of the current corresponding to the physical quantity. 
Treating time and space coordinates as independent variables and extending to p 
independent variables, we have the following: 


30.3.1. Definition. A conservation law for a system of differential equations
$\Delta(x, u^{(n)}) = 0$ is a divergence expression $\mathbf{D} \cdot \mathbf{J} = 0$ valid for all solutions $u = f(x)$
of the system. Here,

$$\mathbf{J} = \bigl(J_1(x, u^{(n)}),\, J_2(x, u^{(n)}),\, \dots,\, J_p(x, u^{(n)})\bigr)$$

is called current density.

For $p = 1 = q$, i.e., for a system of ODEs, a conservation law takes the
form $D_x J(x, u^{(n)}) = 0$ for all solutions $u = f(x)$ of the system. This requires
$J(x, u^{(n)})$ to be a constant, i.e., that $J(x, u^{(n)})$ be a constant of the motion, or, as
it is sometimes called, the first integral of the system.

In order to understand conservation laws, we need to get a handle on those 
conservation laws that are trivially satisfied. 


30.3.2. Definition. If the current density $\mathbf{J}$ itself vanishes for all solutions $u =
f(x)$ of the system $\Delta(x, u^{(n)}) = 0$, then $\mathbf{D} \cdot \mathbf{J} = 0$ is called a trivial conservation
law of the first kind.




To eliminate this kind of triviality, one solves the system and its prolongations
$\Delta^{(k)}(x, u^{(n)}) = 0$ for some of the variables $u^\alpha_J$ in terms of the remaining variables
and substitutes the latter whenever they occur. For example, one can differentiate
the evolution equation $u_t = F(x, u^{(n)})$ (in which $u^{(n)}$ have derivatives with respect
to $x$ only) with respect to $t$ and $x$ a sufficient number of times (this is what is
meant by “prolongation” of the system of equations) and solve for all derivatives of
$u$ involving time. Then, in the conservation law, substitute for any such derivatives
to obtain a conservation law involving only $x$ derivatives of $u$.

30.3.3. Example. The current density $\mathbf{J}_1 = \bigl(\tfrac12 u_t^2 + \tfrac12 u_x^2,\; -u_t u_x\bigr)$ is easily seen to be
conserved for the system of first-order DEs

$$u_t = v_x, \qquad u_x = v_t.$$

By eliminating all the time derivatives in $\mathbf{J}_1$, we obtain $\mathbf{J}_2 = \bigl(\tfrac12 u_x^2 + \tfrac12 v_x^2,\; -u_x v_x\bigr)$, which
is also conserved. However, the difference between these two currents,

$$\mathbf{J} = \mathbf{J}_1 - \mathbf{J}_2 = \bigl(\tfrac12 u_t^2 - \tfrac12 v_x^2,\; u_x v_x - u_t u_x\bigr),$$

satisfies a trivial conservation law of the first kind, because the components of $\mathbf{J}$ vanish on
the solutions of the system. ■
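The claims of this example can be spot-checked numerically. The sketch below assumes the particular solution $u = \sin x \cos t$, $v = \cos x \sin t$ of the system $u_t = v_x$, $u_x = v_t$ (chosen here for illustration), pairs the first component of each current with $t$ and the second with $x$, and approximates the divergence by central differences. All function names are hypothetical.

```python
import math

# Derivatives of the assumed solution u = sin(x) cos(t), v = cos(x) sin(t).
def u_t(t, x): return -math.sin(x) * math.sin(t)
def u_x(t, x): return math.cos(x) * math.cos(t)
def v_x(t, x): return -math.sin(x) * math.sin(t)

def J1(t, x):
    """Current with time derivatives: (u_t^2/2 + u_x^2/2, -u_t u_x)."""
    return (0.5 * u_t(t, x) ** 2 + 0.5 * u_x(t, x) ** 2, -u_t(t, x) * u_x(t, x))

def J2(t, x):
    """Same current after eliminating u_t in favor of v_x."""
    return (0.5 * u_x(t, x) ** 2 + 0.5 * v_x(t, x) ** 2, -u_x(t, x) * v_x(t, x))

def divergence(J, t, x, h=1e-5):
    """Central-difference approximation of D_t J^t + D_x J^x."""
    dJt = (J(t + h, x)[0] - J(t - h, x)[0]) / (2 * h)
    dJx = (J(t, x + h)[1] - J(t, x - h)[1]) / (2 * h)
    return dJt + dJx

def difference(t, x):
    """Components of J1 - J2, which should vanish on solutions."""
    a, b = J1(t, x), J2(t, x)
    return (a[0] - b[0], a[1] - b[1])

if __name__ == "__main__":
    for t, x in [(0.3, 1.1), (2.0, -0.7)]:
        print(divergence(J1, t, x), divergence(J2, t, x), difference(t, x))
```

Both divergences come out at the level of the finite-difference error, and the difference current vanishes identically on this solution, which is what "trivial of the first kind" means.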

30.3.4. Definition. If the current density $\mathbf{J}$ satisfies $\mathbf{D} \cdot \mathbf{J} = 0$ for all functions
$u = f(x)$, even if they are not solutions of the system of DEs, the divergence
identity is called a trivial conservation law of the second kind. In this case $\mathbf{J}$ is
called a null divergence.

If we treat $J_i$ as the components of a $(p-1)$-form $\omega$, so that the exterior
derivative $d\omega$ is the divergence of $\mathbf{J}$ (times a volume element), then the triviality of
the conservation law for $\mathbf{J}$ is equivalent to the fact that $\omega$ is closed. By the converse
of the Poincaré lemma, there must be a $(p-2)$-form $\eta$ such that $\omega = d\eta$. In the
context of this chapter, we have the following theorem.

30.3.5. Theorem. Suppose $\mathbf{J} = \bigl(J_1(x, u^{(n)}), \dots, J_p(x, u^{(n)})\bigr)$ is a $p$-tuple of
functions on $X \times U^{(n)}$. Then $\mathbf{J}$ is a null divergence if and only if there exist smooth
functions $A_{kj}(x, u^{(n)})$, $j, k = 1, \dots, p$, antisymmetric in their indices, such that

$$J_k = \sum_{j=1}^{p} D_j A_{kj}, \qquad k = 1, \dots, p. \qquad (30.24)$$
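A small numerical sketch of the "if" direction for $p = 2$: pick any antisymmetric pair $A_{12} = -A_{21}$, set $J_1 = D_2 A_{12}$ and $J_2 = D_1 A_{21}$, and the divergence vanishes for an arbitrary function $u$, not just for solutions of some DE. The functions `u` and `A12` below are arbitrary choices made for illustration, and the total derivatives are approximated by central differences of the composite functions.

```python
import math

def u(x, y):
    """An arbitrary smooth function; it is not assumed to solve any DE."""
    return math.sin(x * y) + x ** 3 - math.exp(y)

def A12(x, y):
    """A_12 = -A_21, allowed to depend on x, y and u(x, y)."""
    return x * u(x, y) ** 2 + math.cos(y)

def Dx(f, h=1e-3):
    """Central-difference total derivative with respect to x."""
    return lambda x, y: (f(x + h, y) - f(x - h, y)) / (2 * h)

def Dy(f, h=1e-3):
    """Central-difference total derivative with respect to y."""
    return lambda x, y: (f(x, y + h) - f(x, y - h)) / (2 * h)

J1 = Dy(A12)                           # J_1 = D_2 A_12
J2 = lambda x, y: -Dx(A12)(x, y)       # J_2 = D_1 A_21 = -D_1 A_12

def div_J(x, y):
    """D_1 J_1 + D_2 J_2, which should vanish identically."""
    return Dx(J1)(x, y) + Dy(J2)(x, y)

if __name__ == "__main__":
    print(J1(0.4, -0.3), J2(0.4, -0.3), div_J(0.4, -0.3))
```

The cancellation is just the commutativity of the two derivatives; the central-difference operators used here commute as well, so the computed divergence is zero up to rounding even though $\mathbf{J}$ itself is not.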

30.3.6. Definition. We say that $\mathbf{D} \cdot \mathbf{J} = 0$ is a trivial conservation law if there
exist antisymmetric smooth functions $A_{kj}(x, u^{(n)})$ satisfying Equation (30.24) for
all solutions of the system of DEs $\Delta(x, u^{(n)}) = 0$. Two conservation laws are
equivalent if they differ by a trivial conservation law.

We shall not distinguish between conservation laws that are equivalent. It turns 
out that to within this equivalence, some systems of DEs have current densities J 




such that

$$\mathbf{D} \cdot \mathbf{J} = \sum_{\nu=1}^{l} Q_\nu \Delta_\nu \quad \text{for some } l\text{-tuple } Q = (Q_1, \dots, Q_l), \qquad (30.25)$$

where $\{Q_\nu\}$ are smooth functions of $x$, $u$, and all derivatives of $u$.

30.3.7. Definition. Equation (30.25) is called the characteristic form of the conservation
law for the current density $\mathbf{J}$, and the $l$-tuple $Q$, the characteristic of the
conservation law.

We are now in a position to prove the celebrated Noether’s theorem. However,
we first need a lemma.

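30.3.8. Lemma. Let $\mathbf{v} = \sum_{i=1}^{p} X^i\,\partial/\partial x^i + \sum_{\alpha=1}^{q} U^\alpha\,\partial/\partial u^\alpha$, where $X^i$ and $U^\alpha$
are functions of $x$ and $u$. Let

$$Q^\alpha(x, u^{(1)}) \equiv U^\alpha(x, u) - \sum_{i=1}^{p} X^i(x, u)\,u^\alpha_i, \qquad \alpha = 1, \dots, q.$$

Then

$$\mathrm{pr}^{(n)}\mathbf{v} = \mathrm{pr}^{(n)}\mathbf{v}_Q + \sum_{i=1}^{p} X^i D_i, \qquad (30.26)$$

where

$$\mathbf{v}_Q \equiv \sum_{\alpha=1}^{q} Q^\alpha \frac{\partial}{\partial u^\alpha}, \qquad \mathrm{pr}^{(n)}\mathbf{v}_Q \equiv \sum_{\alpha=1}^{q}\sum_{J} D_J Q^\alpha \frac{\partial}{\partial u^\alpha_J}.$$

The sum over $J$ extends over all multi-indices with $0 \le |J| \le n$, with the $|J| = 0$
term being simply $\mathbf{v}_Q$.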

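Proof. Substitute $Q^\alpha$ in the definition of $U^\alpha_J$ as given in Theorem 29.3.5 to obtain

$$U^\alpha_J = D_J Q^\alpha + \sum_{i=1}^{p} X^i u^\alpha_{Ji},$$

where $U^\alpha_0 = Q^\alpha + \sum_{i=1}^{p} X^i u^\alpha_i = U^\alpha$. It follows that (with $J = 0$ included in the
sum)

$$\mathrm{pr}^{(n)}\mathbf{v} = \sum_{i=1}^{p} X^i \frac{\partial}{\partial x^i} + \sum_{\alpha=1}^{q}\sum_{J} U^\alpha_J \frac{\partial}{\partial u^\alpha_J}
= \sum_{\alpha=1}^{q}\sum_{J} D_J Q^\alpha \frac{\partial}{\partial u^\alpha_J} + \sum_{i=1}^{p} X^i \underbrace{\Bigl(\frac{\partial}{\partial x^i} + \sum_{\alpha=1}^{q}\sum_{J} u^\alpha_{Ji} \frac{\partial}{\partial u^\alpha_J}\Bigr)}_{=D_i \text{ by Proposition 29.3.4}},$$

and the lemma is proved. □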




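30.3.9. Theorem. (Noether’s theorem) Let

$$\mathbf{v} = \sum_{i=1}^{p} X^i \frac{\partial}{\partial x^i} + \sum_{\alpha=1}^{q} U^\alpha \frac{\partial}{\partial u^\alpha}$$

be the infinitesimal generator of a local 1-parameter group of symmetries $G$ of the
variational problem $L[u] = \int L(x, u^{(n)})\,d^p x$. Let

$$Q^\alpha(x, u^{(1)}) = U^\alpha(x, u) - \sum_{i=1}^{p} X^i(x, u)\,u^\alpha_i, \qquad u^\alpha_i = \frac{\partial u^\alpha}{\partial x^i}.$$

Then there exists a $p$-tuple $\mathbf{J} = (J_1, \dots, J_p)$ such that

$$\mathbf{D} \cdot \mathbf{J} = \sum_{\alpha=1}^{q} Q^\alpha E_\alpha(L) \qquad (30.27)$$

is a conservation law in characteristic form for the Euler-Lagrange equation
$E_\alpha(L) = 0$.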

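Proof. We use Lemma 30.3.8 in the infinitesimal criterion of the variational symmetry
(30.20) to obtain

$$0 = \mathrm{pr}^{(n)}\mathbf{v}(L) + L\,\mathbf{D} \cdot \mathbf{X}
= \mathrm{pr}^{(n)}\mathbf{v}_Q(L) + \sum_{i=1}^{p} X^i D_i L + L \sum_{i=1}^{p} D_i X^i
= \mathrm{pr}^{(n)}\mathbf{v}_Q(L) + \sum_{i=1}^{p} D_i(L X^i) = \mathrm{pr}^{(n)}\mathbf{v}_Q(L) + \mathbf{D} \cdot (L\mathbf{X}). \qquad (30.28)$$

Using the definition of $\mathrm{pr}^{(n)}\mathbf{v}_Q$ and the identity

$$(D_j S)T = D_j(ST) - S D_j T,$$

we can commute $D_J = D_{j_1} \cdots D_{j_k}$ past $Q^\alpha$ one factor at a time, each time
introducing a divergence. Therefore,

$$\mathrm{pr}^{(n)}\mathbf{v}_Q(L) = \sum_{\alpha=1}^{q}\sum_{J} D_J Q^\alpha \frac{\partial L}{\partial u^\alpha_J}
= \sum_{\alpha=1}^{q} Q^\alpha E_\alpha(L) + \mathbf{D} \cdot \mathbf{A},$$

where $\mathbf{A} = (A_1, \dots, A_p)$ is some $p$-tuple of functions depending on $L$, the $Q^\alpha$’s,
and their derivatives, whose precise form is not needed here. Combining this with
Equation (30.28), we obtain

$$0 = \sum_{\alpha=1}^{q} Q^\alpha E_\alpha(L) + \mathbf{D} \cdot (\mathbf{A} + L\mathbf{X}).$$

Selecting $\mathbf{J} = -(\mathbf{A} + L\mathbf{X})$ proves the theorem. □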





Amalie Emmy Noether (1882-1935), generally considered the 
greatest of all female mathematicians up to her time, was the eldest
child of Max Noether, research mathematician and professor
at the University of Erlangen, and Ida Amalia Kaufmann. Two 
of Emmy’s three brothers were also scientists. Alfred, her junior 
by a year, earned a doctorate in chemistry at Erlangen. Fritz, two 
and a half years younger, became a distinguished physicist; and 
his son, Gottfried, became a mathematician. 

At first Emmy Noether had planned to be a teacher of English
and French. From 1900 to 1902 she studied mathematics
and foreign languages at Erlangen. Then in 1903 she started her specialization in mathematics
at the University of Göttingen. At both universities she was a nonmatriculated auditor at
lectures, since at the turn of the century women could not be admitted as regular students. In
1904 she was permitted to matriculate at the University of Erlangen, which granted her the
Ph.D., summa cum laude, in 1907. Her sponsor, the algebraist Gordan, strongly influenced
her doctoral dissertation on algebraic invariants. Her divergence from Gordan’s viewpoint
and her progress in the direction of the “new” algebra first began when she was exposed to
the ideas of Ernst Fischer, who came to Erlangen in 1911.

In 1915 Hilbert invited Emmy Noether to Göttingen. There she lectured at courses
that were given under his name and applied her profound invariant-theoretic knowledge 
to the resolution of problems that he and Felix Klein were considering. Inspired by Hilbert 
and Klein’s investigation into Einstein’s general theory of relativity, Noether wrote her
remarkable 1918 paper in which both the concept of variational symmetry and its connection 
with conservation laws were set down in complete generality. 

Hilbert repeatedly tried to obtain her an appointment as Privatdozent, but the strong 
prejudice against women prevented her habilitation until 1919. In 1922 she was named a 
nichtbeamteter ausserordentlicher Professor (“unofficial associate professor”), a purely 
honorary position. Subsequently, a modest salary was provided through a Lehrauftrag
(“teaching appointment”) in algebra. Thus she taught at Göttingen (1922-1933), interrupted
only by visiting professorships at Moscow (1928-1929) and at Frankfurt (summer
of 1930).

In April 1933 she and other Jewish professors at Göttingen were summarily dismissed.
In 1934 Nazi political pressures caused her brother Fritz to resign from his position at 
Breslau and to take up duties at the research institute in Tomsk, Siberia. Through the efforts 
of Hermann Weyl, Emmy Noether was offered a visiting professorship at Bryn Mawr College; 
she departed for the United States in October 1933. Thereafter she lectured and did research 
at Bryn Mawr and at the Institute for Advanced Study, Princeton, but those activities were
cut short by her sudden death from complications following surgery. 

Emmy Noether’s most important contributions to mathematics were in the area of 
abstract algebra. One of the traditional postulates of algebra, namely the commutative law of 
multiplication, was relinquished in the earliest example of a generalized algebraic structure, 
e.g., in Hamilton’s quaternion algebra and also in many of the 1844 Grassmann algebras. 
From 1927 to 1929 Emmy Noether contributed notably to the theory of representations, 
the object of which is to provide realizations of noncommutative algebras by means of 
matrices, or linear transformations. From 1932 to 1934 she was able to probe profoundly
into the structure of noncommutative algebras by means of her concept of the verschränktes
(“cross”) product.

Emmy Noether wrote some forty-five research papers and was an inspiration to many
future mathematicians. The so-called Noether school included such algebraists as Hasse 
and W. Schmeidler, with whom she exchanged ideas and whom she converted to her own 
special point of view. She was particularly influential in the work of B. L. van der Waerden, 
who continued to promote her ideas after her death and to indicate the many concepts for 
which he was indebted to her. 


30.4 Application to Classical Field Theory 

It is clear from the proof of Noether’s theorem that if we are interested in the 
conserved current, we need to find A. In general, the expression for A is very 
complicated. However, if the variational problem is of first order (which in most 
cases of physical interest it is), then we can easily find the explicit form of A, 
and, consequently the conserved current J. We leave it for the reader to prove the 
following: 

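30.4.1. Corollary. Let $\mathbf{v} = \sum_{i=1}^{p} X^i\,\partial/\partial x^i + \sum_{\alpha=1}^{q} U^\alpha\,\partial/\partial u^\alpha$ be the infinitesimal
generator of a local 1-parameter group of symmetries $G$ of the first-order
variational problem $L[u] = \int L(x, u^{(1)})\,d^p x$. Then$^5$

$$J_i = \sum_{\alpha=1}^{q}\sum_{j=1}^{p} X^j u^\alpha_j \frac{\partial L}{\partial u^\alpha_i} - \sum_{\alpha=1}^{q} U^\alpha \frac{\partial L}{\partial u^\alpha_i} - X^i L, \qquad i = 1, \dots, p,$$

form the components of a conserved current for the Euler-Lagrange equation
$E_\alpha(L) = 0$.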
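To see the corollary at work in the simplest setting, the sketch below specializes it to $p = q = 1$ (one independent variable $t$, one dependent variable $u$), where the current is read here as $J = X u_t\,\partial L/\partial u_t - U\,\partial L/\partial u_t - X L$. The harmonic-oscillator Lagrangian $L = \tfrac12 u_t^2 - \tfrac12 u^2$, the time-translation generator ($X = 1$, $U = 0$), and the function names are all assumptions made for illustration.

```python
import math

def lagrangian(t, u, ut):
    """Assumed first-order Lagrangian: unit-mass, unit-frequency oscillator."""
    return 0.5 * ut * ut - 0.5 * u * u

def dL_dut(t, u, ut, h=1e-6):
    """Central-difference approximation of dL/du_t."""
    return (lagrangian(t, u, ut + h) - lagrangian(t, u, ut - h)) / (2 * h)

def noether_current(t, u, ut, X, U):
    """p = q = 1 reading of the corollary: J = X u_t L_{u_t} - U L_{u_t} - X L."""
    p = dL_dut(t, u, ut)
    return X * ut * p - U * p - X * lagrangian(t, u, ut)

def current_on_solution(t, amplitude=2.0, phase=0.3):
    """Evaluate J for time translation along u = amplitude * cos(t + phase)."""
    u = amplitude * math.cos(t + phase)
    ut = -amplitude * math.sin(t + phase)
    return noether_current(t, u, ut, X=1.0, U=0.0)

if __name__ == "__main__":
    for t in (0.0, 0.9, 2.5, 7.1):
        print(t, current_on_solution(t))
```

Along the solution the printed value stays at $\tfrac12(u_t^2 + u^2)$, the energy, which is the expected constant of the motion for a Lagrangian with no explicit $t$ dependence.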

This corollary can be applied to most DEs in physics derivable from a Lagrangian.
We are interested in partial DEs studied in classical field theories. The
case of ODEs, studied in point mechanics, is relegated to Problem 30.11.

First consider spacetime translation $\mathbf{v}^i = \eta^{ij}\partial_j$, where we have introduced
the Lorentz metric $\eta^{ij}$ to include non-Euclidean cases. In order for $\mathbf{v}^i$ to be an
infinitesimal variational symmetry, it has to satisfy Equation (30.20), which in the
case at hand, reduces to $\mathbf{v}^i(L) = 0$, or $\partial_j L = 0$.


30.4.2. Box. In order for a variational problem to be invariant under spacetime
translations, its Lagrangian must not depend explicitly on the coordinates.


5 We have multiplied $J_i$ by a negative sign to conform to physicists’ convention.








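If spacetime translation happens to be a symmetry, then $X^i \to \eta^{ij}$ and the
(double-indexed) conserved current, derived from Corollary 30.4.1, takes the form

$$J^{ij} = \sum_{\alpha=1}^{q} \frac{\partial u^\alpha}{\partial x_j} \frac{\partial L}{\partial u^\alpha_i} - \eta^{ij} L.$$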

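Using Greek indices to describe space-time coordinates, and Latin indices to label
the components of $\mathbb{R}^q$, we write

$$T^{\mu\nu} = \sum_{j=1}^{q} \frac{\partial \phi^j}{\partial x_\nu} \frac{\partial L}{\partial \phi^j_\mu} - \eta^{\mu\nu} L, \qquad (30.29)$$

where we changed the dependent variable $u$ to $\phi$ to adhere to the notation used
in the physics literature. Recall that $\phi^j_\nu \equiv \partial\phi^j/\partial x^\nu$. $T^{\mu\nu}$ is called the energy
momentum current density.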

The quantity $T^{\mu\nu}$, having a vanishing divergence, is really a density, just as the
continuity equation (vanishing of the divergence) for the electric charge involves
the electric charge and current densities. In the electric case, we find the charge
by integrating the charge density, the zeroth component of the electric 4-current
density. Similarly, we find the “charge” associated with $T^{\mu\nu}$ by integrating its
zeroth component. This yields the energy momentum 4-vector:

$$P^\nu = \int_V T^{0\nu}\,d^3x.$$

We note that

$$\frac{dP^\nu}{dt} = \int_V \frac{\partial T^{0\nu}}{\partial t}\,d^3x = -\int_V \sum_{i=1}^{3} \frac{\partial T^{i\nu}}{\partial x^i}\,d^3x = -\int_S \sum_{i=1}^{3} T^{i\nu}\,da_i,$$

where we have used the three-dimensional divergence theorem. By taking $S$ to be
infinite, and assuming that $T^{i\nu} \to 0$ at infinity (faster than the element of area $da_i$
diverges), we obtain $dP^\nu/dt = 0$, the conservation of the 4-momentum.

30.4.3. Example. A relativistic scalar field of mass $m$ is a 1-component field satisfying
the Klein-Gordon equation, which is, as the reader may check, the Euler-Lagrange equation
of

$$L[\phi] = \int L(\phi, \phi_\mu)\,d^4x \equiv \tfrac12 \int \bigl[\eta^{\mu\nu}\phi_\mu\phi_\nu - m^2\phi^2\bigr]\,d^4x.$$

The energy momentum current for the scalar field is found to be

$$T^{\mu\nu} = \partial^\mu\phi\,\partial^\nu\phi - \eta^{\mu\nu}L, \qquad \partial^\mu \equiv \eta^{\mu\nu}\partial_\nu.$$

Note that $T^{\mu\nu}$ is symmetric under interchange of its indices. This is a desired feature of the
energy momentum current that holds for the scalar field but is not satisfied in general, as
Equation (30.29) indicates. The reader is urged to show directly that $\partial_\mu T^{\mu\nu} = 0 = \partial_\nu T^{\mu\nu}$,
i.e., that energy momentum is conserved. ■
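The conservation of the scalar-field energy momentum current can be spot-checked numerically before attempting the direct proof. The sketch below works in $1+1$ dimensions and assumes the metric signature $\eta = \mathrm{diag}(+1, -1)$, units with $c = 1$, the Lagrangian density $L = \tfrac12(\eta^{\mu\nu}\phi_\mu\phi_\nu - m^2\phi^2)$, and a plane-wave solution $\phi = \cos(kx - \omega t)$ with $\omega^2 = k^2 + m^2$. The constants `M`, `K` and all function names are illustrative choices.

```python
import math

M, K = 1.5, 0.8                       # assumed mass and wave number
W = math.sqrt(K * K + M * M)          # dispersion relation w^2 = k^2 + m^2
ETA = ((1.0, 0.0), (0.0, -1.0))       # assumed metric diag(+1, -1)

def phi(t, x):
    return math.cos(K * x - W * t)

def dphi_lower(t, x):
    """(phi_t, phi_x), the derivatives with lower indices."""
    s = math.sin(K * x - W * t)
    return (W * s, -K * s)

def lagrangian(t, x):
    pt, px = dphi_lower(t, x)
    return 0.5 * (pt * pt - px * px - M * M * phi(t, x) ** 2)

def T(t, x):
    """T^{mu nu} = d^mu(phi) d^nu(phi) - eta^{mu nu} L as a 2x2 nested list."""
    lower = dphi_lower(t, x)
    upper = [sum(ETA[m][n] * lower[n] for n in range(2)) for m in range(2)]
    L = lagrangian(t, x)
    return [[upper[m] * upper[n] - ETA[m][n] * L for n in range(2)]
            for m in range(2)]

def divergence(t, x, nu, h=1e-5):
    """Central-difference approximation of d_t T^{0 nu} + d_x T^{1 nu}."""
    dt = (T(t + h, x)[0][nu] - T(t - h, x)[0][nu]) / (2 * h)
    dx = (T(t, x + h)[1][nu] - T(t, x - h)[1][nu]) / (2 * h)
    return dt + dx

if __name__ == "__main__":
    print([divergence(0.7, -1.2, nu) for nu in range(2)])
```

With these conventions $T^{00} = \tfrac12(\phi_t^2 + \phi_x^2 + m^2\phi^2)$ is the familiar positive energy density, and both divergences vanish to within the finite-difference error.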


To go beyond translation, we consider classical (nonquantized) fields$^6$ $\{\phi^j\}_{j=1}^{n_a}$,
which, as is the case in most physical situations, transform among themselves as
the rows of the $a$th irreducible representation of a Lie group $G$ that acts on the
independent variables. Under these circumstances, the generators of the symmetry
are given by Equation (27.59):


兑 7(0 = 笱 W ( x ) 会 + 〜 X v (x ; 。士， 


(30.30) 


where v labels the independent variables. Corollary 30.4.1 now gives the conserved 


current as 


卜 (x; 減⑻為 - X^OLls^- # ⑻盖 屯) 


where summation over repeated indices is understood with I < k < n a and 
1 < v < /?. We can rewrite this equation in the form 


， ={ x ^( x ; - P ( x ; 沉} 1 - 矿 ⑻券妒 )(幻， 


(30.31) 


where and 5^(^) aicn a x n a matrices whose elements are //J and 
respectively, and 1 is the unit matrix of the same dimension. 

We note that the conserved current has a coordinate part (the term that includes
$X^\mu$ and multiplies the unit matrix), and an “intrinsic” part (the term with no $X^\mu$)
represented by the term involving $X^{(a)}(\xi)$. If the field has only one component
(a scalar field), then $X^{(a)}(\xi) = 0$, and only the coordinate part contributes to the
current.

The current acquires an extra index when a component of $\xi$ is chosen. As a
concrete example, consider the case where $G$ is the rotation group in $\mathbb{R}^p$. Then a
typical component of $\xi$ will be $\xi^{\rho\sigma}$, corresponding to a rotation in the $\rho\sigma$-plane,
and the current will be written as $J^{\mu;\rho\sigma}$. These extra indices are also reflected in
$X^\mu$, as that too is a function of $\xi$:

$$X(x; \xi^{\rho\sigma}) = x^\rho\partial^\sigma - x^\sigma\partial^\rho \quad\Longrightarrow\quad X^\mu(x; \xi^{\rho\sigma}) = x^\rho\delta^{\mu\sigma} - x^\sigma\delta^{\mu\rho}.$$


The volume integral of $J^{0;\rho\sigma}$ will give the components of angular momentum,
and when integrated, the term multiplying 1 becomes the orbital angular momentum,
and the remaining term gives the intrinsic spin. The conservation of $J^{\mu;\rho\sigma}$ is the

6 The reader notes that the superscript $a$, which labeled components of the dependent variable $u$, is now the label of the
irreducible representation. The components of the dependent variable (now denoted by $\phi$) are labeled by $j$.




statement of the conservation of total angular momentum. The label $a$ denotes
various representations of the rotation group. If $p = 3$, then $a$ is simply the value
of the spin. For example, the spin-$\tfrac12$ representation corresponds to $a = \tfrac12$, and

X (1/2) (0 = or ^ (1/2) (^) = fl = l,2,3, 

with a labeling the three different “directions” of rotation . 7 If the field is a scalar, 
x ⑹⑹ = = 0 , and the field has only an orbital angular momentum. 

30.5 Problems 

30.1. Show that the derivative of a linear map from one Hilbert space to another 
is the map itself. 

30.2. Show that a complex function $f : \mathbb{C} \supset \Omega \to \mathbb{C}$ considered as a map
$f : \mathbb{R}^2 \supset \Omega \to \mathbb{R}^2$ is differentiable iff it satisfies the Cauchy-Riemann conditions.
Hint: Consider the Jacobian matrix of $f$, and note that a linear complex map
$T : \mathbb{C} \to \mathbb{C}$ is necessarily of the form $T(z) = \lambda z$ for some constant $\lambda \in \mathbb{C}$.

30.3. Show that 
6E ^ l[u l(x) = -a/5(x-y). 

ou 

30.4. Show that the first functional derivative of $L[u] = \int_{x_1}^{x_2} \sqrt{1 + u_x^2}\,dx$, obtained
using Equation (30.9), is $E(L)$.

30.5. Show that for the proper time of special relativity

$$\frac{\delta L[x]}{\delta x(s)} = \frac{x_{ss}}{(1 - x_s^2)^{3/2}}.$$

Use this to show that the contribution of the second variational derivative to the
Taylor expansion of the functional is always negative.

30.6. Show that the first prolongation of the Lorentz generator $\mathbf{v} = u\,\partial_x + x\,\partial_u$ is

$$\mathrm{pr}^{(1)}\mathbf{v} = \mathbf{v} + (1 - u_x^2)\frac{\partial}{\partial u_x}.$$

30.7. Verify that rotation in the $xu$-plane is a symmetry of the arc-length variational
problem (see Example 30.1.13).

30.8. Show that $\mathbf{v}_4$, $\mathbf{v}_6$, and $\mathbf{v}_7$ of Table 29.3 are variational symmetries of Equation
(30.21), but $\mathbf{v}_5$, $\mathbf{v}_8$, $\mathbf{v}_9$, and $\mathbf{v}_{10}$ are not. Find the constant $c$ (if it exists) such that
$\mathbf{v}_5 + c\,u\,\partial_u$ is a variational symmetry. Show that no linear combination of inversions
produces a symmetry.


30.9. The two-dimensional Kepler problem (for a unit point mass) starts with
the functional

$$L = \int \bigl[\tfrac12(x_t^2 + y_t^2) - V(r)\bigr]\,dt, \qquad r = \sqrt{x^2 + y^2}.$$

(a) Show that $L$ is invariant under $t$ translation and rotation in the $xy$-plane.

(b) Find the generators of $t$ translation and rotation in polar coordinates and conclude
that $r$ is the best choice for the independent variable.

(c) Rewrite $L$ in polar coordinates and show that it is independent of $t$ and $\theta$.

(d) Write the Euler-Lagrange equations and integrate them to get $\theta$ as an integral
over $r$.

30.10. Prove Corollary 30.4.1.

30.11. Consider a system of $N$ particles whose total kinetic energy $K$ and potential
energy $U$ are given by

N 

K ⑻ =U(t, x) = -x^T 1 , 

a=l a 种 

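where $\mathbf{x}^\alpha = (x^\alpha, y^\alpha, z^\alpha)$ is the position of the $\alpha$th particle. The variational problem
is of the form

$$\int_{-\infty}^{\infty} L(t, \mathbf{x}, \dot{\mathbf{x}})\,dt = \int_{-\infty}^{\infty} \bigl[K(\dot{\mathbf{x}}) - U(t, \mathbf{x})\bigr]\,dt.$$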

(a) Show that the Euler-Lagrange equations are identical to Newton’s second law 
of motion. 

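(b) Write the infinitesimal criterion for the vector field

$$\mathbf{v} = \tau(t, \mathbf{x})\frac{\partial}{\partial t} + \sum_{\alpha}\Bigl[\xi^\alpha(t, \mathbf{x})\frac{\partial}{\partial x^\alpha} + \eta^\alpha(t, \mathbf{x})\frac{\partial}{\partial y^\alpha} + \zeta^\alpha(t, \mathbf{x})\frac{\partial}{\partial z^\alpha}\Bigr]$$

to be the generator of a 1-parameter group of variational symmetries of $L$.

(c) Show that the conserved “current” derived from Corollary 30.4.1 is

$$T = \sum_{\alpha=1}^{N} m_\alpha\bigl(\xi^\alpha\dot{x}^\alpha + \eta^\alpha\dot{y}^\alpha + \zeta^\alpha\dot{z}^\alpha\bigr) - \tau E,$$

where $E = K + U$ is the total energy of the system.

(d) Find the conditions on $U$ such that (i) time translation, (ii) space translations,
and (iii) rotations become symmetries of $L$. In each case, compute the corresponding
conserved quantity.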


7 Only in three dimensions can one label rotations with a single index. This is because each coordinate plane has a unique 
direction (by the use of the right-hand rule) perpendicular to it that can be identified as the direction of rotation. 




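30.12. Show that the Euler-Lagrange equation of

$$L[\phi] = \int L(\phi, \phi_\mu)\,d^4x = \tfrac12 \int \bigl[\eta^{\mu\nu}\phi_\mu\phi_\nu - m^2\phi^2\bigr]\,d^4x$$

is the Klein-Gordon equation. Verify that $T^{\mu\nu} = \partial^\mu\phi\,\partial^\nu\phi - \eta^{\mu\nu}L$ are the currents
associated with the invariance under translations. Show directly that $T^{\mu\nu}$ is
conserved.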


Additional Reading 

The first three books cited below discuss what we have not covered in our study of
Lie groups and DEs, namely “generalized symmetries,” whereby the generators of
symmetries are allowed to be not only functions of the independent and dependent
variables, but also of the derivatives of the dependent variables. This leads to a
more general version of Noether’s theorem than presented in this chapter.

1. Bluman, G. and Kumei, S. Symmetries and Differential Equations, Springer-Verlag,
1989.

2. Olver, P. Application of Lie Groups to Differential Equations, Springer-Verlag,
1986.

3. Stephani, H. Differential Equations: Their solutions using symmetries, Cambridge
University Press, 1989.

4. All books on relativistic quantum field theories have a discussion of symmetries
and conservation laws. See, for example, Weinberg, S. The Quantum
Theory of Fields (2 volumes), Cambridge University Press, 1995.


Bibliography


[Abra 85] Abraham, R. and Marsden, J. Foundations of Mechanics, 2nd ed., Addison-Wesley, 1985.

[Abra 88] Abraham, R., Marsden, J., and Ratiu, T. Manifolds, Tensor Analysis, and Applications, 2nd ed., Springer-Verlag, 1988.

[Axle 96] Axler, S. Linear Algebra Done Right, Springer-Verlag, 1996.

[Baru 86] Barut, A. and Raczka, R. Theory of Group Representations and Applications, World Scientific, 1986.

[Birk 77] Birkhoff, G. and MacLane, S. Modern Algebra, 4th ed., Macmillan, 1977.

[Birk 78] Birkhoff, G. and Rota, G.-C. Ordinary Differential Equations, 3rd ed., Wiley, 1978.

[Bish 80] Bishop, R. and Goldberg, S. Tensor Analysis on Manifolds, Dover, 1980.

[Blum 89] Bluman, G. and Kumei, S. Symmetries and Differential Equations, Springer-Verlag, 1989.

[Bocc 90] Boccara, N. Functional Analysis, Academic Press, 1990.

[Boer 63] Boerner, H. Representation of Groups, North-Holland, 1963.

[Bott 82] Bott, R. and Tu, L. Differential Forms in Algebraic Topology, Springer-Verlag, 1982.

[Chev 46] Chevalley, C. Theory of Lie Groups, Princeton University Press, 1946.

[Choq 82] Choquet-Bruhat, Y., DeWitt-Morette, C., and Dillard-Bleick, M. Analysis, Manifolds, and Physics, 2nd ed., North-Holland, 1982.

[Chur 74] Churchill, R. and Verhey, R. Complex Variables and Applications, 3rd ed., McGraw-Hill, 1974.

[Cour 62] Courant, R. and Hilbert, D. Methods of Mathematical Physics, vol. 1, Interscience, 1962.


[Denn 67] Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row, 1967.

[DeVi 90] DeVito, C. Functional Analysis and Linear Operator Theory, Addison-Wesley, 1990.

[DeVr 94] DeVries, P. A First Course in Computational Physics, Wiley, 1994.

[Econ 83] Economou, E. Green’s Functions in Quantum Physics, Springer-Verlag, 1983.

[Elli 79] Elliott, J. and Dawber, P. Symmetry in Physics (2 volumes), Oxford University Press, 1979.

[Flan 89] Flanders, H. Differential Forms with Applications to Physical Sciences, Dover, 1989.

[Foll 95] Folland, G. Introduction to Partial Differential Equations, 2nd ed., Princeton University Press, 1995.

[Frie 82] Friedman, A. Foundations of Modern Analysis, Dover, 1982.

[Fult 91] Fulton, W. and Harris, J. Representation Theory, Springer-Verlag, 1991.

[Gill 72] Gillispie, C., ed. Dictionary of Scientific Biography, Charles Scribner’s, New York, 1970.

[Gilm 74] Gilmore, R. Lie Groups, Lie Algebras, and Some of Their Applications, Wiley, 1974.

[Glim 87] Glimm, J. and Jaffe, A. Quantum Physics, 2nd ed., Springer-Verlag, 1987.

[Grad 65] Gradshteyn, I. and Ryzhik, I. Table of Integrals, Series, and Products, Academic Press, 1965.

[Greu 75] Greub, W. Linear Algebra, 4th ed., Springer-Verlag, 1975.

[Halm 58] Halmos, P. Finite Dimensional Vector Spaces, 2nd ed., Van Nostrand, 1958.

[Halm 74] Halmos, P. Naive Set Theory, Springer-Verlag, 1974.

[Hame 89] Hamermesh, M. Group Theory and Its Application to Physical Problems, Dover, 1989.

[Hass 00] Hassani, S. Mathematical Methods for Students of Physics and Related Fields, Springer-Verlag, 2000.

[Hell 67] Hellwig, G. Differential Operators of Mathematical Physics, Addison-Wesley, 1967.

[Hild 87] Hildebrand, F. Introduction to Numerical Analysis, 2nd ed., Dover, 1987.

[Hill 87] Hill, T. Statistical Mechanics, Dover, 1987.

[Jack 75] Jackson, J. Classical Electrodynamics, 2nd ed., Wiley, 1975.

[Jorg 82] Jörgens, K. Linear Integral Operators, Pitman, 1982.

[Kell 85] Kelley, J. General Topology, Springer-Verlag, 1985.

[Koba 63] Kobayashi, S. and Nomizu, K. Foundations of Differential Geometry, vol. I, Wiley, 1963.

[Lang 85] Lang, S. Complex Analysis, 2nd ed., Springer-Verlag, 1985.


[Lorr 88] Lorrain, P., Corson, D., and Lorrain, F. Electromagnetic Fields and Waves, 3rd ed., W. H. Freeman, 1988.

[Mack 68] Mackey, G. Induced Representations, Benjamin, 1968.

[Mari 80] Marion, J. and Heald, M. Classical Electromagnetic Radiation, 2nd ed., Academic Press, 1980.

[Math 70] Mathews, J. and Walker, R. Mathematical Methods of Physics, 2nd ed., Benjamin, 1970.

[Mess 66] Messiah, A. Quantum Mechanics (2 volumes), Wiley, 1966.

[Mill 68] Miller, W. Lie Theory and Special Functions, Academic Press, 1968.

[Misn 73] Misner, C., Thorne, K., and Wheeler, J. Gravitation, Freeman, 1973.

[Mors 53] Morse, P. and Feshbach, M. Methods of Theoretical Physics, McGraw-Hill, 1953.

[Naka 90] Nakahara, M. Geometry, Topology, and Physics, Adam Hilger, 1990.

[Olve 86] Olver, P. Application of Lie Groups to Differential Equations, Springer-Verlag, 1986.

[Reed 80] Reed, M. and Simon, B. Functional Analysis (4 volumes), Academic Press, 1980.

[Rich 78] Richtmyer, R. Principles of Advanced Mathematical Physics, Springer-Verlag, 1978.

[Roac 70] Roach, G. Green’s Functions, Van Nostrand, 1970.

[Rotm 84] Rotman, J. An Introduction to the Theory of Groups, 3rd ed., Allyn and Bacon, 1984.

[Rudi 74] Rudin, W. Functional Analysis, McGraw-Hill, 1991.

[Saun 89] Saunders, D. The Geometry of Jet Bundles, Cambridge University Press, 1989.

[Simm 83] Simmons, G. Introduction to Topology and Modern Analysis, Krieger, 1983.

[Simm 92] Simmons, G. Calculus Gems, McGraw-Hill, 1992.

[Stak 79] Stakgold, I. Green’s Functions and Boundary Value Problems, Wiley, 1979.

[Step 89] Stephani, H. Differential Equations: Their solutions using symmetries, Cambridge University Press, 1989.

[Tric 55] Tricomi, F. Vorlesungen über Orthogonalreihen, Springer, 1955.

[Vara 84] Varadarajan, V. Lie Groups, Lie Algebras and Their Representations, Springer-Verlag, 1984.

[Wald 84] Wald, R. General Relativity, University of Chicago Press, 1984.

[Warn 83] Warner, F. Foundations of Differentiable Manifolds and Lie Groups, Springer-Verlag, 1983.

[Wats 52] Watson, G. A Treatise on the Theory of Bessel Functions, 2nd ed., Cambridge University Press, 1952.

[Wein 95] Weinberg, S. The Quantum Theory of Fields (2 volumes), Cambridge University Press, 1995.

[Wign 59] Wigner, E. Group Theory, Academic Press, 1959.

[Zeid 95] Zeidler, E. Applied Functional Analysis, Springer-Verlag, 1995.




Abel, 63, 177, 183, 233, 416, 462, 653
biography, 662 
abelian group, 654 
abelian Lie algebra, 834 
absolute convergence, 252 
additive identity, 20 
adjoint 

formal, 560 
adjoint action, 827 
adjoint algebra, 840 
adjoint BC, 563 
adjoint DO, 364-367 
adjoint Green’s function, 566
adjoint map, 825 
adjoint of a matrix, 87 
adjoint of an operator, 61 
adjoint representation, 679 
character, 684 
affine group, 876 
algebra, 41-45 
definition, 41 
derivation of an, 42 
dimension of an, 41 
homomorphism, 43 
ideal, 44 
isomorphism, 43 
of operators, 49-76 
quaternions, 44 
structure constants, 43 


symmetric, 737 
tensor, 731 

algebraic equation, 937 

symmetry group of, 937 
analytic continuation, 302-309 
analytic function, 228-235 
definition, 233 
derivative, 228 
derivatives as integrals, 249 
entire, 233 
poles of, 273 
roots (zeros) of, 262
angular momentum, 831,869 
addition theorem, 866 
eigenvalues, 337 
operator, 331 

construction of eigenvalues, 334 
annihilator, 39
anti-hermitian operator, 63 
anticommutator, 59 
antisymmetric representation, 679 
antisymmetrizer, 739
arc length, 890 

associated Legendre functions, 341 
associative algebra, 41 
atlas, 765 
automorphism, 36 
averaging operator, 71 
azimuthal symmetry, 343 


backward difference operator, 70, 384-387

Baker-Campbell-Hausdorff, 59 
Banach space, 148 
basis, 22 

dual of a, 38, 729 
oriented, 745 
orthonormal, 26 
transformation matrix, 91
Becquerel, 799 
Bernoulli, 423,981 
Bessel, 177,620,982 
biography, 422 
Bessel equation 

Liouville substitution, 514 
Bessel function, 423-426 

asymptotic behavior, 443 
large argument, 444 
large order, 443 
confluent hypergeometric, 424 
generating function, 440 
integral representation of, 438 
modified, 425 
second kind, 425 
oscillation of, 363 
second kind, 424
third kind, 425 
Bessel functions 
spherical, 542 
Bessel imaginary (bei), 539 
Bessel inequality, 149 
Bessel real (ber), 539 
beta function, 309-312 
definition, 310 
Bianchi identity, 889 
bijective map, 6 
binary operation, 7
binomial theorem, 13 
Birkhoff’s theorem, 925 
Bolzano, 11 

Bolzano-Weierstrass theorem, 461 
Boole, 704 
boundary conditions 
Dirichlet, 591 
homogeneous, 559 
mixed, 564 
Neumann, 591 


periodic, 515 
separated, 510 
unmixed, 564
boundary functionals, 559 
boundary point, 459 
boundary value problem, 560, 583 
Dirichlet, 591 
Neumann, 591 
numerical solution, 383 
bounded operator, 453-457 
continuity, 454
branch cut, 296 
Brouwer, 850 
bundle of tensors, 786 
BWHB theorem, 853 

canonical basis, 747, 802 

second order linear DE, 404 
canonical coordinates, 802 
canonical transformation, 802, 806 
Cantor, 463, 739, 799
biography, 10 
Cantor set, 12 
cardinality, 12 
Cartan, 798, 841,942 
biography, 743 

Cartan metric tensor, 843, 862 
Cartan subalgebra, 845 
Cartan’s lemma, 743 
Cartesian product, 2 

Casimir operator, 861-863, 868-870, 872, 
873 

Cauchy, 96, 271, 296, 416, 478, 528, 588,
653, 891
biography, 233 
Cauchy data, 584 
Cauchy integral formula, 245 
for operators, 482 
Cauchy problem, 584 
ill-posed, 590 
Cauchy sequence, 10, 146 
Cauchy-Goursat theorem, 242
Cauchy-Riemann conditions, 230 
differentiability, 232 
Cayley, 704, 744 
center of a group, 658 
central difference operator, 70 




centralizer, 658 
Champollion, 198 
character 

adjoint representation, 684 
compound, 683, 685 
conjugacy class, 684 
group and its subgroup, 692 
of a representation, 683 
simple, 683,685 
character table, 691
for $S_2$, 694
for $S_3$, 694

characteristic hypersurface, 585-589
definition, 586 

characteristic polynomial, 115 
linear DEs, 376 
chart, 764 
Chebyshev, 588 
Chebyshev polynomials, 185 
Chevalley's theorem, 862 
Chevalley, biography, 864 
Christoffel symbol, 890 
Christoffel, biography, 890
circuit matrix of a DE, 403 
circular heat-conducting plate, 537 
Clebsch, 178, 704 
biography, 702 
Clebsch-Gordan 

coefficients, 705, 706 
decomposition, 701-705, 721, 866 
series, 703,706 
Clifford, 744 
closed form, 797 
closed subset, 459 
closure, 459 
codomain, 5 
cofactor, 95 

commutative algebra, 41 
commutative group, 654 
commutative Lie algebra, 834 
commutator 

diagonalizability, 124 
commutator subgroup, 658 
commutators, 55
compact Lie algebra, 843 


compact Lie group 

representation, 845-855
compact operator, 464-466 
definition, 464 
spectrum, 467-473 
compact resolvent, 508 
compact set, 458-463 
compact subset, 461 
compact support, 165n, 801
comparison theorem, 361-363 
complement of a set, 2 
complete o.n. sequence, 149 
completeness relation, 68, 150
complex coordinate space, 21 
complex exponential function, 234 
complex FOLDEs, 401-402 
complex function, 227 
integration, 241 
complex GL(V), 822 
complex plane 

contour in the, 242 
curve in the, 242 
multiply connected region, 245 
path in the, 242 
simply connected region, 245 
complex potential, 236 
complex series, 252 
complex SOLDE, 404-410
complex vector space, 20 
composition of maps, 6 
compound character, 685 
conducting cylindrical can, 535 
confluent hypergeometric function 
definition, 420 

integral representation of, 437 
conformal group, 912 
in 2 dimensions, 912 
conformal Killing vector, 911 
conformal map, 236-241 
definition, 237 
translation, 238 
conformal transformation, 911
special, 912 
conic sections, 132 
conjugacy class, 661, 684 
conjugate, 661 
conjugate subgroup, 657 



conjugation of operators, 61-63
conjunct, 560, 563, 576
connection coefficients, 886, 900
conservation law, 855, 992
characteristic, 994
trivial, 993
trivial of the first kind, 992
trivial of the second kind, 993
conserved current density, 992
constant of the motion, 992
constrained systems, 804n
continuous index, 159
contraction, 735
contravariant degree, 731
contravariant tensor, 731
convex subset, 474
convolution theorem, 223
coordinate curve, 774
coordinate frame, 774
coordinate functions, 764
coordinate representation of 820
coordinate transformation
orientation preserving, 800
orientation reversing, 800
coset, 658
cosmological constant, 921
Coulomb, 985
countably infinite set, 12
covariant degree, 731
covariant derivative, 897-908
and curvature, 903
Lie derivative, 902
covariant tensor, 731
criterion for irreducibility, 685
crystallography, 675
current density, 992
energy momentum, 998
curvature
and gravity, 914
as relative acceleration, 913
matrix, 888
two-form, 885
curve
coordinate, 774
differentiable, 770
curvilinear coordinates, 892
cycle, 666
cyclic permutation, 666
cyclic subgroup, 657

d’Alembert, 984
biography, 330
Darboux, 744, 942
Darboux inequality, 245
Darboux theorem, 802
de Broglie, 807
Dedekind, 11, 711, 739, 891
degeneracy, 697
energy, 605
degenerate kernels, 501-504
delta function, 160, 592
and step function, 163
expansion
Fourier, 204
general, 189
limit of sequence, 161, 163
potential, 601
dense subset, 460
density function, 833
density of states, 532
derivation, 840
derivation algebra, 840
derivation of an algebra, 42
derivation property
tangent vector, 772
derivative
covariant, 897
derivative operator
unboundedness of, 455
Descartes, 738
determinant, 93-101, 744
analytic definition of, 137
connection with trace, 102
definition, 93
derivative of, 102
expansion of, 95
exponential of trace, 103
of an operator, 101
products of matrices, 97
relation to trace, 101
diffeomorphism, 768
differentiable curve, 770
tangent vector, 772
differentiable manifold, 763-769






dimension of a, 764 
differentiable map, 768 

coordinate expression of, 768 
differential 

of a constant map, 777 
of a map, 776 
one-form, 786 
real-valued maps, 778 
differential equation 
analytic, 401 
analytic properties, 401 
associated Legendre, 343 
Bessel, 407 

confluent hypergeometric, 420 
constant coefficients, 376-379 
Euler, 411 
first-order 
existence, 350 
linear, 351 
normal form, 351 
Peano existence theorem, 351 
uniqueness, 350 
uniqueness theorem, 352 
first-order linear 

irregular singular point, 402 
regular singular point, 402 
Fuchsian, 411 
higher-order 

numerical solution, 393 
homogeneous, 349 
hypergeometric, 407 
definition, 413 
Jacobi functions, 418 
Kummer’s solutions, 418 
inhomogeneous, 349 
linear, 349 

superposition principle, 354 
numerical solution, 384 
Adam’s method, 386 
Euler’s method, 386 
Riemann, 412 
second order linear 
behavior at infinity, 410 
canonical basis, 404 
characteristic exponents, 406 
second-order linear 
adjoint, 365 


Frobenius method, 371 
general properties, 352 
integrating factor, 364 
regular, 353 

uniqueness theorem, 354
Wronskian, 356 
symmetry group of, 941-950 
differential form, 791 

Maxwell’s equations, 793 
differential forms 

Lorentz force law, 794 
differential operator, 349, 863 
differentiation operator, 73 
diffusion equation, 528, 591, 621
time-dependent, 530 
diffusion operator 

Green’s function for, 632 
dilation, 238 
dilatation, 912
dimension theorem, 35 
Dirac, 850 

biography, 167 
Dirac delta function, 160 
direct products, 661-663 
direct sum, 109-111 
definition, 110 
directional derivative, 976 
Dirichlet, 177, 738, 888, 891
biography, 614 

Dirichlet boundary condition, 591 
Dirichlet BVP, 591, 613-619 
in two dimensions, 638 
discrete Fourier transform, 217-219 
dispersion relation 

with one subtraction, 308 
dispersion relations, 306-309 
distribution, 166 
density, 166 
derivative of, 169 
Fourier transform of a, 219 
limit of functions, 168 
divergence, 987 
of tensors, 903 
domain, 5 
dual 

basis, 38, 729
of an operator, 39 




space, 34 

eigenfunction expansion technique, 
636-641 

2D Laplacian, 638
eigenspace, 115 
eigenvalue, 114-116 

characteristic polynomial, 115
definition, 114
of angular momentum, 337
simple, 115
eigenvector, 114-116

angular momentum, 338-346 
definition, 114 
generalized, 468

Einstein, 799, 849, 889, 891, 899, 920,
921, 996
Einstein tensor, 919
Einstein’s equation, 918-921

Schwarzschild solution, 924 
Einstein's summation convention, 728
electromagnetic field tensor, 796, 797 
elementary column operation, 99 
elementary row operation, 99 
elliptic PDE, 589, 613-621 
elsewhere, 838 
empty set, 2 
endomorphism, 33 
energy function, 805 
entire function, 233 
equivalence class, 3 
equivalence relation, 3 
equivalent representations, 674 
essential singularity, 273 
essentially idempotent, 689 
η-orthogonal matrices, 837
Euclid, 151, 807
Euclidean metric, 892
Euclidean space, 754
Euler, 233, 390, 415, 423, 514, 888, 984
biography, 981 
Euler angles, 89, 865 
Euler kernel, 434 
Euler operator, 980
Euler theorem, 865 
Euler transform, 433 
Euler-Lagrange equations, 981


Euler-Mascheroni constant, 310 
evaluation function, 977 
event, 837 

evolution operator, 58, 626 
exact form, 797 
expectation value, 64 
exponential function 
complex, 234 
exponential map, 824 
exterior algebra, 739-749 
exterior calculus, 791-801 
exterior derivative, 791 
exterior product, 740 

F-related vector fields, 780 
faithful representation, 674 
Fermi energy, 532 
Feynman diagram, 602 
Feynman propagator, 636 
field, 20 

finite-difference operators, 70-73 
finite-rank operator, 464 
first integral, 992 
first variation, 983 
flat manifold, 895, 896
flat map, 747, 802 
flow, 784 
form factor, 216 
formal adjoint, 560 
formally self-adjoint, 560 
forward difference operator, 70 
Fourier, 614, 653, 982
biography, 198 
Fourier series, 196-208 
angular variable, 197 
fundamental cell, 198 
general variable, 200 
group theory, 853 
higher dimensions, 207 
main theorem, 202 
sawtooth, 201 
square wave, 200 
to Fourier transform, 208-210 
Fourier transform, 208-220, 433 
and derivatives, 216-217 
and GF, 628-636 
and quark model, 216 




Coulomb potential 

charge distribution, 215 
point charge, 214 
definition, 210 
distribution, 219 
Gaussian, 212 
higher dimensions, 213 
Fourier-Bessel series, 536 
Fredholm, 151 
Fredholm alternative, 496 
Fredholm equation, 488, 600
Fredholm, biography, 496 
Friedmann metric, 892 
Friedmann, biography, 921 
Frobenius, 681, 850, 874
biography, 711 
Frobenius method, 370-376 
Fuchsian DE, 410-413
definition, 411 
function, 5 
function algebra, 42 
function of operator, 53, 125-128 
derivative, 56 
functional, 980 
linear, 34 

functional derivative, 977 
functions of trig functions 
integrals of, 281 
fundamental solution, 597-600
fundamental theorem of algebra, 251 
future light cone, 837 

G-invariance, 937 
g-orthogonal, 751 
g-orthonormal, 752 
Galois, 96, 711, 842, 942
biography, 653 

gamma function, 181, 309-312 
definition, 309 
gauge invariance, 797 
Gauss, 96, 183, 233, 422, 463, 478, 614,
738, 797, 888, 891, 981
biography, 415 
Gay-Lussac, 528 
Gegenbauer functions, 419 
Gegenbauer polynomials, 185 
general linear group, 655 


representation, 856-859 
general relativity, 918-932
generalized eigenvector, 468
generalized Fourier coefficients, 150
generalized functions, 165-169
generalized Green’s identity, 561, 574,
597

generating function, 189 
geodesic, 905-908 

relative acceleration, 913 
geodesic deviation, 913-918 
equation of, 913 
geodesic equation 

massive particles, 926, 927 
massless particles, 926, 928
geometric multiplicity, 115 
Gibbs, 463, 807 
Gibbs phenomenon, 205-207 
GL(n, ℝ) as a Lie group, 816
GL(V) 

as a Lie group, 816 
representation of, 856 
Gödel, 799
Gordan, 996 

biography, 704 
gradient 

for Hilbert spaces, 976 
gradient operator, 901 
Gram, 28 

Gram-Schmidt process, 27-29 
Gram-Schmidt process, 172 
graph, 5 

Grassmann, 743, 996
Grassmann product, 740 
gravitational red-shift, 930 
gravity 

and curvature, 914 
Green’s function, 289 
adjoint, 566 
advanced, 634 
and Fourier transform, 628 
construction, 569
diffusion operator, 632
Dirichlet BC
circle, 618
Dirichlet BVP, 613
eigenfunction expansion, 577-579 




for SOLDO, 565-577 
for the Laplacian, 595-596 
formal considerations, 557 
Helmholtz operator, 630 
in 2D, 643 

in one dimension, 553 
indefinite, 554 
inhomogeneous BCs, 574 
integral equations, 600 
Laplacian, 629 
Neumann BVP, 620 
exterior, 621
interior, 620 

physical interpretation, 577 
properties, 567 
retarded, 634 
second order DO, 562 
symmetry, 567 
wave equation, 633 
for d/dx, 554
for d^2/dx^2, 555

Green’s identity, 567, 596, 623, 627
Green, biography, 561 
group, 652-656 
abelian, 654 
affine, 876 
algebra, 687 

symmetric group, 718 
automorphism, 655 
center of a, 658 
commutative, 654 
commutator, 657 
homomorphism, 655 
isomorphism, 655 
left action, 663 
multiplication, 652 
multiplication table, 656 
of affine motions, 817 
order of, 652 
orthogonal, 656 
realization, 664 
representation 
definition, 674 

irreducible, projection operator, 
698 

matrix, 675 


particles and fields, 699 
tensor product, 699 
right action, 663 
rigid rotations, 657 
special orthogonal, 657 
special unitary, 657 
symplectic, 657, 749 
unitary, 656 

group action, 663-664, 817-820 
effective, 663 

infinitesimal generator, 827 
transitive, 663, 818 
guided waves 
TE, 533 
TEM, 533 
TM, 533 

Haar measure, 832 
Halley, 422 

Hamilton, 178, 490, 996
biography, 806
Hamiltonian system, 805
Hamiltonian vector field, 805
Hankel function, 425 

asymptotic expansion of, 316 
Hankel transform, 434 
harmonic functions, 236 
heat equation, 327, 591, 621
symmetry group, 957-961 
heat transfer, 528 
steady-state, 528 
time-dependent, 530 
heat-conducting plate, 528 
Hegel, 738 
Heisenberg, 64, 167
helicity, 876

Helmholtz, 178, 588, 849
Helmholtz equation, 542 
Helmholtz operator 

Green’s function for, 630 
Hermite, 183, 798
biography, 63

Hermite polynomials, 176, 180, 511
hermitian conjugate, 61 
hermitian kernel, 497-501 
hermitian operator, 63-67 
Hilbert, 11, 28, 199, 463, 704, 799, 848,
920, 996
biography, 151
Hilbert space, 145-157
basis, 150
definition, 148 
differential of functions, 976 
functions on 

derivative of, 974 
Hilbert transforms, 308 
Hilbert-Schmidt operator, 465 
Hilbert-Schmidt theorem, 498 
Hilbert-Schmidt kernel, 466, 494
Hilbert-Schmidt operator, 847
Hodge star operator, 756-758, 795
Hölder, 463
homogeneous SOLDE 
exact, 364 
second solution, 358 
homographic transformations, 239 
homomorphism 
algebra, 43 
kernel, 658 
Lie group, 816 
symmetric, 655 
trivial, 655 

hyperbolic PDE, 589, 626-628 
hypergeometric DE, 407 
hypergeometric function, 413-419
contiguous, 417
Euler formula, 436
integral representation of, 434
hypergeometric series, 414
hypersurface, 583 

ideal, 44
idempotent, 688
essentially, 689
primitive, 689
identity 

additive, 20 
multiplicative, 20 
identity map, 5
identity operator, 50 
identity representation, 674 
ignorable coordinate, 593 
image of a subset, 5 
implicit function theorem, 350 
indicial polynomial, 406 


induced representations, 871 
induction principle, 12 
inductive definition, 14 
inequality 

Bessel, 149 
Darboux, 245 
Parseval, 149 
Schwarz, 29 
triangle, 31 

infinitesimal action, 826-831 
adjoint, 827 

infinitesimal generator, 827, 830 
inhomogeneous BCs, 574-577
inhomogeneous SOLDE
general solution, 360
initial value problem, 559, 583
numerical solution, 383 
injective map, 6 
inner automorphism, 825 
inner product, 23-32 

bra and ket notation, 25 
definition of, 24 
norm and, 31 
positive definite, 25 
pseudo-Riemannian, 25
Riemannian, 25 
sesquilinear, 25 
inner product space, 25 
integrability condition, 888 
integral curve, 783 
integral equation 

characteristic value, 489 
Fredholm, 494-504 
Green’s functions, 600 
kernel of, 488 
of the first kind, 488 
of the second kind, 488 
Volterra, 488 
Volterra, of second kind 
solution, 490 
integral operator, 452 
integral transforms, 433 
Bessel function, 434 
integration on manifolds, 800-801 
integration operator, 73 
interior product, 793 
intersection, 2 


intrinsic spin, 999 
invariant, 937 
invariant map, 937 
invariant operator 

matrix representation, 112 
invariant subspace, 112-114, 676, 677 
definition, 112 
inverse image, 5 
inverse mapping theorem, 111 
inverse of a map, 6 
inverse of a matrix, 97-100 
inversion, 238, 912 
irreducibility criterion, 685 
irreducible representation, 677 
i-th row 

functions, 696 
norm of functions, 696 
irreducible tensor operators, 705-707
isometric map, 908 
isometry, 908-912 

time translation, 923 
isomorphism, 36
algebra, 43
Lie algebra, 822
Lie group, 816 

Jacobi, 183, 416, 490, 615, 663, 702, 704,
738, 807, 888
biography, 177
Jacobi functions, 418
Jacobi identity, 782, 790, 826
Jacobi polynomials, 176, 184, 418
special cases, 176 
Jacobian matrix, 777 
Jordan arc, 242 
Jordan canonical form, 485 
Jordan’s lemma, 275 

Kant, 738 
Kelvin, 561 
Kelvin equation, 538 
Kepler problem, 1001 
kernel, 35 

degenerate, 501 
integral operator, 452 
integral transforms, 433 
separable, 501 
Killing, 743, 942


biography, 842 
Killing equation, 909 
Killing form, 841, 845 
of gl(n, ℝ), 844
Killing vector field, 908-912, 923, 926, 930

conformal, 911 
Kirchhoff, 588 

Klein, 388, 743, 798, 849, 891, 920, 942,
996 

Klein-Gordon equation, 328 
Korteweg-de Vries equation, 971 
Kovalevskaya, 463 
biography, 587 
Kramers-Kronig relation, 309 
Kronecker, 11, 96, 388 
biography, 738 
Kronecker delta, 26 
Kummer, 30, 704, 738, 842, 891, 

Kutta, biography, 390 

Lagrange, 96, 178, 182, 198, 199, 415,
423, 528, 704
biography, 984 

Lagrange identity, 366, 434, 525, 560 
Lagrange multiplier, 991 
Lagrange theorem, 670 
Lagrangian, 804, 980 
null, 987

Laguerre polynomials, 176, 180-181 
Laplace, 28, 199, 614, 806, 920, 982, 985 
biography, 527 
Laplace transform, 433 
Laplace’s equation, 327 

Cartesian coordinates, 526 
cylindrical coordinates, 535 
spherical coordinates, 541 
Laplacian 

Green’s function for, 595, 629 
separation of angular part, 331 
Laurent series, 252-262, 606 
construction, 254 
uniqueness, 258 
Laurent, biography, 271 
Lavoisier, 95, 985
left coset, 658 




left ideal, 44, 689 
minimal, 689 
left translation, 820 
as action, 827 
left-invariant 1-form, 820 
left-invariant vectors, 820 
Legendre, 177, 199, 490, 614, 888
biography, 182 
Legendre equation, 517 
Legendre functions, 419 
Legendre polynomials, 155, 182-184, 
340, 359 

and delta function, 188 
and Laplacian, 188 
Legendre transformation, 804
Leibniz, 95, 738
Leibniz rule, 15
length of a vector, 30-32
Levi-Civita, 891
biography, 898
Levi-Civita tensor, 745, 869
Lie, 711, 743, 798, 842
biography, 941 
Lie algebra, 833-845 
abelian, 834 
adjoint map, 825 
Cartan metric tensor, 843 
Cartan theorem, 844 
center, 834 
commutative, 834 
compact, 843 
decomposition, 844
derivation, 840 
ideal, 834 

Killing form of a, 841 
of a Lie group, 820-826 
of unitary group, 824 
of vector fields, 782 
representation, 859-876 
definition, 845 
semisimple, 844 
simple, 844 

structure constants, 834 
of SL(V), 823
Lie bracket, 782 
Lie derivative, 788 


covariant derivative, 902 
of a 1-form, 790 
of vectors, 789 
of p-forms, 793
Lie group, 815 
compact 

characters, 853 
matrix representation, 852 
representation, 852 
unitary representation, 846 
Weyl operator, 847 
homomorphism, 816 
integration, 832-833 
density function, 833 
local, 817 
representation, 845 
Lie multiplication, 833 
Lie subalgebra, 834 
Lie’s first theorem, 829
Lie’s second theorem, 826
Lie’s third theorem, 826 
light cone, 837 
linear combination, 21 
linear functional, 34, 455
linear independence, 21 
linear operator, 33 

null space of a, 35 
linear PDE, 584 
linear transformation, 32-40 
bounded, 454 
definition, 33 
pullback of a, 39 
Liouville, 512, 654
biography, 514

Liouville substitution, 513, 520
Liouville’s theorem, 807 
Lipschitz condition, 351 
little algebra, 871 
little group, 871, 873 
local diffeomorphism, 768
local group of transformations, 817
local Lie group, 817 
local operator, 452 
logarithmic function, 295 
Lorentz, 799 
Lorentz algebra, 864 




Lorentz force law, 794 
Lorentz group, 657, 837 
Lorentz metric, 892 
Lorentz transformations, 837 
lowering indices, 750 
lowering operator, 335 


Maclaurin series, 254 
manifold, 764 

coordinate functions, 764 
flat, 895 
orientable, 800 
product, 767
pseudo-Riemannian, 887 
symplectic, 801 
map, 4-7 

bijective, 6 
codomain, 5 
differentiable, 768 
domain, 5 
equality of, 5
functions and, 5 
graph of a, 5 
identity, 5 
image of a subset, 5
injective, 6 
inverse of a, 6 
one-to-one, 6 
onto, 6 
range of a, 5 
surjective, 6
target space, 5 
mathematical induction, 12 
matrix, 82-86 

antisymmetric, 87 
basis transformation, 91 
complex conjugate of, 87 
diagonal, 88 
hermitian, 87 
hermitian conjugate of, 87 
inverse of a, 98 
irreducible, 112 
operations on a, 87 
orthogonal, 87 
rank of a, 86
reducible, 112, 113
representation
orthonormal basis, 89
row-echelon, 99
symmetric, 87
transpose of a, 87
triangular, 99
unitary, 87
matrix algebra, 42
maximally symmetric spaces, 910
Maxwell’s equations, 796
Mellin transform, 434
Mendelssohn, 615, 738
meromorphic functions, 293-294
method of images, 616
sphere, 617

method of steepest descent, 521
metric space, 7-10
convergence, 9
definition, 8
minimal ideal, 44, 856
Minkowski, 920
Minkowski metric, 892
Minkowski space, 754
Mittag-Leffler, 388, 463, 588
Mittag-Leffler expansion, 294
modified Bessel functions, 425
moment of inertia matrix, 88
Monge, 95, 198
Morera’s theorem, 252
multilinear mapping, 729-735
tensor-valued, 735
multiplicative identity, 20
multivalued functions, 295-302

n-equivalent functions, 945
n-sphere, 764
n-th jet space, 945
Napoleon, 198, 527
natural isomorphism, 733, 756
natural pairing, 731
neighborhood, 459
Neumann, 177, 702
biography, 620
Neumann BC, 591
Neumann BVP, 591, 619-621
Neumann function, 424
Neumann series, 493, 601, 602
Newton, 330, 415, 527, 798, 806, 982




Newtonian gravity, 916-918 
Noether, 704 

biography, 996 
Noether’s theorem, 995 
non-local potential, 631 
norm 

of a vector, 30 
operator, 454 
product of operators, 456 
normal operator, 117 
normal subgroup, 659 
normal vectors, 26 
normed linear space, 31
normed vector space

compact subset of, 461 
null divergence, 993 
null Lagrangian, 987 
null space, 35 
null vector, 751, 837 
numerical integration, 74-76 
Simpson’s 1/3 rule, 75 
Simpson’s 3/8 rule, 75 
trapezoidal rule, 75 

Ohm, 614 
Olbers, 422 
one-form, 786
one-parameter group, 784 
open ball, 459 
open subset, 459 
operations on matrices, 87-89 
operator, 33 
adjoint, 61 

existence of, 457 
angular momentum, 331 
annihilation, 374
bounded, 454 
closed, 508 
bounded, 508 
compact Hermitian 
spectral theorem, 476 
compact normal 

spectral theorem, 477 
compact resolvent, 508
creation, 374 
diagonalizable, 116 
differentiation, 73 


domain of, 507 
equality, 49 
evolution, 58 
expectation value of, 64 
extension of, 508 
finite rank, 464 
formally self-adjoint, 597 
functions of an, 53 
hermitian, 63, 508
eigenvalue, 117
hermitian conjugate of, 61
Hilbert-Schmidt, 465, 511
integration, 73 
inverse of an, 50 
kernel of an, 35 
local, 452 

negative powers of, 52 
norm of an, 454 
normal, 117

diagonalizable, 119 
eigenspace of, 118 
null space of an, 35 
numerical analysis, 70 
polarization identity, 50 
positive, 65 
projection, 67
orthogonal, 68
pullback of an, 39 
raising and lowering, 335 
regular point, 457 
representation of an, 83 
right-shift, 453 
scalar, 706 
self-adjoint, 508 
spectrum, 457, 457-458 
spectrum of an, 115 
square root of, 127 
Sturm-Liouville, 510 
trace of an, 102 
unbounded, 507 
unitary, 66, 127
eigenvalue, 117 
operator algebra, 49-76 
operator polynomials, 51-53 
operator spectrum, 457 
o(p, n - p), 837



optical theorem, 308 
orbit, 663, 676, 818 
orbital angular momentum, 999 
ordered pairs, 2 
orientable manifolds, 800 
orientation, 745, 746
positive, 746 
oriented basis, 745 
orthochronous, 837 
orthogonal group, 656, 824 
Lie algebra of, 824 
orthogonal polynomials, 153
classical, 173 

classification, 176 
differential equation, 174 
generating functions, 189 
recurrence relations, 176
expansion, 186 
least square fit, 155 
orthogonal transformation, 97 
orthogonal vectors, 26 
orthogonality, 26-27 
orthonormal basis, 26 
orthonormal frames, 887 

p-forms, 741
pairing 

natural, 731 

parabolic PDE, 589, 621-626 
parallel transport, 906 
parallelogram law, 31 
parity, 667 

Hermite polynomials, 194
Legendre polynomials, 194 
Parseval inequality, 149, 851 
Parseval’s relation, 223 
partial DE, 554-592 

characteristic system of, 939 
order of, 584 
principal part, 584 
particle in a box, 531 
particle in a cylindrical can, 539 
particle in a hard sphere, 543 
partition, 4, 669 
past light cone, 838 
Pauli spin matrices, 88, 835, 840 
Peano, 799 


periodic BC, 515 
permutation 
even, 668 
odd, 668 
parity of, 667 
permutation group, 664 
permutation symbol, 93 
permutation tensor, 754 
perturbation theory, 603-610, 697 
degenerate, 609 
first-order, 608 
nondegenerate, 607 
second-order, 608 
Peter-Weyl theorem, 853
Fourier series, 853
photon capture 

cross section, 929 
Planck, 388, 463, 920
Poincaré, 63, 478, 497, 620, 744, 920
biography, 797
Poincaré algebra, 840, 844
representation, 868-876
Poincaré group, 657, 817, 839, 872
Poincaré lemma, 797
converse of, 797
Poisson, 178, 512, 528, 614, 653
Poisson bracket, 808
Poisson’s equation, 327, 917
polar decomposition, 129-130 
pole, 273 
polynomials 

orthogonal, 153 
positive operator, 65 
positive orientation, 746 
potential 

non-local, 631 
separable, 631 
power series, 252 

differentiation of, 253 
integration of, 253 
uniform convergence, 253 
p(p, n - p), 837
preimage, 5 

primitive idempotent, 689 
principal value, 285 
product manifold, 767 




projectable symmetry, 944 
projection operator, 67-70, 474, 604 
completeness relation, 68 
orthogonal, 68 
projective group 

density function, 833 
one-dimensional, 820 
projective space, 4 
prolongation, 945-950 
of a function, 946 
of groups, 948 
of vector fields, 949 
propagator, 626
Feynman, 636 
proper orthochronous, 837 
proper subset, 2 
Prüfer substitution, 518
Puiseux, biography, 296
pullback, 735, 786, 791, 801
linear transformation, 39
pullback map, 791
pullback of p-forms, 741


quantization 

harmonic oscillator 
algebraic, 375 
analytic, 374 
hydrogen atom, 422
quarks, 701, 703, 873
quaternions, 44 
quotient group, 660 
quotient set, 4 

raising indices, 750 
raising operator, 335 
range of a map, 5 
rank of a matrix, 86 
rational function, 274 
and trig functions 
integrals of, 279 
integration of, 276 
rational numbers 

dense subset of reals, 460 
real coordinate space, 21 
real vector space, 20, 130-137 
realization, 664 
reciprocal lattice vectors, 208 
recurrence relations, 153 


redshift, 930 

reduced matrix elements, 707 
reducible representation, 677 
regular graphs, 713 
regular point, 233 
regular representation, 686 
regular singular point 

second-order linear DE, 405 
relation, 3 

relative acceleration, 913 
relativistic electromagnetism, 792 
representation 

abelian group, 681 
adjoint, 679, 704
carrier space, 674 
character, 683 
compact Lie group, 852 
complex conjugate, 679 
dimension of, 674 
direct sum, 679 
equivalent, 674 
faithful, 674 

general linear group, 856-859 
group, 674 
identity, 674 
irreducible, 677 
Kronecker product, 700 
Lie group, 845 
operators, 83 
reducible, 677 
regular, 686 

symmetric group, 707-723 
tensor product, 700 
antisymmetrized, 701 
character, 700 


symmetrized, 701

trivial, 679 
unitary, 678 

compact Lie group, 846


representation of
gl(n, ℝ), 860
sl(n, ℂ), 861
so(3), 864
so(3, 1), 867
su(n), 861
u(n), 861
residue, 270-273 


definite integrals, 275 
definition, 271 
residue theorem, 271 
resolution of identity, 482, 689, 721
resolvent, 480-485 
compact, 508 
Laurent expansion, 481 
resolvent set, 457

openness of, 460 
resonant cavities, 533 
Riccati equation, 968 
Ricci, 891, 898
Ricci tensor, 917, 919, 920
Riemann, 30, 199, 296, 704, 798, 849,
891, 981
biography, 888

Riemann curvature tensor, 889, 904
Riemann normal coordinates, 915 
Riemann surfaces, 296-302 
Riemannian manifold, 887-896 
right coset, 658 
right ideal, 44 
right translation, 820 
right-invariant 1-form, 820 
right-invariant vectors, 820 
right-shift operator, 453 
eigenvalues of, 458 
rigid rotations, 657 
Rodriguez formula, 174 
Rosetta stone, 198 
rotation algebra, 864 
rotation group, 675, 862 
character, 865 
rotation matrix, 865 
Runge, biography, 388 
Runge-Kutta method, 387-391
Russell, 11, 799 

saddle point approximation, 313 
scalar operator, 706, 707 
scale transformations, 819
Schelling, 738 
Schmidt, biography, 28 
Schopenhauer, 738 
Schrödinger, 64, 807
Schrödinger equation, 675
Schrödinger equation, 58, 328, 410, 420,
531, 543, 631


classical limit, 383 
one dimensional, 381 
Schur, 711, 850, 874
biography, 681

Schur’s lemma, 680, 681, 707, 845, 861
Schwarz, 463, 739
biography, 30

Schwarz inequality, 29, 149, 152
Schwarz reflection principle, 305-306 
Schwarzschild geodesic, 925-932 
Schwarzschild metric, 892 
Schwarzschild radius, 925 
Schwarzschild, biography, 919 
second order linear DE 

Sturm-Liouville systems, 513
second order PDE, 589-592 
hyperbolic, 589 
second-order PDE 
classification, 589 
elliptic, 589 
parabolic, 589 
ultrahyperbolic, 589 
selection rules, 701 
self-adjoint operator, 63 
self-adjoint SOLDOs, 564-565 
semisimple Lie algebra, 844
separable kernel, 501 
separable potential, 631 
separated boundary conditions, 510 
separation of variables, 328 
Cartesian, 526-534 
cylindrical, 535-540 
spherical, 540-544 
separation theorems, 361-363 
sequence, 9 
Cauchy, 10 

complete orthonormal, 149 
series 

Fourier-Bessel, 536 
Neumann, 601, 602
sesquilinear inner product, 25
set, 1-3

Cantor, 12 
complement of, 2 
countably infinite, 12 
element of, 1 
empty, 2 






intersection, 2 
partition of a, 4 
uncountable, 12 
union, 2 
universal, 2 
sharp map, 747, 802 
shifting operator, 71 
signature of g, 752 
similarity transformation, 91-92 
orthonormal basis, 91 
simple arc, 242 
simple character, 685 
simple Lie algebra, 844 
simultaneous diagonalizability, 123 
singleton, 2 
singular point, 233 
singularity, 233 
isolated, 270 

classification, 273 
rational function, 274 
skew-symmetric, 738 
SL(V) as a Lie group, 816
smooth arc, 242
space 

Banach, 148 
complex coordinate, 21 
dual, 34 

inner product, 25 
metric, 8 

real coordinate, 21 
square-integrable functions, 151 
spacelike vector, 837 
spacetime 

spherically symmetric, 923 
static, 923 
stationary, 923 
span, 22 

special linear group, 656 
special orthogonal group, 657, 824
Lie algebra of, 824 

special relativity, 752, 837, 868, 872, 986 
special unitary group, 657 
Lie algebra of, 824 
spectral decomposition, 117-125 
orthogonal operator, 135 
real case, 131 
spectral theorem 


compact hermitian, 476 
compact normal, 477 
spectrum 

closure of, 461 

spherical Bessel functions, 428, 542 
expansion of plane wave, 543 
spherical coordinates 

in m dimensions, 593-595 
spherical harmonics, 338-346 
addition theorem, 345, 867 
definition, 341 
expansion in terms of, 344 
expansion of plane wave, 544, 646 
first few, 343 
Spinoza, 738 

square-integrable functions, 151 
stabilizer, 663, 818 
static spacetime, 923 
stationary spacetime, 923 
steepest descent method, 312-315 
step function, 163, 288
stereographic projection 
n-sphere, 769 
two-sphere, 766 
Stirling approximation, 316 
Stone-Weierstrass theorem, 152 
generalized, 196 
stress energy tensor, 920 
structure constants, 826, 900, 905 
structure equations, 888 
Sturm, biography, 512 
Sturm-Liouville 
operator, 510 
problem, 174, 622
system, 510, 637 
completeness, 524 
eigensolutions, 511 
eigenvalues, 513 
large argument, 521 
large eigenvalues, 518 
regular, 510 
singular, 516 
subalgebra, 44 
subgroup, 656-663 
conjugate, 657 
generated by a subset, 657 
normal, 659 




trivial, 656 
submanifold, 767 
open, 767 
subset, 2 

bounded, 459 
proper, 2 
subspace, 22 

invariant, 112 
surjective map, 6 
Sylvester’s theorem, 753 
symmetric algebra, 737 
symmetric bilinear form, 749 
classification, 751 
inner product, 750 
nondegenerate, 750
symmetric group, 655, 664-669 
characters 

graphical construction, 714 
cycle, 666 

representation, 707-723 
analytic construction, 707 
antisymmetric, 679 
graphical construction, 710 
products, 721 
Young operators, 717 
symmetric homomorphism, 655 
symmetric product, 737 
symmetrizer, 736 
symmetry group, 944 

defining equations, 957
of a DE, 941
of a subset, 937
projectable, 944 
transform of a function, 943 
variational, 988 
symplectic algebra, 836
symplectic charts, 802
symplectic form, 746, 801
rank of, 747 

symplectic geometry, 801-808 
conservation of energy, 805 
symplectic group, 657, 749, 835
symplectic manifold, 801
symplectic map, 746, 802
symplectic matrix, 749
symplectic structure, 801

symplectic transformation, 746


symplectic vector space, 746-749 
canonical basis, 747 
Hamiltonian dynamics, 748 

tangent bundle, 780 
tangent space, 773 
tangent vector, 772 
tangents to a curve 
components, 778 
target space, 5 
Taylor series, 252-262 
construction, 254 
tensor, 731 

components of, 732 
contravariant, 731 
contravariant-symmetric, 736 
covariant, 731 
covariant-symmetric, 736 
Levi-Civita, 745 
skew-symmetric, 738 
symmetric, 736
transformation law, 733 
types of, 731 
tensor algebra, 731 
tensor bundle, 786 
tensor field, 785-790 

crucial property of, 786, 787
tensor product, 730, 731 
of vector spaces, 41 
tensors 

symmetric product, 737 
theta function, 288
timelike vector, 837 
total derivative, 954 
total divergence, 987 
trace, 101-103 

and determinant, 102 
definition, 101 
log of determinant, 103 
relation to determinant, 101 
transformation group, 655 
translation, 819 
translation operator, 138 
transpose of a matrix, 87 
transposition, 667 
traveling waves, 533 
triangle inequality, 31 




trivial homomorphism, 655 
trivial representation, 679 
trivial subgroup, 656 
twin paradox 

as a variational problem, 986 

uncertainty principle, 78 
uncountable set, 12 
union, 2 

unitary group, 656 

Lie algebra of, 824 
unitary operator, 63-67 
unitary representation, 678 
universal set, 2 

Vandermonde, biography, 95 
variation of constants, 360 
variational derivative, 977 
variational problem, 980
twin paradox, 986 
variational symmetry group, 988 
vector field, 780 
flow of a, 784 
integral curve of, 783 
left-invariant, 820
Lie algebra of, 782 
vector space, 19-45 
automorphism, 36 
basis 

components in a, 22 
basis of a, 22 
complete, 146 
complex, 20 
definition, 19 
dual, 34 

endomorphism of a, 33
finite-dimension 
criterion for, 462 
finite-dimensional, 22 
isomorphism, 36 
linear operator on a, 33 
normed, 31 
operator on a, 33 
oriented, 746 
real, 20 

symplectic, 746 
vectors, 19 

norm of, 30 


normal, 26 
orthogonal, 26 
Volterra equations, 488 
Volterra, biography, 490 
volume element, 746 
von Dyck, 390 
von Humboldt, 177, 614, 738 
von Neumann, 874 
biography, 478 

wave equation, 328, 532 

symmetry group, 961-963 
wave guide, 533

cylindrical, 537 
rectangular, 534 
wedge product, 740 
Weierstrass, 11, 30, 296, 388, 588, 739, 
841 

biography, 462 
Weyl, 744, 842, 942, 996 
biography, 849 
Weyl basis, 835, 844 
Weyl operator, 847 
Wigner, 168, 943
biography, 874 
Wigner formula, 865 
Wigner-Eckart theorem, 707 
Wigner-Seitz cell, 208 
WKB method, 380-383 

connection formulas, 381
Wordsworth, 806 
Wronski, biography, 357 
Wronskian, 355-365, 511

comparison theorem, 362 
separation theorem, 361 

Young, 850 
Young frame, 712, 718 

negative application, 715 
positive application, 715 
regular application, 714 
Young operator, 856 
Young pattern, 712 
Young tableaux, 713, 718, 857
Yukawa potential, 214 




